We've started using NirOptions != NULL to mean "we're using NIR for this
stage." However, when INTEL_USE_NIR=1, we set it for a bunch of stages
that still use the vec4 backend, and thus definitely aren't using NIR.
For example, if INTEL_USE_NIR=1 we disable the GLSL IR cubemap
normalization pass, even for vertex shaders and geometry shaders. This
is wrong, but breaks a very uncommon case.
When I started deleting GLSL IR for stages where we claimed to be using
NIR, this bug quickly became apparent.
For now, only set it for fragment shaders, and vertex shaders if
brw->scalar_vs is set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The disassembly currently has the swizzle after the type for 3src source
operands, and the other way around for 2src. Flip the type and swizzle
around for 3src so that the output matches 2src.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
These should never happen. Plus, NIR passes really shouldn't be
reporting linker errors - this is past link time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't actually need a gl_program struct. We only used it to
translate prog->Target (i.e. GL_VERTEX_PROGRAM) to the gl_shader_stage
(i.e. MESA_SHADER_VERTEX). We may as well just pass that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I want to use this in some code that doesn't currently include mtypes.h.
It seems like a better place for it anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This header was originally going to be called pipeline.h, but it got
renamed at the last minute. Make the include guards match.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Neither the shader nor the key change when doing elts or linear variant, so
this was just annoying (probably mildly useful at some point when we printed
the IR per function too).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
llvm goes crazy when doing that, using way more memory and time, though there's
probably more to it - this points to a very much similar issue as fixed in
8a9f5ecdb1. In any case I've seen a quite
plain looking vertex shader with just ~50 simple tgsi instructions (but with a
dozen or so such indirect constant buffer lookups) go from a terribly high
~440ms compile time (consuming 25MB of memory in the process) down to a still
awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
improvements possible (but I have no clue why it's so slow...).
The resulting shader is most likely also faster (certainly seemed so though
I don't have any hard numbers as it may have been influenced by compile times)
since generally fetching constants outside the buffer range is most likely an
app error (that is we expect all indices to be valid).
It is possible this fixes some mysterious vertex shader slowdowns we've seen
ever since we are conforming to newer apis at least partially (the main draw
loop also has similar looking conditionals which we probably could do without -
if not for the fetch at least for the additional elts condition.)
v2: use static vars for the fake bufs, minor code cleanups
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is a follow-on fix from the earlier "glsl: allow ForceGLSLVersion
to override #version directives" change. Since we're not changing
the language_version field, we have to check forced_language_version
here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In Skylake the order of the arguments for sample messages with the LD
type are u, v, lod, r whereas previously they were u, lod, v, r.
This fixes 144 Piglit tests including ones that directly use
texelFetch and also some using the meta stencil blit path which
appears to use texelFetch in its shader.
v2: Fix sampling 1D textures
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
On Gen7/8 for RAW surface format, the depth field (surf[3]) in surface
state means [30:21] bits of number of entries which is different from
other surface format which uses [26:21] bits field.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Add SV_GEOMETRY_EMIT special variable type to track the
implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
instructions so GCM/scheduler doesn't reorder them.
Mark emit instructions as unkillable so DCE doesn't eat them.
Enable only for evergreen/cayman as there are a few
unexplained GS piglit regressions on R6xx/R7xx with SB
enabled otherwise.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
CF_END could end up emitted in the middle of a shader on cayman
when there was a loop at the very end.
Fixes glsl-1.50-geometry-end-primitive and
ext_transform_feedback-geometry-shaders-basic piglit tests.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
arb_stencil_texturing-draw failed under softpipe because we got a float
back from the texturing function, and then tried to U2F it, stencil
texturing returns ints, so we should fix the tiling to retrieve
the stencil values as integers not floats.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This pass performs a mark and sweep pass over a nir_shader's associated
memory - anything still connected to the program will be kept, and any
dead memory we dropped on the floor will be freed.
The expectation is that this will be called when finished building and
optimizing the shader. However, it's also fine to call it earlier, and
many times, to free up memory earlier.
v2: (feedback from Jason Ekstrand)
- Skip sweeping impl->start_block, as it's already in the CF list.
- Don't sweep SSA defs (they're owned by their defining instruction)
- Don't steal phi sources (they're owned by nir_phi_instr).
- Don't steal tex->src (it's owned by the tex_inst itself)
- Don't sweep dereference chains (top-level dereferences are owned by
the instruction; sub-dereferences are owned by the parent deref).
- Don't sweep sources and destinations (SSA defs are handled as part of
the defining instruction, and registers are handled as part of
function implementations).
- Just steal instructions; don't walk them (no longer required).
v3: (feedback from Jason Ekstrand)
- Steal indirect sources from nir_src/nir_dest.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Jason pointed out that variable dereferences in NIR are really part of
their parent instruction, and should have the same lifetime.
Unlike in GLSL IR, they're not used very often - just for intrinsic
variables, call parameters & return, and indirect samplers for
texturing. Also, nir_deref_var is the top-level concept, and
nir_deref_array/nir_deref_record are child nodes.
This patch attempts to allocate nir_deref_vars out of their parent
instruction, and any sub-dereferences out of their parent deref.
It enforces these restrictions in the validator as well.
This means that freeing an instruction should free its associated
dereference chain as well. The memory sweeper pass can also happily
ignore them.
v2: Rename make_deref to evaluate_deref and make it take a nir_instr *
instead of void *. This involves adding &instr->instr everywhere.
(Requested by Jason Ekstrand.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can't allocate them out of the nir_ssa_def itself, because it may not
be ralloc'd (for example, nir_dest embeds a nir_ssa_def).
However, allocating them out of the instruction should work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Phi sources are part of the phi instruction and should have the same
lifetime.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The lifetime of the params array needs to be match the nir_call_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Commit 18004c3 introduced more restrictive validation to linker
between inputs and outputs. This patch skips the additional check
for programs that utilize GL_ARB_separate_shader_objects, there
inputs and outputs might not make exact match during linking but
only when constructing the final pipeline.
This made some of the GL_ARB_program_interface_query tests shaders
fail to link, these tests can be used to verify the change.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We limit y-tiling to 0x20 when depth is involved. However the function is
run for each miplevel, and the hardware expects miplevel 0 to have the
highest tiling settings. Perform the y-tiling limit on all levels of a
3d texture, not just the ones that have depth.
Fixes:
texelFetch fs sampler3D 98x129x1-98x129x9
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Nick Tenney <nick.tenney@gmail.com> # GT216
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Haswell hardware seems to ignore Render Stream Select bits from
3DSTATE_STREAMOUT packet when the SOL stage is disabled even if
the PRM says otherwise. Because of this, all primitives are sent
down the pipeline for rasterization, which is wrong. If SOL is
enabled, Render Stream Select is honored and primitives bound to
non-zero streams are discarded after stream output.
Since the only purpose of primives sent to non-zero streams is to
be recorded by transform feedback, we can simply discard all geometry
bound to non-zero streams then transform feedback is disabled
to prevent it from ever reaching the rasterization stage.
Notice that this patch introduces a small change in the behavior we
get when a geometry shader emits more vertices than the maximum declared:
before, a vertex that was emitted to a non-zero stream when TF was
disabled would still count for the purposes of checking that we don't
exceed the maximum number of output vertices declared by the shader. With
this change, these vertices are completely ignored and won't increase
the output vertex count, making more room for other (hopefully more
useful) vertices.
Fixes piglit test arb_gpu_shader5-emitstreamvertex_nodraw on Haswell
and Broadwell.
v2 (Ken): Drop is_haswell check in favor of doing this unconditionally.
Broadwell needs the workaround as well, and it doesn't hurt to do it in
general. Also tweak comments - the Haswell PRM does actually mention
this ("Command Reference: Instructions" page 797).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83962
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Jordan added this in commit 741782b594 for
Gen7 platforms. I missed this when adding the Broadwell code.
Fixes Piglit's spec/arb_gpu_shader5/invocation-id-{basic,in-separate-gs}
with MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5 set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
While working on NIR's memory allocation model, I realized the GLSL IR
memory model was broken.
During glCompileShader, we allocate everything out of the
_mesa_glsl_parse_state context, and reparent it to gl_shader at the end.
During glLinkProgram, we allocate everything out of a temporary context,
then reparent it to the exec_list containing the linked IR.
But during brw_link_shader - the driver's final opportunity to do
lowering and optimization - we just allocated everything out of the
permanent context given to us by the linker. That memory stayed
forever.
Notably, passes like brw_fs_channel_expressions cause us to churn the
majority of the code, so we really want to free dead IR here.
Saves 125MB of memory when replaying a Dota 2 trace on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This allows SIMD16 mode to work for a lot more programs. Texturing is
also more efficient in SIMD16 mode than SIMD8. Several messages don't
actually exist in SIMD8 mode, so we did SIMD16 messages and threw away
half of the data. Now we compute real data in both halves.
Also, the SIMD16 "sample" message doesn't require all three coordinate
components to exist (like the SIMD8 one), so we can shorten the message
lengths, cutting register usage a bit.
I chose to implement the visitor functionality in a separate function,
since mixing true SIMD16 with SIMD8 code that uses SIMD16 fallbacks
seemed like a mess. The new code bails on a few cases where we'd
have to do two SIMD8 messages - we just fall back to SIMD8 for now.
Improves performance in "Shadowrun: Dragonfall - Director's Cut" by
about 20% on GM45 (measured with LIBGL_SHOW_FPS=1 while standing around
in the first mission).
v2: Add ir_txf to the has_lod case (caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen5+ systems allow you to specify multiple shader programs - both SIMD8
and SIMD16 - and the hardware will automatically dispatch to the most
appropriate one, given the number of subspans to be processed.
However, that is not the case on Gen4. Instead, you program a single
shader. If you enable multiple dispatch modes (SIMD8 and SIMD16), the
shader is supposed to contain a series of jump instructions at the
beginning. The hardware will launch the shader at a small offset,
hitting one of the jumps.
We've always thought that sounds like a pain, and weren't clear how it
affected performance - is it worth having multiple shader types? So,
we never bothered with SIMD16 until now.
This patch takes a simpler approach: try and compile a SIMD16 shader.
If possible, set the no_8 flag, telling the hardware to just use the
SIMD16 variant all the time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This flag means to ignore the SIMD8 program and only use the SIMD16 one.
It was originally meant for repdata clear shaders, but I plan to use it
for other things on Gen4 as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I've no idea why this was 4. It certainly seems wrong.
Prevents assertion failures in fp-incomplete-tex with some upcoming
patches of mine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The CSE algorithm will continuously allocate new ae_entry objects. As
each new basic block is exited, all of the previously allocated objects
are dumped. Instead, put them in a free list and re-use them in the
next basic block. Reduce, reuse, recycle!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
These were added in commit f2616e56, presumably in preparation for
translating ARB vp/fp into GLSL IR. That never happened, and neither did
a lowering pass that actually generated these instructions.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
From GLSL 3.30 and GLSL ES 1.00 on, after processing the line
directive (including its new-line), the implementation should
behave as if it is compiling at the line number passed as
argument. In previous versions, it behaved as if compiling
at the passed line number + 1.
Partially fixes https://bugs.freedesktop.org/show_bug.cgi?id=88815
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From GLSL 1.30.10, section 3.3 (Preprocessor):
"#line line source-string-number ... After processing this directive
(including its new-line), the implementation will behave as if it is
compiling at ... source string number source-string-number. Subsequent
source strings will be numbered sequentially, until another #line
directive overrides that numbering."
In the previous implementation the source number was always zero.
Subsequent source strings are still not numbered sequentially, because
in the glShaderSource implementation we are concatenating the source code
strings into one long string.
Partially fixes https://bugs.freedesktop.org/show_bug.cgi?id=88815
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even if they only have one slice, otherwise textureSize() won't
produce correct results for the depth value.
Fixes 10 dEQP tests in this category:
dEQP-GLES3.functional.shaders.texture_functions.texturesize.sampler2darray*
Reviewed-by: Mark Janes <mark.a.janes at intel.com>
The NIR compiler frontend is an alternative to the TGSI f/e, producing
the same ir3 IR and using the same backend passes for scheduling, etc.
It is not enabled by default yet, as there are still some regressions.
To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for
example, xonotic or supertuxkart.
With the NIR f/e, scalarizing and a number of other lowering steps
happen in NIR, so we don't have to do them in ir3. Which simplifies the
f/e and allows the lowered instructions to pass through other
optimization stages.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use the correct sprite replacement depending on the flip of the coord
mode, using either T or 1-T depending on whether we have an upper-left or
lower-left coordinate origin. This fixes all the point sprite piglits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Copies nouveau_buffer and radeon_buffer. This allows a write to proceed
to an uninitialized part of a buffer even when the GPU is using the
previously-initialized portions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to
keep separate timestamps around.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A resource flush is an upload of a hypothetically-staging texture to the
GPU. For a UMA system, this will largely be a no-op or
cache-maintenance. Move the render flush logic into transfer_map where
it belongs, and clear out the transfer_flush function.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
pipe_sampler_view already contains a texture, remove the redundant
tex_resource member which pointed at the same thing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since NIR f/e currently encodes immediates in instructions (rather than
passing via const), we need to ensure that when const's are used the get
initialized to the proper values. Otherwise comparing NIR to TGSI
compiler, it will use proper immediate values in one case, and randomly
initialize values in the other. Which confuses ir3test.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Be smarter about propagating copies from const or immed, or with abs/neg
modifiers. Also, realize that absneg.s and absneg.f are really "fancy"
mov instructions.
This opens up the possibility to remove more copies. It helps the TGSI
frontend a bit, but will be really needed for the NIR f/e which builds
everything up in SSA form (ie. will *always* insert a mov from const or
immediate).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Even though in the end, they map to the same bits, the backend will need
to be able to differentiate float abs/neg vs integer abs/neg. Rather
than making the backend figure it out based on instruction opcode (which
when combined with mov/absneg instructions, can be awkward), just split
out different flags for each so the frontend can signal it's intentions
more clearly. Also, since (neg) for bitwise op's is actually a bitwise-
not, split it out into bnot flag.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add helpers for constructing SSA forms of instructions.
Only partial cat5/cat6 coverage.. but we can add stuff as needed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to pull in libnir.la and it's dependency libglsl_util.la. Also,
_mesa_error_no_memory() must be defined.
Fortunately with libnir.la (vs pulling in all of libglsl.la) we don't
also need libstdc++.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If we want to use NIR from state trackers that don't already pull in the
whole of glsl (ie. anything other than mesa state tracker), we need a
separate more minimal libnir. Possibly NIR should be better split out
from glsl, but for now, generate a second smaller libnir.la for those
who just want NIR but not all of glsl.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
adaptation of the nv50 code from Ilia Mirkin).
A python/numpy script which implements the same algorithm (and is
possibly useful for debugging or analysis) can be found here:
http://people.freedesktop.org/~robclark/div-lowering.py
I've tested this on i965 hacked up to insert the idiv lowering pass,
and on freedreno with NIR frontend.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Eric Anholt <eric@anholt.net> (vc4)
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0. But less lines of code to
just let nir_opt_algebraic handle this for us, plus opens up some small
window for other opt passes to improve (ie. if some shader ended up with
both a flt and slt with same src args, for example).
v2: use b2f rather than u2f
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Switch between the two clip space definitions already available
in hardware. Update winding order dependent state according
to the clip control state.
This change did not introduce new piglit quick.test regressions on
an Ivybridge Mobile and a GM45 Express chipset.
Also it enables and passes the clip-control and clip-control-depth-precision
tests on these two chipsets.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
The _WindowMap can be dropped from gl_viewport_attrib now.
Simplify gl_viewport_attrib handling where possible.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This is the only real user of _WindowMap which has the depth
buffer scaling multiplied in. Maintain the _WindowMap of the
one and only viewport inside TNLcontext.
v2:
Remove unneeded parentheses.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Instead of _WindowMap just use the translation and scale
of the viewport transform directly. Thereby avoid dividing by
_DepthMaxF again.
v2:
Change order of assignments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Instead of _WindowMap just use the translation and scale
of the viewport transform directly. Thereby avoid dividing by
_DepthMaxF again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
As of da5ec2a, we allocate instruction sources out of the instruction
itself. When we realloc the texture sources we need to use the right
memory context or ralloc will get angry and assert-fail
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a pass to L1-normalize cube-map coordinates. Some hardware
such as i965 requires that largest cube-map coordinate is +-1. We had a
pass to perform this normalization in GLSL IR but we need it in NIR for
cube maps on ARB programs to work correctly.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2 (Suggested by Eric):
- Do a vector fabs and split into components later
- Move to core NIR
Reviewed-by: Eric Anholt <eric@anholt.net>
This only impacts the ARB_fp path. We can't quite disable the GLSL-level
lowering pass, because it needs to apply before
brw_do_lower_unnormalized_offset().
total instructions in shared programs: 5667857 -> 5667847 (-0.00%)
instructions in affected programs: 1114 -> 1104 (-0.90%)
helped: 16
HURT: 6
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Not much hardware wants them these days, and it might give us a chance to
do CSE or algebraic at the NIR level.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We use nir_ssa_defs for nir_builder args, so this takes a nir_src and
makes one so it can be passed in.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So far we'd only used nir_builder to build brand new programs. But if
we're doing modifications to instructions (like in a lowering pass), then
we want to generate new stuff before the instruction we're modifying.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The fpclassify stuff either needs std=c99 or _XOPEN_SOURCE=600 passed
to gcc, but when using the latter the lrint family of function will be defined
too.
This is in preparation for these functions to be called from other
files.
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dirty-bit checking from each brw_upload_<stage>_prog function is
split out into its a new brw_<stage>_state_dirty function.
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit splits portions of the existing brw_upload_vs_prog and
brw_upload_gs_prog function into new brw_vs_populate_key and
brw_gs_populate_key functions. This follows the same style as is
already present for all other stages, (see brw_wm_populate_key, etc.).
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 09ee907266 added logic to fold immediates into mad operations,
but the emission code is only there for fmad. Only allow it on float
types.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit fb63df2215 added 4-byte mad support, but only supported
emission for floats. Disable it for ints for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The lifetime of the sources array needs to be match the nir_tex_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These sets are part of the block, and their lifetime needs to match the
block itself. So, allocate them using the block itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The lifetime of each register's use/def/if_use sets needs to match the
register itself. So, allocate them using the register itself as the
context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
glsl_to_nir passes in the ir_function's name field; we were copying the
pointer, but not duplicating the memory.
We want to be able to free the linked GLSL IR program after translating
to NIR, so we'll need to create a copy of the function name that the NIR
shader actually owns.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can just pass a pointer to the list of variables, and reuse the code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
ralloc_adopt() reparents all children from one context to another.
Conceptually, ralloc_adopt(new_ctx, old_ctx) behaves like this
pseudocode:
foreach child of old_ctx:
ralloc_steal(new_ctx, child)
However, ralloc provides no way to iterate over a memory context's
children, and ralloc_adopt does this task more efficiently anyway.
One potential use of this is to implement a memory-sweeper pass: first,
steal all of a context's memory to a temporary context. Then, walk over
anything that should be kept, and ralloc_steal it back to the original
context. Finally, free the temporary context. This works when the
context is something that can't be freed (i.e. an important structure).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The hardware only supports 4 MRTs. It should be possible to emulate
support for 8, but doesn't seem worth the trouble.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This complication is unnecessary and makes MRTs more complicated and
likely to generate tons of variants.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The thing we want to avoid is int/float comparisons, but int/unsigned
comparisons with 0 are equivalent.
total instructions in shared programs: 6194829 -> 6193996 (-0.01%)
instructions in affected programs: 117192 -> 116359 (-0.71%)
helped: 471
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.
Reviewed-by: Eric Anholt <eric@anholt.net>
Doesn't work for analogous && cases, because of NaNs.
total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs: 42000 -> 41117 (-2.10%)
helped: 403
Reviewed-by: Eric Anholt <eric@anholt.net>
InputsRead is a 64-bit bitfield. Using _mesa_fls would silently
truncate off the high bits, claiming inputs 32..56 (VARYING_SLOT_MAX)
were never read.
Using <= here was a hack I threw in at the last minute to fix programs
which happened to use input slot 32. Switch back to using < now that
the underlying problem is fixed.
Fixes crashes in "Euro Truck Simulator 2" when using prog->nir, which
uses input slot 33.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ptn_move_dest and nir_fadd already take care of replicating the last
channel out, so we can just use a scalar and skip splatting it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We run lowering and optimization passes that might leave garbage lying
around. This keeps the FS cse from having to clean it up.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering. Instead, we split ffma
opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem. We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs: 989359 -> 978057 (-1.14%)
helped: 5308
HURT: 97
GAINED: 78
LOST: 5
This does, unfortunately, cause some substantial hurt to a shader in Kerbal
Space Program. However, the damage is caused by changing a single
instruction from a ffma to an add. This, in turn, *decreases* register
pressure in one part of the program causing it to fail to register allocate
and spill. Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
should be considered a positive.
Reviewed-by: Matt Turner <mattst88@gmail.com>
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs: 4230 -> 4286 (1.32%)
helped: 0
HURT: 12
While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions. To solve this, we play the
age-old header file trick and just #define around it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw. This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.
Reviewed-by: Kenneth Grunke <kenneth@whitecape.org>
Unused as of commit 630ab0d27ba(mesa: remove last of MAX_WIDTH,
MAX_HEIGHT). Update all the remaining references to the defines.
v2: Use the correct variable name in the comments
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In case of using a distribution tarball (or a dirty git tree) one can
have the generated sources locally. Make configure.ac error out
otherwise, to alert that about the unmet requirement(s) of python/mako.
v2: Check only for a single file for each dependency.
Suggested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
TGSI's conditional discards take float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value. Only, in NIR we want a proper
bool once again, so we compare with 0. This is a lot of pointless extra
instructions.
total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs: 1342 -> 1309 (-2.46%)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand. This is common when generating NIR from TGSI.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
I was previously using temporary disables of VC4 optimization to show the
benefits of improved NIR optimization, but this can get me quick and dirty
numbers for NIR-only improvements without having to add hacks to disable
VC4's code (disabling of which might hide ways that the NIR changes would
hurt actual VC4 codegen).
NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.
total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs: 62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs: 15494 -> 15240 (-1.64%)
v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in mater.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
I'm not committing)
This will be used by the VC4 driver for doing device-independent
optimization, and hopefully eventually replacing its whole IR. It also
may be useful to other drivers for the same reason.
v2: Add all of the instructions I was relying on tgsi_lowering to remove,
and more.
v3: Rebase on SSA rework of the builder.
v4: Use the NIR ineg operation instead of doing a src modifier.
v5: Don't use ineg for fnegs. (infer_src_type on MOV doesn't do what I
expect, again).
v6: Fix handling of multi-channel KILL_IF sources.
v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than
a vector load_const. CSE doesn't recognize that srcs out of those
channels are actually all the same.
v8: Rebase on nir_builder auto-sizing, make the scalar arguments to
non-ALU instructions actually be scalars.
v9: Add support for if/loop instructions, additional texture targets, and
untested support for indirect addressing on temps.
v10: Rebase on master, drop bad comment about control flow and just choose
the X channel, use int comparison opcodes in LIT for now, drop unused
pipe_context argument..
v11: Fix translation of LRP (previously missed because I mis-translated
back out), use nir_builder init helpers.
v12: Rebase on master, adding explicit include of mtypes.h to get
INTERP_QUALIFIER_*
v13: Rebase on variables being in lists instead of hash tables, drop use
of mtypes.h in favor of util/pipeline.h. Use Ken's nir_builder
swizzle and fmov/imov_alu helpers, drop "struct" in front of
nir_builder, use nir_builder directly as the function arg in a lot of
cases, drop redundant members of ttn_compile that are also in
nir_builder, drop some half-baked malloc failure handling.
v14: The indirect uniform src0 should be scalar, not vector (noticed as
odd by robclark, confirmed by cwabbott). Apply Ken's review to
initialize s->num_uniforms and friends, skip ttn_channel for dot
products, and use the simpler discard_if intrinsic.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v13)
Acked-by: Rob Clark <robclark@freedesktop.org>
NIR uses these enums/#defines in nir_variables and associated intrinsics,
but I want to be able to use them from TGSI->NIR and NIR->TGSI.
Otherwise, we had to pull in all of mtypes.h.
This doesn't cover all of the enums we might want from a shared compiler
core (like varying slots or vert attribs), but it at least covers what I
need at the moment (system values and interp qualifiers).
v2: Move to src/glsl since util/ is really vague. Include in Makefile.am
list. Use plain bitshifts and stdint types instead of undefined
BITFIELD64_BIT.
v3: Rename to shader_enums.h. Move it into Makefile.sources.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2, with
recommendation to rename)
The header was added with commit 2a135c470e3(nir: Add an ALU op builder
kind of like ir_builder.h) but did not made it into to the sources list.
Fortunately it remained unused until a recent commit faf6106c6f6(nir:
Implement a Mesa IR -> NIR translator.)
v2: Remove the bogus dependency. Tweak commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
... by folding it into CLEANFILES. Don't worry about $(LANG) as it is
essentially the first folder of $(POS). With the latter already handled.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Both of which were removed with commit 69db422218b(scons: Don't build
osmesa.)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is a problem when we have IR like this:
(array_ref (var_ref temps) (swiz x (expression ivec4 bitcast_f2i
(swiz xxxx (array_ref (var_ref temps) (constant int (2)) ) )) )) ) )
where we are indexing an array with the result of an expression that
accesses the same array.
In this scenario, temps will be moved to scratch space and we will need
to add scratch reads/writes for all accesses to temps, however, the
current implementation does not consider the case where a reladdr pointer
(obtained by indexing into temps trough a expression) points to a register
that is also stored in scratch space (as in this case, where the expression
used to index temps access temps[2]), and thus, requires a scratch read
before it is accessed.
v2 (Francisco Jerez):
- Handle also recursive reladdr addressing.
- Do not memcpy dst_reg into src_reg when rewriting reladdr.
v3 (Francisco Jerez):
- Reduce complexity by moving recursive reladdr scratch access handling
to a separate recursive function.
- Do not skip demoting reladdr index registers to scratch space if the
top level GRF has already been visited.
v4 (Francisco Jerez)
- Remove redundant checks.
- Simplify code by making emit_resolve_reladdr return a register with
the original src data except for reg, reg_offset and reladdr.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89508
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This mutex is used to make sure the shared context does not change
while some shared code is looking into it.
Calling BindRenderbufferEXT BindRenderbuffer with a gles context
would not take the mutex before allocating an entry. Commit a34669b
then moved out the allocation out of bind_renderbuffer into
allocate_renderbuffer before using it for the CreateRenderBuffer
entry point. This thus also made this entry point unsafe.
The issue has been hinted by Ilia Mirkin.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
The issue has been detected by coverty.
v2:
- move the declaration of obj to the else clause (Brian Paul)
v3: Review by Brian Paul
- get rid of the obj declaration in favor of a direct reference
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
At the moment to get an EGL image to a dma-buf file descriptor,
you have to use EGL_MESA_drm_image, and then use libdrm to
convert this to a file descriptor.
This extension just provides an API modelled on EGL_MESA_drm_image,
to return a dma-buf file descriptor.
v2: update spec for new API proposal
add internal queries to get the fourcc back from intel driver.
v2.1: add gallium pieces.
v2.2: add offsets to spec and API, rename fd->fds, stride->strides
in API. rewrite spec a bit more, add some q/a
v2.3:
add modifiers to query interface and 64-bit type for that (Daniel Stone)
specifiy what happens to num fds vs num planes differences. (Chad Versace)
v2.4:
fix grammar (Daniel Stone)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now, we only use brw->NewGLState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/brw->state\.dirty\.mesa/brw->NewGLState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now, we only use ctx->NewDriverState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When clearing the state for a pipeline, we will save changed state for
the other pipelines.
v3:
* Adjust brw_upload_pipeline_state
* Don't pull pipeline state bits into common state bits
* Don't clear pipeline state bits
* Adjust 'clear' phase
* brw_clear_dirty_bits is now brw_render_state_finished
* Move cross-pipeline state flagging to brw_pipeline_state_finished
* Move pipeline clears to brw_pipeline_state_finished
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw->num_atoms is converted to an array, but currently just an array
of length 1.
Adds brw_copy_pipeline_atoms which copies the atoms for a pipeline,
and sets brw->num_atoms[p] for pipeline p.
v2:
* Rename brw->atoms[] to render_atoms
* Rename brw_add_pipeline_atoms to brw_copy_pipeline_atoms
* Rename brw_pipeline_first_atom to brw_get_pipeline_atoms
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We've seen some cases where performance can hurt quite a bit.
Technically, the more simple the function the more overhead there is
for using a function for this (and the less benefits this provides).
Hence don't do this if we expect the generated code to be simple.
There's an even more important reason why this hurts performance,
which is shaders reusing the same unit with some of the same inputs,
as llvm cannot figure out the calculations are the same if they
are performned in the function (even just reusing the same unit without
any input being the same provides such optimization opportunities though
not very much). This is something which would need to be handled by IPO
passes however.
mul x, -y is equivalent to mul -x, y; and mul x, y is the negation of
mul x, -y.
With NIR:
total instructions in shared programs: 6167779 -> 6161193 (-0.11%)
instructions in affected programs: 983511 -> 976925 (-0.67%)
helped: 4106
HURT: 16
GAINED: 18
LOST: 7
Without NIR:
total instructions in shared programs: 6192323 -> 6185299 (-0.11%)
instructions in affected programs: 987875 -> 980851 (-0.71%)
helped: 4146
HURT: 16
GAINED: 16
LOST: 0
The typical case of mat4*mat4*vec4 is 80 scalar multiplications, but
mat4*(mat4*vec4) is only 32.
On HSW (with vec4 vertex shaders):
instructions in affected programs: 4420 -> 3194 (-27.74%)
On BDW (with scalar vertex shaders):
instructions in affected programs: 12756 -> 6726 (-47.27%)
Implementing a general matrix chain ordering is harder (or at least
tedious) because of having to walk the GLSL IR to create a list of
multiplicands. I'm guessing that this patch handles 90+% of cases, but
of course to tell definitively you'd have to implement the general
thing.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
When nvc0_push_vbo calls nouveau_scratch_done it does not mean
scratch buffers can be freed immediately. It means "when hardware
advances to this place in the command stream the scratch buffers
can be freed".
To fix it, just postpone scratch runout destruction after current
fence is signalled.
The bug existed for a very long time. Nobody noticed, because
"scratch runout" code path is rarely executed.
Fixes hang at the very beginning of first mission in "Serious Sam 3"
on nve7/gk107. It manifested as:
nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]]
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit cf67ca9ffa made the layouting code pick a special layout for
1D images on Skylake. This should not be used for depth and stencil
buffers because these need to be treated as 2D tiled images. However
the patch was missing a check for images with a base format of
GL_STENCIL_INDEX. In practice I don't think it's currently possible to
hit this because Mesa doesn't support GL_ARB_texture_stencil8 and it's
not possible to create a 1D renderbuffer, but it'll be good to be
ready for when the extension is supported.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will help encoding VUI into the bitstream
v2: make backward compatible
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The framerate will be used for video usability info support by VCE driver
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Just announce support for 4 components.
While here also increase the max/min texel offsets (the limit is completely
artificial, was chosen because that's what other hardware did, however there's
other drivers using larger limits).
Over a thousand little piglits skip->pass.
v2: update docs/GL3.txt
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is quite trivial, essentially just follow all the same code you'd
use with linear min/mag (and no mip) filter, then just skip the filtering
after looking up the texels in favor of direct assignment of the right channel
to the result. (This is though not true for the multi-offset version if we'd
want to support it - for this would probably need to do something along the
lines of 4x nearest sampling due to the necessity of doing coord wrapping
individually per texel.)
Supports multi-channel formats.
From the SM5 gather cap bit, should support non-constant offsets, plus shadow
comparisons (the former untested), but not component selection (should be
easy to implement but all this stuff is not really exposable anyway for now).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This has got a bit out of control with more and more parameters added.
Worse, whenever something in there changes all callees have to be updated
for that, even though they don't really do much with any parameter in there
except pass it on to the actual sampling function.
Hence simply put almost everything into a struct. Also instead of relying
on some arguments being NULL, be explicit and set this in a key (which is
just reused for function generation for simplicity). (The code still relies
on them being NULL in the end for now.)
Technically there is a minimal functional change here for shadow sampling:
if shadow sampling is done is now determined explicitly by the texture
function (either sample_c or the gl-style tex func inherit this from target)
instead of the static texture state. These two should always match, however.
Otherwise, it should generate all the same code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This cleans up more instructions generated by uniform array indexing
multiplies.
total instructions in shared programs: 39989 -> 39961 (-0.07%)
instructions in affected programs: 896 -> 868 (-3.12%)
This cleans up some pointless operations generated by the in-driver mul24
lowering (commonly generated by making a vec4 index for a matrix in a
uniform array).
I could fill in other operations, but pretty much anything else ought to
be getting handled at the NIR level, I think.
total uniforms in shared programs: 13423 -> 13421 (-0.01%)
uniforms in affected programs: 346 -> 344 (-0.58%)
Previously, the ctx->Const.ForceGLSLVersion setting only worked if
the shader lacked a #version directive. Now, the ForceGLSLVersion
setting will override the #version directive too.
This change should be safe since it should be rare to have an app
that has a mix of shader versions and we only wanted to override
the #version for shaders which lacked the #version directive.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The hardware just uses the low 24 lines, saving us an AND to drop the high
bits.
total uniforms in shared programs: 13433 -> 13423 (-0.07%)
uniforms in affected programs: 356 -> 346 (-2.81%)
total instructions in shared programs: 40003 -> 39989 (-0.03%)
instructions in affected programs: 910 -> 896 (-1.54%)
GLSL ES 3.00 spec, 4.3.10 (Linking of Vertex Outputs and Fragment Inputs),
page 45 says the following:
"The type of vertex outputs and fragment input with the same name must match,
otherwise the link command will fail. The precision does not need to match.
Only those fragment inputs statically used (i.e. read) in the fragment shader
must be declared as outputs in the vertex shader; declaring superfluous vertex
shader outputs is permissible."
[...]
"The term static use means that after preprocessing the shader includes at
least one statement that accesses the input or output, even if that statement
is never actually executed."
And it includes a table with all the possibilities.
Similar table or content is present in other GLSL specs: GLSL 4.40, GLSL 1.50,
etc but for more stages (vertex and geometry shaders, etc).
This patch detects that case and returns a link error. It fixes the following
dEQP test:
dEQP-GLES3.functional.shaders.linkage.varying.rules.illegal_usage_1
However, it adds a new regression in piglit because the test hasn't a
vertex shader and it checks the link status.
bin/glslparsertest \
tests/spec/glsl-1.50/compiler/gs-also-uses-smooth-flat-noperspective.geom pass \
1.50 --check-link
This piglit test is wrong according to the spec wording above, so if this patch
is merged it should be updated.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Correct error with commit 151fb1e where assert was renamed
to unreachable without removing ! from string argument.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This does not (yet) support different coordinate origins, so the tests
still fail due to fbo flipping.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This appears to need the A2XX version of the point list, so select it at
draw time if necessary.
Experimentally, always using the A2XX version causes hangs when PSIZE
isn't actually emitted.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The division is probably a holdover from the days when the fixed point
inline functions generated by headergen were broken.
Also reduce the maximum point size to 4092 (vs 4096), which is what the
blob does.
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The SZ2 field contains the layer size of a lower miplevel. It only
contains 4 bits, which limits the maximum layer size it can describe. In
situations where the next miplevel would be too big, the hardware
appears to keep minifying the size until it hits one of that size.
Unfortunately the hardware's ideas about sizes can differ from
freedreno's which can still lead to issues. Minimize those by stopping
to minify as soon as possible.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
These are nir_cf_nodes, not ALU instructions.
Also, use unreachable() to preempt said review feedback.
v2: Do it right (thanks Ilia).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Everything is already in place; we simply have to take the scalar code
generation path. This gives us SIMD8 VS programs, instead of SIMD4x2.
v2: Rebase on the patch that drops brw->gen >= 8.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
I need to use this in brw_vec4.cpp, so it can't be static anymore.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use prog_to_nir where we would normally call glsl_to_nir, handle program
parameter lists, and skip a few things that don't exist.
Using NIR generates much better shader code than Mesa IR, since we get
real optimizations, as opposed to prog_optimize:
total instructions in shared programs: 314007 -> 279892 (-10.86%)
instructions in affected programs: 285173 -> 251058 (-11.96%)
helped: 2001
HURT: 67
GAINED: 4
LOST: 7
v2: Change early return in nir_setup_uniforms to if/else (Jordan).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
prog->nir will generate fsub opcodes, but i965 doesn't implement them.
We may as well lower them at the NIR level, since it's trivial to do.
Suggested by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Shamelessly ripped off from Eric Anholt's tgsi_to_nir pass.
This is not built on SCons, like the rest of NIR.
v2:
- Delete redundant c->s, c->impl, and c->cf_node_list pointers (Ken)
- Use nir_builder directly instead of ptn_compile in more places (Ken)
- Drop 'struct' keyword in front of nir_builder (ken)
- Add a file level Doxygen comment (Ken)
- Use scalar constants instead of splatting (Eric)
- Use nir_builder helpers for constants, moves, and swizzles (Connor)
v3: Minor indentation improvements.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These will be useful for prog->nir and tgsi->nir.
v2: Don't forget to mark nir_swizzle as inline (Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The PMA depth stall must be enabled (optimization turned off) under certain
circumstances on gen8. This was supposedly fixed for Gen9, which means we do not
need to check, or toggle the state. The hardware is supposed to enable the
hardware optimization by default, unlike BDW, so we also don't need to set it at
init. For whatever reason this improves stability on ETQW with the bug mentioned
below.
References: https://bugs.freedesktop.org/show_bug.cgi?id=89039 (doesn't fix)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Anuj Phogat <anuj.phogat@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Recomendation [sic] is to set this field to 1 always. Programming it to default
value of 0, may have -ve impact on performance for MSAA WLs.
Another don't suck bit which needs to get set.
The patch wasn't as well tested as I would have liked, primarily I don't have
perf numbers for it, but it's getting to a point where it is in danger of being
lost.
v2: v1 was a mix of two patches. Since 0x7004 is masked, we only need to set it
once at initialization and make sure the pma workaround doesn't set the mask bit
(which it doesn't).
Move LRI to init gpu state (Ken)
Add a comment.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These functions looked quite complicated, even though what they actually did
was trivial (ever since we dropped swizzled rendering). Also drop lookup of
format block per bytes done for each block, and do it once per scene instead.
This improves everybody's favorite "benchmark" by 3% or so, though
lp_rast_shade_quads_all() which calls this shows up still quite high for a
function which does little more than call the jit function.
(This would most likely be much better handled by the jit function itself,
the strides are passed through anyway already, though for being able to
handle layers it would definitely add some complexity.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When using the texel fetch functions rather than ordinary texturing,
the arguments are all int vecs instead of float vecs, not to mention
the actual function would look completely different. Hence this must
be included in the texture function name (which serves as the key)
otherwise things crash badly when a shader accesses the same texture
and sampler unit with both txf/ld and ordinary texturing instructions
with otherwise matching keys.
There are issues with inlining everything, most notably llvm will use much
more memory (and be slower) when compiling. Ideally we'd probably use
functions for shader functions too but texture sampling usually is responsible
for quite some IR (it can easily reach 80% of total IR instructions) so this
seems like a good start.
This still generates a different function for all different combinations just
like before, however it is possible llvm is missing some optimization
opportunities - it is believed though such opportunities should be somewhat
rare, but at least for now it can still be switched off (at compile time only).
It should probably make compiled code also smaller because the same function
should be used for different variants in the same module (so for the
opaque/partial or linear/elts variants).
No piglit change (though it does indeed speed up unrealistic tests like
fp-indirections2 by a factor of 30 or so).
Has a small negative performance impact in openarena - I suspect this could
be fixed by running some IPO passes (despite the private linkage, llvm right
now does NO optimization at all wrt anything going past the call, even if
there's just one caller - so things like values stored before the call and then
always written by the function etc. will not be optimized away, nor will dead
arguments (which we mostly shouldn't have) be eliminated, always constant
arguments promoted etc.).
v2: use proper return values instead of pointer function arguments.
llvm supports aggregate return values, which do wonders here eliminating
unnecessary stack variables - everything in fact will be returned in registers
even without any IPO optimizations. It makes the code simpler too.
With this I could not measure a peformance impact in openarena any longer
(though since there's still no constant value propagation etc. into the tex
functions this does not mean it couldn't have a negative impact elsewhere).
v3: fix some minor issues suggested by Jose, and do disassembly (and the
profiling) without hacks.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The callbacks used for getting the dynamic texture/sampler state were using
the jit_context from the generated jit function. This works just fine, however
that way it's impossible to generate separate functions for texture sampling,
as will be done in the next commit. Hence, pass this pointer through all
interfaces so it can be passed to a separate function (technically, it would
probably be possible to extract this pointer from the current function instead,
but this feels hacky and would probably require some more hacks if we'd use
real functions instead of inlining all shader functions at some point).
There should be no difference in the generated code for now.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The data in memory is in big endian format and needs to be converted
into CPU byte order. So the patch actually reversed what needs to be done.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The CUBE_ARRAY case uses r[4]. Make sure that the stack variable is
there.
Noticed by Coverity.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Does not appear to be used in tree. Coverity spotted some errors in the
bitmask stuff, but the whole thing appears to be unused.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Our fragment program backend implements support for TXP directly, and
there's no NIR lowering pass to remove the projection. When we switch
fragment program support over to NIR, we need to support it somehow.
It's easy enough to support directly.
v2: Split out offset/tex_offset rename (requested by Jordan).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
fs_visitor::nir_emit_texture() created an fs_reg variable called offset,
which shadowed the offset() helper function in brw_ir_fs.h.
Rename the variable to tex_offset so we can still call offset().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Don't be lazy. Constify the as_foo functions and use those instead
of ugly casts. Suggested by Curro.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Now that they're all implemented using macros, this is trivial.
v2: Remove redundant parenthesis. Suggested by Curro.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The downcast functions for non-leaf classes were previously implemented
"by hand." Now they are implemented using macros based on the is_foo
functions added in the previous patch.
v2: Remove redundant parenthesis. Suggested by Curro (on the next
patch).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
These functions deteremine when an IR node is one of the non-leaf
classes.
v2: Adjust indentation to line up. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
These are due how we implemented the atomic tests, not the atomic
implementation itself. It's also difficult to refactor the code to
avoid the warnings due to the use of macros -- the code would be quite
hairy.
Reviewed-by: Brian Paul <brianp@vmware.com>
Somehow, merely including any of the *intrin.h headers causes dozens of
this warnings (when compiling pretty much every source file). MSVC does
not always complain the same -- so it's possible we're doing something
weird --, but silence these warnings in the meanwhile.
Reviewed-by: Brian Paul <brianp@vmware.com>
MSVC's implementation of signbit(x) uses sizeof(x) with expressions to
dispatch to an internal function based on the argument's type (float,
double, etc), but that raises a flag with MSVC's own static analyzer,
and because this is an inline function in a header it causes substantial
warning spam.
Reviewed-by: Brian Paul <brianp@vmware.com>
MSVC 2013 declares these functions, both for C and C++ source files.
This was caught with MSVC in analyze mode.
Reviewed-by: Brian Paul <brianp@vmware.com>
There doesn't seem much interest on osmesa on Windows, particularly classic osmesa.
If there is indeed interest in osmesa on Windows, we should instead
integrate src/gallium/targets/osmesa into SCons.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Review from Laura Ekstrand
- get rid of a change that should not have happened in this patch
- improve the error messages
- fix alignments
- fix a capitalization in a function name in an error message
v3: Review from Laura Ekstrand
- move the test for the validity of the renderbuffer to less generic
functions
- get rid of some changes that accidentally landed in the wrong commit
- revert some alignment fixes
v3: Review from Laura Ekstrand
- check that the lookup returns a valid renderbuffer
- cosmetic changes to some error messages
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2:
- improve an error message
v3:
- move a test to less generic functions
- fix an alignment
v4:
- take the caller as a parameter instead of bool dsa
- check that the lookup returns a valid renderbuffer
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Because of the current way the code is architectured, there is no
functional difference between the DSA and the non-DSA path.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
These entry points will be fleshed out when the GL_ARB_query_buffer_object
extension gets implemented. In the meantime, return GL_INVALID_OPERATION as
suggested by Ian.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2: Review from Laura Ekstrand
- use the transform feedback object lookup wrapper
v3:
- use the new name of _mesa_lookup_transform_feedback_object_err
v4: Review from Laura Ekstrand
- fix some alignement problems
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2: Review from Laura Ekstrand
- use the transform feedback object lookup wrapper
v3:
- use the new name of _mesa_lookup_transform_feedback_object_err
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2: review from Laura Ekstrand
- use the refactored code to lookup the objects
- improve some error messages
- factor out the gl method name computation
- better handle the spec differences between the DSA and non-DSA cases
- quote the spec a little more
v3: review from Laura Ekstrand
- use the new name of _mesa_lookup_bufferobj_err
- swap the comments around the offset and size checks
v4: review from Laura Ekstrand
- add more spec quotes
- properly fix the comments around the offset and size checks
v5: review from Laura Ekstrand
- add quotes on the spec citations
- revert some changes in the printf format
v6: review from Laura Ekstrand
- remove a redondant "gl" in a method name
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
v2: Review from Laura Ekstrand
- give more helpful error messages
- factor the lookup code for the xfb and objBuf
- replace some already-existing tabs with spaces
- add comments to explain the cases where xfb == 0 or buffer == 0
- fix the condition for binding the transform buffer or not
v3: Review from Laura Ekstrand
- rename _mesa_lookup_bufferobj_err to
_mesa_lookup_transform_feedback_bufferobj_err and make it static to avoid a
future conflict
- make _mesa_lookup_transform_feedback_object_err static
v4: Review from Laura Ekstrand
- add the pdf page number when quoting the spec
- rename some of the symbols to follow the public/private conventions
v5: Review from Laura Ekstrand
- properly rename some of the symbols to follow the public/private conventions
- fix some alignments
- add quotes around a spec citation
- add back a newline I accidentally deleted
- add spaces around the ternary operator usages
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
v2: Review from Laura Ekstrand
- generate the name of the gl method once
- shorten some lines to stay in the 78 chars limit
v3: Review from Fredrik Höglund <fredrik@kde.org>
- rename gl_mthd_name to func
- set EverBound in _mesa_create_transform_feedbacks in the dsa case
v4:
- rename _mesa_create_transform_feedbacks to create_transform_feedbacks and
make it static
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Maybe this should be the job of the dispatch layer.
v2:
- add the section name and pdf page number of the quote (Laura)
- OpenGL 3.0 core does not exist, get rid of "core"
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
We were building libglsl_util.la without our visibility flags and
leaking hash_table_* symbols.
Signed-off-by: Kristian Høgsberg <kristian.h.kristensen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Transform this into b2f(and(a, b)).
total instructions in shared programs: 6190291 -> 6189225 (-0.02%)
instructions in affected programs: 267247 -> 266181 (-0.40%)
helped: 866
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make use of the builtin ffs macros and split out ffsll
to a seperate block. Needed for at least OpenBSD which
does not have ffsll in libc.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
No longer used as of commit 48c7461d5a0(st/gbm: remove state-tracker)
v2: Add commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> (v1)
in different fragment shaders. This also applies to a case when gl_FragCoord
is redeclared with no layout qualifiers in one fragment shader and not
declared but used in other fragment shader.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Khronos Bug#12957
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
16 / cpp happens to be the same as utile_w on the only raster format
supported (4 bytes per pixel), but simulator/hw source code generally
talks in terms of utiles.
I'd like to compile as much of the device-specific code as possible
when building for simulator, and using if (using_simulator) instead of
ifdefs helps.
This reverts commit 20346808cf.
The conversion is actually done since these are the *B macro variants
and no vtx format is supplied, which makes them go through the translate
module.
This restores the following piglit tests to passing:
draw-vertices user
gl-2.0-vertexattribpointer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
glXGetProcAddress("glFoo") ends up in stub_add_dynamic() to
create dynamic stubs for dynamic functions. stub_add_dynamic()
doesn't store the caller provided name string "Foo" in a mesa
private copy, but just stores a pointer to the "glFoo" string
passed to glXGetProcAddress - a pointer into arbitrary memory
outside mesa's control.
If the caller passes some dynamically allocated/changing
memory buffer to glXGetProcAddress(), or the caller gets unmapped
from memory, e.g., some dynamically loaded application
plugin which uses OpenGL, this ends badly - with a dangling
pointer.
strdup() the name string provided by the client to avoid
this problem.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The storage size for local kernel args can be queried before the
arguments are set by using the CL_KERNEL_LOCAL_MEM_SIZE param
of clGetKernelWorkGroupInfo().
The spec says that if local kernel arguments have not been specified,
then we should assume their size is 0.
v2:
- Implement using c++11 member initialization.
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
The pipe's get_vendor method returns something more akin to a driver
vendor string in most cases, instead of the actual device vendor. Use
get_device_vendor instead, which was introduced specifically for this
purpose.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The only hackish ones are llvmpipe and softpipe, which currently return
the same string as for get_vendor(), while ideally they should return
the CPU vendor.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This will be needed by Clover to return the correct information
to CL_DEVICE_VENDOR info queries.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
These will be especially useful when we start keeping track of
liveness information for each subregister.
Reviewed-by: Matt Turner <mattst88@gmail.com>
And set it in the MOV instructions that copy the temporary to the
original destination if the generator instruction had it set.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fix typo and punctuation in a comment, break long line and add space
before curly bracket.
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
try_copy_propagate() was checking the bit of the saturate mask for the
arg-th component of the source to decide whether the whole source
should be saturated (WTF?). We need to swizzle the original saturate
mask and check that for all enabled channels the saturate flag is
either set or unset, as we cannot saturate a subset of destination
components only.
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This reverts commit 0dfec59a27. The
change prevented propagation of copies with the saturate flag set,
making the whole saturate mask tracking completely useless. A proper
fix follows.
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This simplifies the src_reg/dst_reg conversion constructors using the
swizzle utils introduced in a previous patch. It also makes them more
useful by changing their semantics slightly: dst_reg(src_reg) used to
set the writemask to XYZW if the src_reg swizzle was anything other
than XXXX, which was almost certainly not what the caller intended if
the swizzle was non-trivial. After this patch the same components
that are present in the swizzle will be enabled in the resulting
writemask.
src_reg(dst_reg) used to set the first components of the swizzle to
the enabled components of the writemask and then replicate the last
enabled component to fill the swizzle, which, in cases where the
writemask didn't have exactly the first n components set, would in
general not be compatible with the original dst_reg. E.g.:
| ADD(tmp, src_reg(tmp), src_reg(1));
would *not* do what one would expect (add one to each of the enabled
components of tmp) if tmp didn't have a writemask of the described
form (e.g. YZ, YW, XZW would all fail). This pattern actually occurs
in many different places in the VEC4 back-end, it's a wonder that it
hasn't caused piglit failures until now. After this patch
src_reg(dst_reg) will construct a swizzle with each enabled component
at its natural position (e.g. Y at the second position, Z at the
third, and so on). The resulting swizzle will behave like the
identity when used in any instruction with the original writemask.
I've manually verified that *none* of the callers of both conversion
constructors were relying on the previous broken semantics. There are
no piglit regressions on any generation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It could be objected that swizzle_for_size() is "faster" than
brw_swizzle_for_size(). It's not measurably better in any reasonable
CPU-bound benchmark on VLV according to the Finnish benchmarking
system (including the SynMark2 DrvShComp shader compilation
benchmark).
Reviewed-by: Matt Turner <mattst88@gmail.com>
This seemed to be trying to deduce the number of uniform vector
components from the parameter swizzle, but the algorithm would always
give 4 as result. Instead grab the correct number of components from
the GLSL type.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This defines helper functions implementing some common swizzle
transformations that are usually open-coded in the compiler back-end,
causing a lot of clutter. Some optimization passes will become almost
trivial implemented in terms of these functions (e.g.
vec4_visitor::opt_reduce_swizzle()).
Reviewed-by: Matt Turner <mattst88@gmail.com>
FS instructions with NIR on i965:
total instructions in shared programs: 2663561 -> 2619051 (-1.67%)
instructions in affected programs: 1612965 -> 1568455 (-2.76%)
helped: 5455
HURT: 12
FS instructions with NIR on g4x:
total instructions in shared programs: 2352633 -> 2307908 (-1.90%)
instructions in affected programs: 1441842 -> 1397117 (-3.10%)
helped: 5463
HURT: 11
FS instructions with NIR on ilk:
total instructions in shared programs: 3997305 -> 3934278 (-1.58%)
instructions in affected programs: 2189409 -> 2126382 (-2.88%)
helped: 8969
HURT: 22
FS instructions with NIR on hsw (snb and ivb were similar):
total instructions in shared programs: 4109389 -> 4109242 (-0.00%)
instructions in affected programs: 109869 -> 109722 (-0.13%)
helped: 339
HURT: 190
No SIMD16 programs were gained or lost on any platform
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Fix the spelling of analyze and re-arrange code for better readability
as per Connor's comments.
v3: Make the naming of things more consistent and add a pile of comments
v4: Stop trying to avoid vectors
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Most cases seem harmless, though that might not always be the case. Maybe
one day we can get gcc to complain about these and fix them throughout
the code, but until then let's silence them.
Reviewed-by: Brian Paul <brianp@vmware.com>
It's a bit hackish couldn't find another solution. See code comment
for details. The warning is useful, so universally disabling doesn't
sound a good idea.
Fixes
warning C4005: 'xxx' : macro redefinition
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids MSVC the warning
warning C4013: 'isatty' undefined; assuming extern returning int
with certain versions of flex.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Add win flex-bison link to docs/install.html.
This prevents the MSVC from
warning C4090: 'function' : different 'const' qualifiers
when compiling flex generated lexers.
Reviewed-by: Brian Paul <brianp@vmware.com>
MSVC defaults to no exceptions unless /EH option is passed (which we don't), while
MSVC's STL defaults to use exceptions unless _HAS_EXCEPTIONS=0 is defined,
which we didn't.
This fixes
warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
Reviewed-by: Brian Paul <brianp@vmware.com>
This addresses
...\glsl_parser.cpp(...) : warning C4065: switch statement contains 'default' but no 'case' labels
This is on code generated by bison, which we have little control.
It seems useful to have this warning otherwise enabled.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note that GLboolean is an alias for unsigned char, which lacks the
implicit true/false semantics that C++/C99 bool have.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Change gl_shader::IsES and gl_shader_program::IsES to be bool as
recommended by Ian Romanick.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should have been part of 429a4355259(galahad: remove driver). Seems like
I've erroneously committed the trimmed patch.
Reported-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Both seems to be excessively long, namely:
ClientAPIString can get up-to 47 based on current code, while the name
of the driver can dictate the length of the VersionString, currently it
is around 11. Let's pad each to 100, rather than the current 1000.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There's 2 reasons why we'd want to use the global context:
1) There still seems to be one memory "leak" left when using multiple llvm
contexts (it is not a true leak as the memory disappears into some still
addressable pool but nevertheless the memory consumption grows). See
http://cgit.freedesktop.org/~jrfonseca/llvm-jitstress/
2) These contexts get kinda big - even when disposing modules etc. after
compiling a shader the LLVMContext can easily be over 100kB. So when there's
lots of llvm contexts arounds it adds up.
The downside is that at least right now this is absolutely not thread safe,
so this only works safely in environments where multiple pipe contexts are not
used concurrently.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes a piglit regression
(shaders/glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined) with
my series for GVN.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
This is currently not a problem because the vec4 visitor happens to
mask out unused components from the destination, but it might become
an issue when we start using atomics without writeback message. In
any case it seems sensible to set it again here because the
consequences of setting the wrong writemask (random graphics memory
corruption) are difficult to debug and can easily go unnoticed.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
And calculate the message response size based on the number of
components rather than the other way around. This simplifies their
interface somewhat and allows the caller to request a writeback
message with more than one vector component in SIMD4x2 mode.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This was telling the sampler to do texture fetches for *all* channels
in the non-constant surface index case, what could have reduced
throughput unnecessarily when some of the channels were disabled by
control flow.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is going to be useful because the Gen7+ uniform and varying pull
constant, texturing, typed and untyped surface read, write, and atomic
generation code on the vec4 and fs back-end all require the same logic
to handle conditionally indirect surface indices. In pseudocode:
| if (surface.file == BRW_IMMEDIATE_VALUE) {
| inst = brw_SEND(p, dst, payload);
| set_descriptor_control_bits(inst, surface, ...);
| } else {
| inst = brw_OR(p, addr, surface, 0);
| set_descriptor_control_bits(inst, ...);
| inst = brw_SEND(p, dst, payload);
| set_indirect_send_descriptor(inst, addr);
| }
This patch abstracts out this frequently recurring pattern so we can
now write:
| inst = brw_send_indirect_message(p, sfid, dst, payload, surface)
| set_descriptor_control_bits(inst, ...);
without worrying about handling the immediate and indirect surface
index cases explicitly.
v2: Rebase. Improve documentatation and commit message. (Topi)
Preserve UW destination type cargo-cult. (Topi, Ken, Matt)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Both do_vs_prog and do_gs_prog initialize brw_stage_prog_data::nr_params to
the number of uniform *vectors* required by the shader rather than the number
of uniform components, contradicting the comment. This is inconsistent with
what the state upload code and scalar path expect but it happens to work until
Gen8 because vec4_visitor interprets it as a number of vectors on construction
and later on overwrites its original value with the number of uniform
components referenced by the shader.
Also there's no need to add the number of samplers, they're not actually
passed in as uniforms.
Fixes a memory corruption issue on BDW with SIMD8 VS.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Several steppings of Skylake fail when using SIMD16 with 3-source
instructions (such as MAD).
This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit
tests.
Based on a patch by Neil Roberts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The places that were checking whether 3-source instructions are
supported have now been combined into a small helper function. This
will be used in the next patch to add an additonal restriction.
Based on a patch by Kenneth Graunke.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brwContextInit now queries the GPU revision number via a new parameter
for DRM_I915_GETPARAM. This new parameter requires a kernel patch and
a patch to libdrm. If the kernel doesn't support it then it will
continue but set the revision number to -1. The intention is to use
this to implement workarounds that are only needed on certain
steppings of the GPU.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Generate GL_INVALID_OPERATION and return NULL when the buffer object
hasn't been created. All callers expect this.
v2: Use a more concise error message.
Cc: Laura Ekstrand <laura@jlekstrand.net>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
This add primitive restart support to the prim conversion.
This involves changing the API for the translate functions
as we need to pass the prim restart index and the original
number of indices into the translate functions.
primitive restart is support for quads, quad strips
and polygons.
This deal with the case where the actual number of output
primitives is less than the initially calculated number,
by filling the rest of the output primitives with the restart
index, the other option is to reduce the output prim number,
but that will make the generator code a bit messier.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should improve the performance of any shaders using the KIL
instruction. I'm a bit surprised we missed this.
Unfortunately, I have not been able to measure any performance
improvements from this patch. It does make ARB_fragment_program
behave similarly to GLSL code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
So it turns out that this doesn't actually fix any bugs or add any features,
stictly speaking. However, it does avoid a lot of kludginess. Previously, if
you called
glCopyTextureSubImage3D(texcube, 0, 0, 0, zoffset = 3, ...
it would grab the texture image object for face = 0 in teximage.c instead of
the desired face = 3. But Line 274 of brw_blorp_blit.cpp would correct for
this by updating the slice to 3.
This commit does the correct thing before calling any drivers,
which should make the functionality much more robust and uniform across all
drivers.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We use the idiom
ir_foo *x = y->as_foo();
if (x == NULL)
return;
all over the place. GCC generates some quite lovely code for this.
One such example:
340a5b: 83 7d 18 04 cmpl $0x4,0x18(%rbp)
340a5f: 0f 85 06 04 00 00 jne 340e6b
340a65: 48 85 ed test %rbp,%rbp
340a68: 0f 84 fd 03 00 00 je 340e6b
This case used as_expression() (ir_type_expression is 4). Note that it
checks the ir_type, then checks that the pointer isn't NULL. There is
some disconnect in GCC around the condition in the as_foo functions.
return ir_type == ir_type_##TYPE ? (ir_##TYPE *) this : NULL; \
It believes "this" could be NULL, so it emits check outside the function
just for fun.
This patch uses assume() to tell GCC that it need not bother with extra
NULL checking of the pointer returned by the as_foo functions.
text data bss dec hex filename
4836430 158688 26248 5021366 4c9eb6 i965_dri-before.so
4836173 158688 26248 5021109 4c9db5 i965_dri-after.so
v2: Replace 'if (this == NULL) unreachable("this cannot be NULL")' with
assume(this != NULL). Suggested by Ilia Mirkin.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we put all the uniforms into one big array. The problem with
this approach is that, as soon as there was one indirect array acces, the
backend would decide that the entire large array should be pull constants.
This commit splits the array in half: first direct-only uniforms and then
potentially-indirect uniforms. This may not be optimal, but it does let
the backend promote things to push constants.
Shader-db results on HSW:
total instructions in shared programs: 4114840 -> 4112172 (-0.06%)
instructions in affected programs: 43316 -> 40648 (-6.16%)
helped: 116
HURT: 0
v2: Set param_size[num_direct_uniforms] only if we have indirect uniforms.
This caused a bug that, strangely enough, only showed up on Broadwell
vertex shaders.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2: Delete the set of indirectly accessed variables when we're done with it
v3: Rename from _packed to _scalar
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we just assigned variable locations in nir_lower_io. Now, we
force the user to assign variable locations for us. This gives the backend
a bit more control over where variables are placed.
v2: Rename from _packed to _scalar
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We never did a single hash table lookup in the entire NIR code base that I
found so there was no real benifit to doing it that way. I suppose that
for linking, we'll probably want to be able to lookup by name but we can
leave building that hash table to the linker. In the mean time this was
causing problems with GLSL IR -> NIR because GLSL IR doesn't guarantee us
unique names of uniforms, etc. This was causing massive rendering isues in
the unreal4 Sun Temple demo.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Different errors for type mismatches, size mismatches and matrix/
non-matrix mismatches. Use a common format of "uniformName"@location
in the messags.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
On platforms that do not natively generate 0u and ~0u for Boolean
results, b2f expressions that look like
f = b2f(expr cmp 0)
will generate better code by pretending the expression is
f = ir_triop_sel(0.0, 1.0, expr cmp 0)
This is because the last instruction of "expr" can generate the
condition code for the "cmp 0". This avoids having to do the "-(b & 1)"
trick to generate 0u or ~0u for the Boolean result. This means code like
mov(16) g16<1>F 1F
mul.ge.f0(16) null g6<8,8,1>F g14<8,8,1>F
(+f0) sel(16) m6<1>F g16<8,8,1>F 0F
will be generated instead of
mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F
cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F
and(16) g4<1>D g2<8,8,1>D 1D
and(16) m6<1>D -g4<8,8,1>D 0x3f800000UD
v2: When the comparison is either == 0.0 or != 0.0 use the knowledge
that the true (or false) case already results in zero would allow better
code generation by possibly avoiding a load-immediate instruction.
v3: Apply the optimization even when neither comparitor is zero.
Shader-db results:
GM45 (0x2A42):
total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
instructions in affected programs: 33269 -> 33096 (-0.52%)
helped: 121
Iron Lake (0x0046):
total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
instructions in affected programs: 34199 -> 34018 (-0.53%)
helped: 129
No change on other platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
The SSE 4.1 ROUND instructions let us implement roundeven directly.
Otherwise we assume that the rounding mode has not been modified (as we
do in the rest of Mesa) and use rint().
glibc uses the ROUND instruction in rint() after a cpuid check. This
patch just lets us inline it directly when we're already building for
SSE 4.1.
Reviewed-by: Carl Worth <cworth@cworth.org>
Eric's initial patch adding constant expression evaluation for
ir_unop_round_even used nearbyint. The open-coded _mesa_round_to_even
implementation came about without much explanation after a reviewer
asked whether nearbyint depended on the application not modifying the
rounding mode. Of course (as Eric commented) we rely on the application
not changing the rounding mode from its default (round-to-nearest) in
many other places, including the IROUND function used by
_mesa_round_to_even!
Worse, IROUND() is implemented using the trunc(x + 0.5) trick which
fails for x = nextafterf(0.5, 0.0).
Still worse, _mesa_round_to_even unexpectedly returns an int. I suspect
that could cause problems when rounding large integral values not
representable as an int in ir_constant_expression.cpp's
ir_unop_round_even evaluation. Its use of _mesa_round_to_even is clearly
broken for doubles (as noted during review).
The constant expression evaluation code for the packing built-in
functions also mistakenly assumed that _mesa_round_to_even returned a
float, as can be seen by the cast through a signed integer type to an
unsigned (since negative float -> unsigned conversions are undefined).
rint() and nearbyint() implement the round-half-to-even behavior we want
when the rounding mode is set to the default round-to-nearest. The only
difference between them is that nearbyint() raises the inexact
exception.
This patch implements _mesa_roundeven{f,}, a function similar to the
roundeven function added by a yet unimplemented technical specification
(ISO/IEC TS 18661-1:2014), with a small difference in behavior -- we
don't bother raising the inexact exception, which I don't think we care
about anyway.
At least recent Intel CPUs can quickly change a subset of the bits in
the x87 floating-point control register, but the exception mask bits are
not included. rint() does not need to change these bits, but nearbyint()
does (twice: save old, set new, and restore old) in order to raise the
inexact exception, which would incur some penalty.
Reviewed-by: Carl Worth <cworth@cworth.org>
Before, we enabled NIR if you set INTEL_USE_NIR to anything which mean that
INTEL_USE_NIR=false would actually turn on NIR. In preparation for turning
NIR on by default, this commit makes it smarter by allowing the
INTEL_USE_NIR variable to work as either a force-enable or a force-disable.
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
We never reset the string on eglTerminate, so it grows
for ever on multiple eglInitialise.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As VARYING_SLOT_MAX can be bigger than 32.
I'll probably stop building swrast with MSVC in the near future, but this
seems a real bug regardless.
Reviewed-by: Brian Paul <brianp@vmware.com>
program/lex.yy.c and program/program_parse.tab.c is already included in
the PROGRAM_FILES variable.
We still need to specify the dependency relationship though.
Reviewed-by: Brian Paul <brianp@vmware.com>
By default gcc ignores the issue, and as result code that mixes
signed/unsigned is so widespread through the code base that it ends up
being little more than noise, potentially obscuring more pertinent
warnings.
Maybe one day we enable the corresponding gcc warnings and cleanup, but
until then, this change disables them.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Removing this block of pragmas doesn't seem to increase the number of
warning generated by MSVC. Other than signed/unsigned comparison warnings
there's very few other warnings nowadays.
Acked-by: Matt Turner <mattst88@gmail.com>
Use the new _glapi_new_nop_table() and _glapi_set_nop_handler() to
improve how we handle calling no-op GL functions.
If there's a current context for the calling thread, generate a
GL_INVALID_OPERATION error. This will happen if the app calls an
unimplemented extension function or it calls an illegal function
between glBegin/glEnd.
If there's no current context, print an error to stdout if it's a debug
build.
The dispatch_sanity.cpp file has some previous checks removed since
the _mesa_generic_nop() function no longer exists.
This fixes the piglit gl-1.0-dlist-begin-end and gl-1.0-beginend-coverage
tests on Windows.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
_glapi_new_nop_table() creates a new dispatch table populated with
pointers to no-op functions.
_glapi_set_nop_handler() is used to register a callback function which
will be called from each of the no-op functions.
Now we always generate a separate no-op function for each GL entrypoint.
This allows us to do proper stack clean-up for Windows __stdcall and
lets us report the actual function name in error messages. Before this
change, for non-Windows release builds we used a single no-op function
for all entrypoints.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
One more case we need to handle. One of the src instructions for the
indirect could also end up being ourself.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.)
Discovered by Coverity. Reported by Ilia Mirkin.
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Check if the compiler supports -Werror=vla before using it.
-Wvla was introduced with GCC 4.3 and is not present in 4.2.
Fixes the build on OpenBSD.
v2: Fix statement order, and quote $save_CFLAGS.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89433
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
Currently, we throttle before the user begins preparing commands for the
next frame when we acquire the draw/read buffers. However, construction
of the command buffer can itself take significant time relative to the
frame time. If we move the throttle from the buffer acquire to the
command submit phase we can allow the user to improve concurrency
between the CPU and GPU (i.e. reduce the amount of time we waste inside
the throttle).
v2: Whitespace + delay throttling until after the next submission for
greater parallelism
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]
In order to facilitate the concurrency offered by triple buffering and to
offset the latency induced by swapping via an external process, which
may incur extra rendering itself, only throttle to the previous frame
and not the last. The second issue that mostly affects swap benchmarks,
but also can incur jitter in the throttling, is that the throttle bo is
closer to the next SwapBuffers rather than immediately after the previous
SwapBuffers. Throttling to the previous frame doubles the maximum possible
latency at the benefit of improving throughput and reducing jitter.
v2: Rename "first_post_swapbuffer" batches array to a plain
throttle_batch[] as the pluralisation was contorting the name and not
making it clear as to whether it was the first batch or first_post_swap
batch. Not least of which was that not all throttle points are SwapBuffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When rendering to an fbo, even though it may be acting as a winsys
frontbuffer or just generally, we never throttle. However, when rendering
to an fbo, there is no natural frame boundary. Conventionally we use
SwapBuffers and glFinish, but potential callers avoid often glFinish for
being too heavy handed (waiting on all outstanding rendering to complete).
The kernel provides a soft-throttling option for this case that waits for
rendering older than 20ms to be complete (that's a little too lax to be
used for swapbuffers, but is here a useful safety net). The remaining
choice is then either never to throttle, throttle after every draw call,
or at after intermediate user defined point such as glFlush and thus all the
implied flushes. This patch opts for the latter as that is the current
method used for flushing to front buffers.
v2: Defer the throttling from inside the flush to the next
intel_prepare_render() and switch non-fbo frontbuffer throttling over to
use the same lax method. The issuing being that
glFlush()/intel_prepare_read() is just as likely to be called inside a
tight loop and not at "frame" boundaries.
v3: Rename from need_front_throttle to need_flush_throttle to avoid any
ambiguity between front buffer rendering and fbo rendering. (Chad)
v4: Whitespace
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Shader-db results on HSW:
total instructions in shared programs: 4174156 -> 4157291 (-0.40%)
instructions in affected programs: 145397 -> 128532 (-11.60%)
helped: 383
HURT: 0
GAINED: 20
LOST: 22
There are two more tests lost than gained. However, comparing this with
GLSL IR vs. NIR results, the overall delta is reduced from 85/44
gained/lost on current master to 71/32 with this commit. Therefore, I
think it's probably a boon since we are getting "closer" to where we were
before.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously we tried to do poor-man's copy propagation as we created the
select instructions. Instead, this commit just moves the instructions from
the blocks inside the if into the block before. Copy propagation will take
care of making sure we don't have any extra mov's in there for us.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The code for emitting INTEL_swap_events swap completion
events needs to translate from 32-Bit sbc on the wire to
64-Bit sbc for the events and handle wraparound accordingly.
It assumed that events would be sent by the server in the
order their corresponding swap requests were emitted from
the client, iow. sbc count should be always increasing. This
was correct for DRI2.
This is not always the case under the DRI3/Present backend,
where the Present extension can execute presents and send out
completion events in a different order than the submission
order of the present requests, due to client code specifying
targetMSC target vblank counts which are not strictly
monotonically increasing. This confused the wraparound
handling. This patch fixes the problem by handling 32-Bit
wraparound in both directions. As long as successive swap
completion events real 64-Bit sbc's don't differ by more
than 2^30, this should be able to do the right thing.
How this is supposed to work:
awire->sbc contains the low 32-Bits of the true 64-Bit sbc
of the current swap event, transmitted over the wire.
glxDraw->lastEventSbc contains the low 32-Bits of the 64-Bit
sbc of the most recently processed swap event.
glxDraw->eventSbcWrap is a 64-Bit offset which tracks the upper
32-Bits of the current sbc. The final 64-Bit output sbc
aevent->sbc is computed from the sum of awire->sbc and
glxDraw->eventSbcWrap.
Under DRI3/Present, swap completion events can be received
slightly out of order due to non-monotic targetMsc specified
by client code, e.g., present request submission:
Submission sbc: 1 2 3
targetMsc: 10 11 9
Reception of completion events:
Completion sbc: 3 1 2
The completion sequence 3, 1, 2 would confuse the old wraparound
handling made for DRI2 as 1 < 3 --> Assumes a 32-Bit wraparound
has happened when it hasn't.
The client can queue multiple present requests, in the case of
Mesa up to n requests for n-buffered rendering, e.g., n = 2-4 in
the current Mesa GLX DRI3/Present implementation. In the case of
direct Pixmap presents via xcb_present_pixmap() the number n is
limited by the amount of memory available.
We reasonably assume that the number of outstanding requests n is
much less than 2 billion due to memory contraints and common sense.
Therefore while the order of received sbc's can be a bit scrambled,
successive 64-Bit sbc's won't deviate by much, a given sbc may be
a few counts lower or higher than the previous received sbc.
Therefore any large difference between the incoming awire->sbc and
the last recorded glxDraw->lastEventSbc will be due to 32-Bit
wraparound and we need to adapt glxDraw->eventSbcWrap accordingly
to adjust the upper 32-Bits of the sbc.
Two cases, correponding to the two if-statements in the patch:
a) Previous sbc event was below the last 2^32 boundary, in the previous
glxDraw->eventSbcWrap epoch, the new sbc event is in the next 2^32
epoch, therefore the low 32-Bit awire->sbc wrapped around to zero,
or close to zero --> awire->sbc is apparently much lower than the
glxDraw->lastEventSbc recorded for the previous epoch
--> We need to increment glxDraw->eventSbcWrap by 2^32 to adjust
the current epoch to be one higher than the previous one.
--> Case a) also handles the old DRI2 behaviour.
b) Previous sbc event was above closest 2^32 boundary, but now a
late event from the previous 2^32 epoch arrives, with a true sbc
that belongs to the previous 2^32 segment, so the awire->sbc of
this late event has a high count close to 2^32, whereas
glxDraw->lastEventSbc is closer to zero --> awire->sbc is much
greater than glXDraw->lastEventSbc.
--> We need to decrement glxDraw->eventSbcWrap by 2^32 to adjust
the current epoch back to the previous lower epoch of this late
completion event.
We assume such a wraparound to a higher (a) epoch or lower (b)
epoch has happened if awire->sbc and glxDraw->lastEventSbc differ
by more than 2^30 counts, as such a difference can only happen
on wraparound, or if somehow 2^30 present requests would be pending
for a given drawable inside the server, which is rather unlikely.
v2: Explain the reason for this patch and the new wraparound handling
much more extensive in commit message, no code change wrt. initial
version.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Both of which are no longer used. Use designated initializer to make
things obvious as people add/remove TGSI_OPCODEs.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
... rather than the local one in inst_info->tgsi_opcode.
This will allow us to simplify struct r600_shader_tgsi_instruction.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
At the very least, unreal4/sun-temple/102.shader_test uses this pattern
for a signed integer result. However, that shader did not hit the
optimization in the first place because it uses !gl_FrontFacing. I
changed the shader to use remove the logical-not and reverse the other
operands. I verified that incorrect code is generated before this
change and correct code is generated after.
Fixes fs-frontfacing-ternary-1-neg-1.shader_test.
No shader-db changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If we check for the case that is actually necessary, the asserts
become superfluous.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Espically on platforms that do not natively generate 0u and ~0u for
Boolean results, we generate a lot of sequences where a CMP is
followed by an AND with 1. emit_bool_to_cond_code does this, for
example. On ILK, this results in a sequence like:
add(8) g3<1>F g8<8,8,1>F -g4<0,1,0>F
cmp.l.f0(8) g3<1>D g3<8,8,1>F 0F
and.nz.f0(8) null g3<8,8,1>D 1D
(+f0) iff(8) Jump: 6
The AND.nz is obviously redundant. By propagating the cmod, we can
instead generate
add.l.f0(8) null g8<8,8,1>F -g4<0,1,0>F
(+f0) iff(8) Jump: 6
Existing code already handles the propagation from the CMP to the ADD.
Shader-db results:
GM45 (0x2A42):
total instructions in shared programs: 3550829 -> 3550788 (-0.00%)
instructions in affected programs: 10028 -> 9987 (-0.41%)
helped: 24
Iron Lake (0x0046):
total instructions in shared programs: 4993146 -> 4993105 (-0.00%)
instructions in affected programs: 9675 -> 9634 (-0.42%)
helped: 24
Ivy Bridge (0x0166):
total instructions in shared programs: 6291870 -> 6291794 (-0.00%)
instructions in affected programs: 17914 -> 17838 (-0.42%)
helped: 48
Haswell (0x0426):
total instructions in shared programs: 5779256 -> 5779180 (-0.00%)
instructions in affected programs: 16694 -> 16618 (-0.46%)
helped: 48
Broadwell (0x162E):
total instructions in shared programs: 6823088 -> 6823014 (-0.00%)
instructions in affected programs: 15824 -> 15750 (-0.47%)
helped: 46
No chage on Sandy Bridge or on any platform when NIR is used.
v2: Add unit tests suggested by Matt. Remove spurious writes_flag()
check on scan_inst when scan_inst is known to be BRW_OPCODE_CMP (also
suggested by Matt).
v3: Fix some comments and remove some explicit int() casts in fs_reg
constructors in the unit tests. Both suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
text data bss dec hex filename
9663 0 0 9663 25bf intel_tiled_memcpy.o before
8215 0 0 8215 2017 intel_tiled_memcpy.o after
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v3: Review from Fredrik Hoglund
-Split cosmetic refactor of GetBufferPointerv out into a separate commit
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v3: Review from Fredrik Hoglund
-Split cosmetic refactor of GetBufferPointerv out into a separate commit
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2:-Remove "_mesa" from in front of static software fallback.
-Split out the refactor from the addition of the DSA entry points.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: review from Ian Romanick
- Restore VBO_DEBUG and BOUNDS_CHECK
- Remove _mesa from static software fallback unmap_buffer.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: review from Jason Ekstrand
- Split refactor from addition of DSA entry points.
review from Ian Romanick
- Remove "_mesa" from static software fallback map_buffer_range
- Restore VBO_DEBUG and BOUNDS_CHECK
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: review by Jason Ekstrand
- Split refactor of clear buffer sub data from addition of DSA entry
points.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
v2: review by Ian Romanick
- Remove "_mesa" from name of static software fallback buffer_sub_data.
- Remove mappedRange from _mesa_buffer_sub_data.
- Removed some cosmetic changes to a separate commit.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
v2: review from Ian Romanick
- Fix space in ARB_direct_state_access.xml.
- Remove "_mesa" from the name of buffer_data static fallback.
- Restore VBO_DEBUG and BOUNDS_CHECK.
- Fix beginning of comment to start on same line as /*
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
This reverts commit 1ee000a0b6.
Failures with the GLES3 conformance suite and Synmark2 OGLHdrBloom revealed
that this commit was in error.
Extensive testing with Piglit prior to patch review and upstreaming did not
reveal this problem because, in the few Piglit tests that test for cube
completeness, NumLayers = 6. This is because all of the existing tests use
TextureStorage to initialize the texture, which sets NumLayers.
A new Piglit test has been sent to the mailing list that reproduces the bug
related to this patch ("texturing: Testing
glGenerateMipmap(GL_TEXTURE_CUBE_MAP) without glTexStorage2D").
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Commit 0ac4c27275 made it add a header for the send message when
using SIMD4x2 on Skylake because without this it will end up using
SIMD8D. However the patch missed the case when a sampler is being used
to implement constant loads from a buffer surface in a SIMD4x2 vertex
shader.
This fixes 29 Piglit tests, mostly related to the ARL instruction in
vertex programs.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Commit bb33a31 introduced optimizations that transform cases of MAD
in to simpler forms but it did not take in to account that src[0]
can not be immediate and did not report progress. Patch switches
src[0] and src[1] if src[0] is immediate and adds progress
reporting. If both sources are immediates, this is taken care of by
the same opt_algebraic pass on later run.
v2: Fix for all cases, use temporary fs_reg (Matt, Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89569
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Before this actually ran into an infinite loop printing out "invalid"...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Remove the forward declaration and make use of the DEBUG_PRINT macro for
debug builds.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously PTHREAD_MUTEX_RECURSIVE_NP had been used on linux for
compatibility with old glibc. Since mesa defines __GNU_SOURCE__
on linux PTHREAD_MUTEX_RECURSIVE is also available since at least
1998. So we can unconditionally use the portable version
PTHREAD_MUTEX_RECURSIVE.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88534
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v2: Don't use the intrinsics, the shader backend can recognize these
patterns and generates optimal code automatically.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
IvyBridge and Haswell PRM say that the JIP should be emitted
with type W but we were using UD. The previous implementation
did not show adverse effects, but IMHO it is safer to follow
the specification thoroughly.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
This requires enabling the optional GL provoking vertex behavior for quads.
+ some cosmetic changes, so that the register is set exactly the same as
on r600.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The fragment shader multiplies the alpha channel with gl_SampleMaskIn.
If blending is enabled, it looks like MSAA.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This will be used for line and polygon smoothing.
This is GCN-only even though it's in shared code.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I have to use the BFE instrinsics, because BFE is one of the most complex
instructions that can't be matched easily. BFE has 3 conditional branches
and one of them is quite big.
In the isel DAG, lowered BFE has 27 nodes (including leafs).
Now that piglit is no longer falling back to old compiler for any tests,
we can remove it. Hurray \o/
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Deadlock can occur if we schedule an address register write, yet some
instructions which depend on that address register value also depend on
other unscheduled instructions that depend on a different address
register value. To solve this, before scheduling an address register
write, ensure that all the other dependencies of the instructions which
consume this address register are already scheduled.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add an array_insert() macro to simplify inserting into dynamically sized
arrays, add a comment, and remove unused prototype inherited from the
original freedreno.git/fdre-a3xx test code, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Create a backend_inst::is_commutative() method to replace two static
functions that did the exact same thing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is unfortunately sometimes necessary due to rebasing levels when
rendering into them.
16 piglits crash -> pass, when building mesa with debug enabled.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For example if width were 65, the first slice would get 96 while the
second would get 32. However the hardware appears to expect the second
pitch to be 64, based on halving the 96 (and aligning up to 32).
This fixes texelFetch piglit tests on a3xx below a certain size. Going
higher they break again, but most likely due to unrelated reasons.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
We only program in one layer size per texture, so that means that all
levels must share one size. This makes the piglit test
bin/texelFetch fs sampler2DArray
have the same breakage as its non-array version instead of being
completely off, and makes
bin/ext_texture_array-gen-mipmap
start passing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
The ir_unop_any problem was discovered by some later optimization passes
that generate ir_triop_csel. I was also able to reproduce it by
modifying the gl-2.0-vertexattribpointer vertex shader to generate its
result using
color = mix(vec4(0, 1, 0, 0),
vec4(1, 0, 0, 0),
bvec4(any(greaterThan(diff, vec4(tolerance)))));
instead of an if-statement. This also required using #version 130 and
MESA_GLSL_VERSION_OVERRIDE=130.
I have not nominated this for stable releases because I don't think
there's any way to trigger the problem without GLSL 1.30 or
optimizations that don't exist in stable.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@intel.com>
Most of the brw_inst_* api returns 64bit values. This fixes disassembly
of sampler messages, etc.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows us to get warnings from GCC when we mess up the format
strings.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ARB_shading_language_packing is part of GLSL 4.2, not 4.0 as I
mistakenly believed. The following functions are available only with
ARB_shading_language_packing, GLSL 4.2 (not GLSL 4.0), or ES 3.0:
- packSnorm2x16
- unpackSnorm2x16
- packHalf2x16
- unpackHalf2x16
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Creating/recreating the strings in eglQueryString() is extra work and
isn't thread-safe, as exhibited by shader-db's run.c using libepoxy.
Multiple threads in run.c call eglReleaseThread() around the same time.
libepoxy calls eglQueryString() to determine whether eglReleaseThread()
exists, and our EGL implementation passes a pointer to the version
string to libepoxy while simultaneously overwriting the string, leading
to a failure in libepoxy.
Moreover, the EGL spec says (emphasis mine):
"eglQueryString returns a pointer to a *static*, zero-terminated string"
This patch moves some auxiliary functions from eglmisc.c to eglapi.c so
that they may be used to create the extension, API, and version strings
once during eglInitialize(). The auxiliary functions are renamed from
_eglUpdate* to _eglCreate*, and some checks made unnecessary by calling
the functions from eglInitialize() are removed.
Reviewed-by: Chad Versace <chad.versace@intel.com>
This patch adds two types of checks to the gl(Compressed)Tex(Sub)Imgage family
of functions when a pixel buffer object is bound to GL_PIXEL_UNPACK_BUFFER:
- That the buffer is not mapped.
- The total data size is within the boundaries of the buffer size.
It does so by calling auxiliary validations functions from PBO API:
_mesa_validate_pbo_source() for non-compressed texture calls, and
_mesa_validate_pbo_source_compressed() for compressed texture calls.
The first check is defined in Section 6.3.2 'Effects of Mapping Buffers
on Other GL Commands' of the GLES 3.1 spec, page 57:
"Any GL command which attempts to read from, write to, or change the
state of a buffer object may generate an INVALID_OPERATION error if all
or part of the buffer object is mapped. However, only commands which
explicitly describe this error are required to do so. If an error is not
generated, using such commands to perform invalid reads, writes, or
state changes will have undefined results and may result in GL
interruption or termination."
Similar wording exists in GL 4.5 spec, page 76.
In the case of gl(Compressed)Tex(Sub)Image(2,3)D, the specification doesn't force
implemtations to throw an error. However since Mesa don't currently implement
checks to determine when it is safe to read/write from/to a mapped PBO, we
should always return the error if all or parts of it are mapped.
The 2nd check is defined in Section 8.5 'Texture Image Specification' of the
OpenGL 4.5 spec, page 203:
"An INVALID_OPERATION error is generated if a pixel unpack buffer object
is bound and storing texture data would access memory beyond the end of
the pixel unpack buffer."
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.negative_api.texture.compressedteximage2d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage2d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedteximage3d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d_invalid_buffer_target
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Internal PBO functions such as _mesa_map_validate_pbo_source() and
_mesa_validate_pbo_compressed_teximage() perform validation and buffer mapping
within the same call.
This patch takes out the validation into separate functions to allow reuse
of functionality by other code (i.e, gl(Compressed)Tex(Sub)Image).
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
_mesa_validate_pbo_access() provides a generic way to check that a
requested pixel transfer operation on a PBO falls within the
boundaries of the buffer. It is used in various other places, and
depending on the caller, some arguments are used or not.
In particular, the 'clientMemSize' argument is used only by calls
that are knowledgeable of the total size of the user data involved
in a pixel transfer, such as the case of compressed texture image
calls. Other calls don't provide 'clientMemSize' directly since it
is made implicit from the size and format of the texture, and its
data type. In these cases, a sufficiently big value is passed to
'clientMemSize' (INT_MAX) to avoid an incorrect constrain.
The problem is that _mesa_validate_pbo_access() use uint
pointers to make the calculations, which are 64 bits long in 64
bits platforms, meanwhile the dummy INT_MAX passed in 'clientMemSize'
is just 32 bits. This causes a constrain that is not desired.
This patch fixes that by checking that if 'clientMemSize' is MAX_INT,
then UINTPTR_MAX is assumed instead.
This is an ugly workaround to the fact that _mesa_validate_pbo_access()
intends to be a one function fits all. The clean solution here would
be to break it into different functions that provide the adequate API
for each of the possible code paths and validation needs.
Since there are callers relying on passing INT_MAX to 'clientMemSize',
this patch is necessary to deal with the problem above while a cleaner
implementation of the PBO API is not implemented.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
The implementation of texture <-> pixel-buffer transfers in drivers common layer
includes certain error checks and argument validation that don't belong there,
considering how the Mesa codebase is laid out. These are higher level
validations that, if necessary, should be performed earlier (i.e, in GL API
entry points).
This patch simply removes these error checks from driver code.
For more information, see discussion at
http://lists.freedesktop.org/archives/mesa-dev/2015-February/077417.html.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
eglcurrent.c: In function '_eglSetTSD':
eglcurrent.c:57:4: warning: passing argument 2 of 'tss_set' discards
'const' qualifier from pointer target type [enabled by default]
tss_set(_egl_TSD, (const void *) t);
^
In file included from ../../../include/c11/threads.h:72:0,
from eglcurrent.c:32:
../../../include/c11/threads_posix.h:357:1: note: expected 'void *'
but argument is of type 'const void *'
tss_set(tss_t key, void *val)
^
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The memory layout of compatible internal formats may differ in bytes per
block, so TexFormat is not a reliable measure of compatibility. For example,
GL_RGB8 and GL_RGB8UI are compatible formats, but GL_RGB8 may be laid out in
memory as B8G8R8X8. If GL_RGB8UI has a 3 byte-per-block memory layout, the
existing compatibility check will fail.
Additionally, the current check allows any two compressed textures which share
block size to be used, whereas the spec gives an explicit table of compatible
formats.
v2: Use a switch instead of array iteration for block class and show the
correct GL error when internal formats are mismatched.
v3: Include spec citations for new compatibility checks, rearrange check
order to ensure that compressed, view-compatible formats return the
correct result, and make style fixes. Original commit message amended
for clarity.
v4: Reformatted spec citations.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Previously, we stored derefs in a hash table, using the malloc'd pointer
as the key. Then, we walked through the hash table and generated code,
based on the order of the hash table's elements.
Memory addresses returned by malloc are pretty much random, which meant
that the hash was random, and the hash table's elements would be walked
in some random order. This led to successive compiles of the same
shader using different variable names and slightly different orderings
of phi-nodes. Code could not be diff'd, and the final assembly would
sometimes change slightly too.
It turns out the only point of the hash table was to avoid inserting
the same node multiple times for different dereferences. We never
actually searched the hash table! This patch uses an intrusive
linked list instead. Since exec_list uses head and tail sentinels,
checking prev or next against NULL will tell us whether the node is
already in the list.
Pair programming with Jason Ekstrand.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
__next and __prev are pointers to the structure containing the exec_node
link, not the embedded exec_node. NULL checks would fail unless the
embedded exec_node happened to be at offset 0 in the parent struct.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Use "(__node)->__field.next != NULL" to check for the end of the list
instead of the "&__next->__field != NULL". The former is far more
obviously correct as it matches what the non-safe versions do. The
original code tried to avoid any use of __next as the client code may
delete it during its execution. However, since the looping condition is
checked after the iteration clause but before the client code is
executed, we know that __node is valid during the looping condition.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(Jason noted that this is not a good long term solution, and we should
instead improve nir_lower_io so that this extra set of MOVs is
unnecessary. I tend to agree, but decided we could do that as a
follow-up improvement.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No functional change. In preparation for supporting vertex shaders,
this adds a switch statement on shader stage (since vertex attributes
and fragment shader varyings will need different handling). It also
renames "varying" to "input", to be more general.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Ian and I added these around the time Connor was developing NIR. Now
that both exist, we should make them work together!
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can't safely call nir_optimize() with register present, since several
passes called in the loop can't handle registers, and will fail asserts.
Notably, nir_lower_vec_alus() and nir_opt_algebraic() really don't want
registers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Array variable copy splitting generates a bunch of stuff we want to
clean up before proceeding.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The NIR backend hardcodes brw_wm_prog_key at the moment, which won't
work when we support scalar VS. We could use get_tex(), but it's a
static method. I was going to promote it to fs_visitor, but then
realized that both parameters (stage and key) are already members.
It then occured to me that we could just set up a pointer in the
constructor, and skip having a function altogether.
This patch also converts all existing users to use key_tex.
v2: Make key_tex a "const brw_sampler_prog_key_data *" instead of
non-const; word-wrap some lines. (Review comments from Topi.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_BLENDAPI boils down to __cdecl on Windows, but __cdecl is the default
calling convention so this serves no purpose.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By passing --force autoreconf will update all the aux files, which would
otherwise be ignored if one updates autoconf/automake.
Quote the ORIGDIR variable to prevent fall-outs, when its name contains
space.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Implicitly required for a while, although commit 9385c592c6 (mapi:
remove u_thread.h) was the one that put the final nail on the
coffin.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Left over from commit 18db13f5865(mapi: THREADS was always defined,
remove it)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This has been an implicit rule for building mesa for a long time. Let's
make it official and just bail out at configure time. This way we can
cleaning up some of our glx code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Convert the code to use the C11 threads implementation, and nuke the
Windows non-pthreads code-path. The c11/threads_win32.h abstraction
should be better than the current code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Brian Paul review suggestion: there's more macro use here than necessary.
Removed and redefine some #define preprocessing directives.
Removed the directive input parameter 'T' .
No functional changes.
Signed-off-by: Marius Predut <marius.predut@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We were already using strdup() in various places in Mesa. Get rid
of the _mesa_strdup() wrapper. All the callers pass a non-NULL
argument so the NULL check isn't needed either.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The piglit test glsl-fs-uniform-array-loop-unroll.shader_test was designed
to do an out of bounds access into an uniform array to make sure that we
handle that situation gracefully inside the driver, however, as Ken describes
in bug 79202, Valgrind reports that this is leading to an out-of-bounds access
in fs_visitor::demote_pull_constants().
Before accessing the pull_constant_loc array we should make sure that
the uniform we are trying to access is valid.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79202
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_imm_ud(0xffff) should have been converted to fs_reg(0xffffu) to
make sure the uint32_t fs_reg constructor was matched.
commit 49a938a265
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Fri Feb 20 12:12:25 2015 -0800
i965/fs: Use fs_reg for CS/VS atomics pixel mask immediate data
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We now skip allocating a hiz miptree for gen8. Instead, we calculate
the required hiz buffer parameters and allocate a bo directly.
v2:
* Update hz_height calculation as suggested by Topi
v3:
* Bail if we failed to create the bo (Ben)
v4:
* CEILING => DIV_ROUND_UP
* Make sure mt->logical_depth0 being 0 would not cause trouble
* Fail if Y tiling is not returned
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67564
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We now skip allocating a hiz miptree for gen7. Instead, we calculate
the required hiz buffer parameters and allocate a bo directly.
v2:
* Update hz_height calculation as suggested by Topi
v3:
* Bail if we failed to create the bo (Ben)
v4:
* CEILING => DIV_ROUND_UP
* Make sure mt->logical_depth0 being 0 would not cause trouble
* Fail if Y tiling is not returned
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67564
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We are still allocating a miptree for hiz, but we only use fields from
intel_miptree_aux_buffer. This will allow us to switch over to not
allocating a miptree.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We are still allocating a miptree for hiz, but we only use fields from
intel_miptree_aux_buffer. This will allow us to switch over to not
allocating a miptree.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Today we allocate a miptree's for the hiz buffer. We needed this in
the past because we would point the hardware at offsets of the hiz
buffer. Since the hiz format is not documented, this is not a good
idea.
Since moving to support layered rendering on Gen7+, we no longer point
at an offset into the buffer on Gen7+.
Therefore, to support hiz on Gen7+, we don't need a full miptree
structure allocated.
This patch starts to create a new auxiliary buffer structure
(intel_miptree_aux_buffer) that can be a more simplistic miptree
side-band buffer associated with a miptree. (For example, to serve the
needs of the hiz buffer.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The maximum value of a Gallium HUD's panel is automatically adjusted
when the current value is greater than the max. If we set the
pipe_query_driver_info::max_value to UINT64_MAX, the maximum value is
never adjusted and this results in a flat line instead of a pretty curve
which is correctly scaled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
brw_shader.cpp: In function ‘bool brw_saturate_immediate(brw_reg_type, brw_reg*)’:
brw_shader.cpp:618:31: warning: ‘sat_imm.brw_saturate_immediate(brw_reg_type, brw_reg*)::<anonymous union>::ud’ may be used uninitialized in this function [-Wmaybe-uninitialized]
reg->dw1.ud = sat_imm.ud;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
i915_fragprog.c: In function ‘i915ValidateFragmentProgram’:
i915_fragprog.c:1453:11: warning: variable ‘k’ set but not used [-Wunused-but-set-variable]
int k;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
It looks like this has existed since
commit f5a477ab76
Author: Ian Romanick <ian.d.romanick@intel.com>
Date: Mon Dec 16 11:54:08 2013 -0800
meta: Refactor shader generation code out of mipmap generation path
Valgrind was complaining on fbo-generatemipmap-formats
v2: Instead, do the allocation after the early return block (v2)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We used to loop over all color attachments, and emit FB writes for each
one, even if the shader didn't write to a corresponding output variable.
Those color attachments would be filled with garbage (undefined values).
Football Manager binds a framebuffer with 4 color attachments, but draws
to it using a shader that only writes to gl_FragData[0..2]. This meant
that color attachment 3 would be filled with garbage, resulting in
rendering artifacts. Now we skip writing to it, fixing rendering.
Writes to gl_FragColor initialize outputs[0..nr_color_regions-1] to
GRFs, while writes to gl_FragData[i] initialize outputs[i].
Thanks to Jason Ekstrand for tracking this down.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86747
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Previously, we emitted the shader-time epilogue from emit_fb_writes(),
during the middle of looping through color regions (or emit_urb_writes
for the VS). This is duplicated several times and rather awkward.
I need to fix a bug in our FB write handling, and it will be a lot
easier if we move emit_shader_time_end() out of there.
Now, we simply emit FB writes/URB writes, and subsequently have
emit_shader_time_end() insert instructions before the final SEND with
EOT. Not only is this simpler, it's actually a slight improvement:
we now include the MOVs to set up the final FB write payload in our
shader-time measurements.
Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses
send-from-GRF. (In the past, we might have hit trouble where both
attempt to use MRFs for messages; that's not a problem now.)
v2: Rebase on v3 of the previous patch and other shader_time fixes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> [v1]
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
This makes another part of the INTEL_DEBUG=shader_time code emittable
at arbitrary locations, rather than just at the end of the instruction
stream.
v2: Don't lose smear! Caught by Topi Pohjolainen.
v3: Don't set smear on the destination of the MOV. Thanks Topi!
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)).
The advantage is that we can also insert a shader time write at an
arbitrary location in the instruction stream, rather than being
restricted to emitting at the end.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Lets define R600_MAX_VIEWPORTS instead of using 16 here and there
in the code when looping through viewports and scissors. It is
easier to understand what this number represents.
v2: Missed a case where R600_MAX_VIEWPORTS should have been used.
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Both the AMD and Intel APIs provide a dataSize parameter, and this
function would merrily ignore it. Neither API specifies what to do when
the buffer isn't big enough. I take the easy route of writing all the
complete bits of data that will fit. With more complete specs, we could
probably do something different.
I noticed this while looking into an unused parameter warning. The
warning was actually useful!
brw_performance_monitor.c: In function 'brw_get_perf_monitor_result':
brw_performance_monitor.c:1261:37: warning: unused parameter 'data_size' [-Wunused-parameter]
GLsizei data_size,
^
v2: Fix checks to include offset in the calculation. Noticed by Jan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
All dd functions take a gl_context as the first parameter. Instead of
removing it, just silence the warning.
brw_performance_monitor.c: In function 'brw_new_perf_monitor':
brw_performance_monitor.c:1354:41: warning: unused parameter 'ctx' [-Wunused-parameter]
brw_new_perf_monitor(struct gl_context *ctx)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
What a useful warning. #ThanksGCC
brw_performance_monitor.c:153:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen5_raw_chaps_counters[] = {
^
brw_performance_monitor.c:185:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen5_oa_snapshot_layout[] =
^
brw_performance_monitor.c:221:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen5_groups[] = {
^
brw_performance_monitor.c:240:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen6_raw_oa_counters[] = {
^
brw_performance_monitor.c:281:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen6_oa_snapshot_layout[] =
^
brw_performance_monitor.c:317:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen6_statistics_counters[] = {
^
brw_performance_monitor.c:332:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen6_statistics_register_addresses[] = {
^
brw_performance_monitor.c:346:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen6_groups[] = {
^
brw_performance_monitor.c:356:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen7_raw_oa_counters[] = {
^
brw_performance_monitor.c:402:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen7_oa_snapshot_layout[] =
^
brw_performance_monitor.c:470:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen7_statistics_counters[] = {
^
brw_performance_monitor.c:493:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen7_statistics_register_addresses[] = {
^
brw_performance_monitor.c:515:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen7_groups[] = {
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I don't this opt_cmod_propagation_local ever used the fs_visitor.
brw_fs_cmod_propagation.cpp:52:40: warning: unused parameter 'v' [-Wunused-parameter]
opt_cmod_propagation_local(fs_visitor *v, bblock_t *block)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Review by Martin Peres
- Get rid of difficult-to-follow code copied and pasted from
the original TexBufferRange
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Creates a shared function to ensure that texture buffer target is
GL_TEXTURE_BUFFER. Helps to clean up the Tex[ture]Buffer[Range] functions.
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Creates a shared function that TexBufferRange and TextureBufferRange can use
to check the buffer range. This cleans up TexBufferRange considerably.
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Adds a useful comment and some whitespace. Fixes an error message.
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Changes how the caller is identified in error messages, moves a check for
ARB_texture_buffer_object from the entry points to the shared code in
_mesa_texture_buffer_range, and removes an unused argument (GLenum target).
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
- Closing curly brace on the same line as else
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This function is exposed to mesa driver internals so that texture buffer
objects and array objects can use it.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
ARB_direct_state_access functions that deal with texture cube
maps need to make sure that texture images are not NULL before operating on
them. In the following cases, the error check functions already throw an
error if texImage == NULL, so an assert can be raised instead.
v2: Review from Anuj Phogat
- Replace redundant "if (!texImage) return;" statements with
assert(texImage)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The comment describing why ARB_direct_state_access texture cube map functions
use _mesa_cube_level_complete is very long. To save room in the files,
readers are now referred to one central comment on texturesubimage in
teximage.c.
v2: Review from Anuj Phogat
- Remove redundant copies of the cube map block comment
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
ARB_direct_state_access texture functions that operate on cube maps no longer
need to verify that cube map texture objects contain six texture images
because _mesa_cube_level_complete now does that for them.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
_mesa_cube_level_complete now verifies that a cube map texture object actually
has six texture images before proceeding.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01
test as well as the precision part of Wine's 3dc format test (fd.o bug
89156).
The Z component seems to contain a lower precision version of the
result, probably a temporary value from the decompression computation.
The Y and W component contain different data that depends on the input
values as well, but I could not make sense of them (Not that I tried
very hard).
GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in
piglit, and both formats are affected by a compiler bug if they're
sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx,
which returns random garbage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
This means dropping CL_FP_DENORM from the current return value.
v2:
- Add comments about minimum values for OpenCL 1.2.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
From the SNB PRM, volume 4, part 1, page 193:
"The dual source render target messages only have SIMD8 forms due to
maximum message length limitations. SIMD16 pixel shaders must send two of
these messages to cover all of the pixels. Each message contains two colors
(4 channels each) for each pixel in the message payload."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82831
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Vertex shaders can have shader inputs where location happens to be
VARYING_SLOT_FACE. Without predicating this on the shader stage,
we suddenly end up with load_front_face intrinsics in vertex shaders,
which is nonsensical.
Fixes spec/arb_vertex_buffer_object/pos-array when using NIR for VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The next commit needs to know the shader stage in glsl_to_nir().
To facilitate that, we pass the gl_shader rather than the raw exec_list
of instructions. This has both the exec_list and the stage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the
driver supports native integers. Presumably other passes may as well.
Adding this to nir_shader_compiler_options is an easy way to provide
that information, as it's accessible via nir_shader::options.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The code in glsl_to_nir is entirely dead, as we translate from GLSL to
NIR at link time, when there isn't a _mesa_glsl_parse_state to pass,
so every caller passes NULL.
glsl_to_nir seems like the wrong place to try and create the shader
compiler options structure anyway - tgsi_to_nir, prog_to_nir, and other
translators all would have to duplicate that code. The driver should
set this up once with whatever settings it wants, and pass it in.
Eric also added a NirOptions field to ctx->Const.ShaderCompilerOptions[]
and left a comment saying: "The memory for the options is expected to be
kept in a single static copy by the driver." This suggests the plan was
to do exactly that. That pointer was not marked const, however, and the
dead code used a mix of static structures and ralloced ones.
This patch deletes the dead code in glsl_to_nir, instead making it take
the shader compiler options as a mandatory argument. It creates an
(empty) options struct in the i965 driver, and makes NirOptions point
to that. It marks the pointer const so that we can actually do so
without generating "discards const qualifier" compiler warnings.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing actually uses these, and the only caller of glsl_to_nir()
(brw_fs_nir.cpp) always passes NULL for the _mesa_glsl_parse_state
pointer, meaning they'll always be NULL and 0, respectively.
Just delete them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Piglit's spec/glsl-1.20/compiler/structure-and-array-operations/
array-selection.vert test contains the following code:
gl_Position = (pick_from_a_or_b ? a : b)[i];
where "a" and "b" are uniform vec4[2] variables.
ast_to_hir creates a temporary vec4[2] variable, conditional_tmp, and
generates an if-block to copy one or the other:
(declare (temporary) (array vec4 2) conditional_tmp)
(if (var_ref pick_from_a_or_b)
((assign () (var_ref conditional_tmp) (var_ref a)))
((assign () (var_ref conditional_tmp) (var_ref b))))
However, we failed to update max_array_access for "a" and "b", so it
remained 0 - here, the whole array is being accessed. At link time,
update_array_sizes() used this bogus information to change the types
of "a" and "b" to vec4[1]. We then had assignments from a vec4[1] to
a vec4[2], which is highly illegal.
This tripped assertions in nir_split_var_copies with scalar VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: mesa-stable@lists.freedesktop.org
On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and
negate changes meaning to bitwise-not (~, not -). This isn't what NIR
expects, so we should resolve the source modifers via a MOV.
+30 Piglits (fs-op-bit{and,or,xor}-not-abs-*).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
this isn't hooked up to anything at all from what I can see.
Seems like a left over from commit 5d67d4fbebb(st/mesa: remove
st_TexImage(), use core Mesa code instead).
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now that relative-dst works, we should never fall back to the old
compiler. (Which is almost true, other than a couple edge case sched
fails in piglit).
So replace glsl130 flag to force GLSL 130 and integers on a3xx/a4xx with
a glsl120 flag to force GLSL 120 and !integers.
If this commit breaks any game/app/etc use FD_MESA_DEBUG=glsl120 as a
workaround and please let me know.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Enable the 'sphinx.ext.graphviz' extension, and add in a section for
driver specific docs, with freedreno compiler docs beneath. The
goal is for more complete compiler docs, and hopefully some docs about
other parts of the driver (such as how tiling works, etc).
Note that there is also a Distribution -> Drivers section. Although
that appears to be simply just a list of drivers. Not sure if that
should move under the 'Drivers' section or left alone. I did add a
one-line section for freedreno in the existing Distribution -> Drivers
section.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
To simplify RA, assign arrays that are written to first. Since enough
dependency information is in the graph to preserve order of reads and
writes of array, so all SSA names for the array collapse into one, just
assign the entire thing by array-id.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The meta-deref instruction doesn't really do what we need for relative
destination. Instead, since each instruction can reference at most a
single address value, track the dependency on the address register via
instr->address. This lets us express the dependency regardless of
whether it is used for dst and/or src.
The foreach_ssa_src{_n} iterator macros now also iterates the address
register so, at least in SSA form, the address register behaves as an
additional virtual src to the instruction. Which is pretty much what
we want, as far as scheduling/etc.
TODO:
For now, the foreach_src{_n} iterators are unchanged. We could wrap
the address in an ir3_register and make the foreach_src_{_n} iterators
behave the same way. But that seems unnecessary at this point, since
we mainly care about the address dependency when in SSA form.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
I remembered that we are using c99.. which makes some sugary iterator
macros easier. So introduce iterator macros to iterate all src
registers and all SSA src instructions. The _n variants also return
the src #, since there are a handful of places that need this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For cat1 instructions, use reg() as well for relative src, to ensure
proper accounting of register usage. Also, for relative instructions,
use reg->size rather than reg->wrmask to determine the number of
components read/written.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Turns out there are scenarios where we need to insert mov's in "front"
of an input. Triggered by shaders like:
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[9]
DCL SAMP[0]
DCL TEMP[0], LOCAL
0: MOV TEMP[0].xy, IN[1].xyyy
1: MOV TEMP[0].w, IN[1].wwww
2: TXF TEMP[0], TEMP[0], SAMP[0], 1D_ARRAY
3: MOV OUT[1], TEMP[0]
4: MOV OUT[0], IN[0]
5: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A previous patch to fix header inclusion within extern "C" neglected
to fix the occurences of this pattern in r300 files.
When the helper to detect this issue was pushed to master, it broke
the build for the r300 driver. This patch fixes the r300 build.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
A previous patch to fix header inclusion within extern "C" neglected
to fix the occurences of this pattern in nouveau files.
When the helper to detect this issue was pushed to master, it broke
the build for the nouveau driver. This patch fixes the nouveau build.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There is no HW support for these and the VBO pusher doesn't know about
them. No need to, either, since the st will be lowering them to 2x32.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add a help function for each WA and make PIPE_CONTROL flags match the WA
descriptions. Call gen6_wa_pre_pipe_contro() only before PIPE_CONTROLs.
Fix missing gen6_wa_pre_3dstate_vs_toggle() in the rectlist path.
Replace the _MSC_VER >= 1200 with defined (_MSC_VER) and compact if/else
statements. We require MSVC 2008 or later with commit 46110c5d564.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Implicitly required for a while, although commit 9385c592c6 (mapi:
remove u_thread.h) was the one that put the final nail on the
coffin.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This has been an implicit rule for building mesa for a long time. Let's
make it official and just bail out at configure time. This way we can
cleaning up some of our glx code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Convert the code to use the C11 threads implementation, and nuke the
Windows non-pthreads code-path. The c11/threads_win32.h abstraction
should be better than the current code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This is just to help repro and fixing these issues with any C++ compiler --
Commiting this will of course wait until all issues are addressed.
$ scons src/glsl/
scons: Reading SConscript files ...
Checking for GCC ... yes
Checking for Clang ... no
Checking for X11 (x11 xext xdamage xfixes glproto >= 1.4.13)... yes
Checking for XCB (x11-xcb xcb-glx >= 1.8.1 xcb-dri2 >= 1.8)... yes
Checking for XF86VIDMODE (xxf86vm)... yes
Checking for DRM (libdrm >= 2.4.38)... yes
Checking for UDEV (libudev >= 151)... yes
warning: LLVM disabled: not building llvmpipe
scons: done reading SConscript files.
scons: Building targets ...
scons: building associated VariantDir targets: build/linux-x86_64-debug/glsl
Compiling src/glsl/ast_array_index.cpp ...
Compiling src/glsl/ast_expr.cpp ...
Compiling src/glsl/ast_function.cpp ...
Compiling src/glsl/ast_to_hir.cpp ...
Compiling src/glsl/ast_type.cpp ...
Compiling src/glsl/builtin_functions.cpp ...
In file included from include/c99_compat.h:28:0,
from src/mapi/u_compiler.h:4,
from src/mapi/u_thread.h:47,
from src/mapi/glapi/glapi.h:47,
from src/mesa/main/mtypes.h:42,
from src/mesa/main/errors.h:47,
from src/mesa/main/imports.h:41,
from src/mesa/main/core.h:44,
from src/glsl/builtin_functions.cpp:58:
include/no_extern_c.h:48:1: error: template with C linkage
template<class T> class _IncludeInsideExternCNotPortable;
^
In file included from include/c99_compat.h:28:0,
from include/c11/threads.h:38,
from src/mapi/u_thread.h:49,
from src/mapi/glapi/glapi.h:47,
from src/mesa/main/mtypes.h:42,
from src/mesa/main/errors.h:47,
from src/mesa/main/imports.h:41,
from src/mesa/main/core.h:44,
from src/glsl/builtin_functions.cpp:58:
include/no_extern_c.h:48:1: error: template with C linkage
template<class T> class _IncludeInsideExternCNotPortable;
^
Compiling src/glsl/builtin_types.cpp ...
Compiling src/glsl/builtin_variables.cpp ...
scons: *** [build/linux-x86_64-debug/glsl/builtin_functions.os] Error 1
scons: building terminated because of errors.
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
If scratch space is needed for a shader stage we try to reuse the last scratch
buffer bound to that stage. If we can't, we free the old scratch buffer and
allocate a new one. This means we always keep the last scratch buffer for a
particular shader stage around for the entire life span of the context.
These buffers are being reported by Valgrind as definitely lost after
destroying the OpenGL context. For example, for the geometry shader stage:
==18350== 248 bytes in 1 blocks are definitely lost in loss record 85 of 150
==18350== at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18350== by 0xA1B35D6: drm_intel_gem_bo_alloc_internal (intel_bufmgr_gem.c:724)
==18350== by 0xA1B383F: drm_intel_gem_bo_alloc (intel_bufmgr_gem.c:794)
==18350== by 0xA1AEFA3: drm_intel_bo_alloc (intel_bufmgr.c:52)
==18350== by 0x9D08E31: brw_get_scratch_bo (brw_program.c:226)
==18350== by 0x9D2A0F2: do_gs_prog (brw_vec4_gs.c:280)
==18350== by 0x9D2A635: brw_gs_precompile (brw_vec4_gs.c:401)
==18350== by 0x9D14F68: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_shader.cpp:76)
==18350== by 0x9D157B8: brw_link_shader (brw_shader.cpp:269)
==18350== by 0x9B0941E: _mesa_glsl_link_shader (ir_to_mesa.cpp:3038)
==18350== by 0x99AE4ED: link_program (shaderapi.c:917)
==18350== by 0x99AF365: _mesa_LinkProgram (shaderapi.c:1385)
So make sure that by the time we destroy the context we check if we have live
scratch buffers for the various stages and release them if that is the case.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This output variables gives more flexibility for future changes
in autoconf to detect if it is needed to auto-generate files and
check for the auto-generation dependencies.
It is still returning error when Python is not installed.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Was resulting in gl_PointSize write being optimized out, causing
particle system type shaders to hang if hw binning enabled.
Fixes neverball, OGLES2ParticleSystem, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use common intrastage array validation for interface blocks.
This change also allows us to support interface blocks
that are arrays of arrays.
V2: Reinsert unsized array asserts in interstage_match()
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
A while back I switched intel_blit_framebuffer to prefer Meta over the
BLT. This meant that Gen8 platforms would start using the 3D engine
for blits, just like we do on Gen6-7.5.
However, I hadn't considered Gen4-5 when making that change. The BLT
engine appears to be substantially faster on 965GM than using Meta to
drive the 3D engine. This isn't too surprising: original Gen4 doesn't
support tile offsets (that came on G45), and the level/layer fields
don't work for cubemap rendering, so for inconvenient miplevel
alignments, we end up blitting or copying data to/from temporaries
in order to render to it. We may as well just use the blitter.
I chose to use the BLT on Gen4-5 because they use the same ring for
both 3D and BLT; Gen6+ splits it out.
Fixes regressions on 965GM due to botched tile offset code (we should
fix those properly as well, but they're longstanding bugs - for now,
put things back to the status quo).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89430
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
The former is used by the kernel driver to set up fence registers and to pass
tiling info across processes. It lacks INTEL_TILING_W, which made our code
less expressive.
System headers may contain C++ declarations, which cannot be given C
linkage. For this reason, include statements should never occur
inside extern "C".
This patch moves the C linkage statements to enclose only the
declarations within a single header.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The SSSE3 swizzling code was written for fast uploads to the GPU and
assumed the destination was always 16-byte aligned. When we began using
this code for fast downloads as well we didn't do anything to account
for the fact that the destination pointer given by glReadPixels() or
glGetTexImage() is not guaranteed to be suitably aligned.
With SSSE3 enabled (at compile-time), some applications would crash when
an SSE aligned-store instruction tried to store to an unaligned
destination (or an assertion that the destination is aligned would
trigger).
To remedy this, tell intel_get_memcpy() whether we're uploading or
downloading so that it can select whether to assume the destination or
source is aligned, respectively.
Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89416
Tested-by: Uriy Zhuravlev <stalkerg@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Several patches added include statements where required by the m64
build. Some files are only compiled for m32, and require similar
changes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2:
- Report correct values for CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE
and CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE.
- Only define cl_khr_fp64 if the extension is supported.
- Remove trailing space from extension string.
- Rename device query function from cl_khr_fp64() to
has_doubles().
v3:
- Return 0 for device::doubled_fp_confg() when doubles aren't
supported.
v4:
- Remove device query for double fp_config.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The header is included in ../xmlpool.h. With the latter of which used
directly in a number of places in mesa.
Note that we can also add it (alongside t_option.h) to noinst_HEADERS,
but neither solution fixes the issue that brough us here - namely:
Do not regenerate the headers, if it already exists.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I.e. add {shared-,}glapi/glapi_mapi_tmp.h to the SOURCES list. Otherwise
there will be no knowledge that the file is required by others for the
build. Thus autotools won't pick it up for the distribution tarball.
v2: Don't forget about the static glapi. Spotted by Matt.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some of the files generated were not in the SOURCES variable, thus
although generated prior to compilation the dependency tracking was
incomplete. The latter of which resulted in the files missing from the
distribution tarball.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The file is auto-generated, and #included by formats.c. Let's rename it
to reflect the latter. This will also help up fix the dependency
tracking by adding it to the _SOURCES variable, without the side effect
of it being compiled (twice).
v2: Update .gitignore to reflect the rename.
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Drop the no longer present get_es{1,2}.c from the list.
v2: Keep the format_info.c rename hunk out of this patch.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All the users directly include the header, plus we have a in-tree
replacements for non C99 compilers which we already use.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently these files are including it indirectly via eglcompiler.h
The latter of which will be removed with follow up commits.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Should no longer be used. As many places indirectly include
eglcompiler.h keep this change separate, so that it can be easily
reverted, if needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
With the split of the gallium egl module we had previously it required
access to some of the internal functions. As the only build (automake)
that did this no longer builds it we can now appropriately hide those
functions.
Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The latter is a C99 standard, and our current wrapper c99_compat.h
should handle non-compliant compilers.
Drop the c99_compat.h inclusion from eglcompiler.h altogether, as it's
no longer required.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Drop the custom keyword in favour of the C99 one. All the places using
it now directly include c99_compat.h which should handle things on
platforms which lack it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
THREADS was defined if HAVE_PTHREADS or _WIN32 was defined. That's
always the case. The build would die in c11/threads.h otherwise.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Remove u_thread_self() since u_thread.h is going away soon.
Create a simple thread ID abstraction which wraps WIN32 or c11 threads.
This also gets rid of the questionable casting of thrd_t to an unsigned
long.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
So it matches the preprocessor check around the u_current_init_tsd() code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of relying on glapi.h or some other header to provide it.
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Let's directly include c11/threads.h instead of relying on glapi.h
to provide it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The yoffset needs to be interpreted as a slice offset for 1D array
textures. This patch implements that by moving the yoffset into
zoffset similar to how it moves the height into depth.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Now that a layered source PBO is interpreted as a single tall 2D image
it's quite easy to accept the image height packing option by just
creating an image that is tall enough to include the image padding.
I'm not sure whether the image height property should affect 1D_ARRAY
textures. My intuition and interpretation of the GL spec (which is a
bit vague) would be that it shouldn't. However the software fallback
path in Mesa uses the property for packing but not for unpacking. The
binary NVidia driver uses it for both. This patch doesn't use it for
either case so it is different from the software fallback. There is
some discussion about this here:
http://lists.freedesktop.org/archives/mesa-dev/2015-February/077925.html
This is tested by the texsubimage Piglit test with the array and pbo
arguments. Previously this test was skipping this code path because it
always sets the image height.
I've also tested it by modifying the getteximage-targets test. It
wasn't using this code path before because it was using the default
texture object so this code couldn't successfully create a frame
buffer. I also modified it to add some image padding with the image
height in the PBO.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
This reverts commit 546aba143d.
I think the changes to the calls to glBlitFramebuffer from this patch
are no different to what it was doing previously because it used to
set height to 1 before doing the blits. However it was introducing
some problems with the blit for layer 0 because this was no longer
special cased. It didn't fix problems with the yoffset which needs to
be interpreted as a slice offset. I think a better solution would be
to modify the original if statement to cope with the yoffset.
Conflicts:
src/mesa/drivers/common/meta_tex_subimage.c
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Add wrappers for 3DPRIMITIVE to make sure we clear current_pipe_control_dw1
and deferred_pipe_control_dw1 after it. Add missing
gen7_wa_post_ps_and_later().
These WAs
gen7_wa_post_3dstate_push_constant_alloc_ps()
gen7_wa_pre_vs()
gen7_wa_pre_3dstate_sf_depth_bias()
first half of gen7_wa_pre_depth()
gen7_wa_post_ps_and_later()
are Gen7-specific. Update copy-and-pasted gen8_wa_pre_depth() also.
Add intel_winsys_get_reset_stats(), intel_winsys_import_userptr(), and
intel_bo_map_async(). The latter two are stubs, but we are not going to use
them immediately either.
DRM_IOCTL_I915_GEM_WAIT takes an int64_t for the timeout value but
GL_ARB_sync takes an uint64_t. Further, the ioctl used to wait
indefinitely when passed a negative timeout, but it's been broken and
now returns immediately in that case. Thus, if an application passes
UINT64_MAX to wait forever, we overflow to -1LL and return immediately.
Work around this mess by clamping the wait timeout to INT64_MAX.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Define the macro in src/util/macros.h rather than in two different
places. Note that USED isn't actually used anywhere at this time.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
OpenVG API seems to have dwindled away. The code
would still be interesting if we wanted to implement NV_path_rendering
but given the trend of the next gen graphics APIs, it seems
unlikely that this becomes ARB or core.
v2: Remove a few "openvg" references left, per Emil Velikov.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Update release notes.
Largely superseeded by src/egl, and
WGL/GLX_EXT_create_context_es_profile extensions.
Note this will break Android.mk with gallium drivers -- somebody
familiar with that build infrastructure will need to update it to use
gallium drivers through egl_dri2.
v2: Remove the _EGL_BUILT_IN_DRIVER_GALLIUM define from
src/egl/main/Android.mk; and update the src/egl/main/Sconscript to
create a SharedLibrary, add versioning, create symlink - copy the bits
from egl-static, per Emil Velikov.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Disallow undefined symbols in libEGL.so. Update release notes
This classic driver is so far behind Gallium softpipe/llvmpipe based
one, that's hard to imagine ever being useful.
v2: Drop drivers/windows from src/mesa/Makefile.am:EXTRA_DIST per Emil
Velikov.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Update release notes.
v2:
- Single statement, by using memset return value as suggested by Ian
Romanick.
- No internal declaration, as suggested by Jason Ekstrand.
- Move macros to a header.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Since commit 28f3f8d, indices generator take a start parameter. However, some
index values have been left to start at 0.
This fixes the glean/fbo test with the virgl driver, and copytexsubimage
with freedreno.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Fix GCC cpp warnings with glibc >= 2.19.
/usr/include/features.h:148:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
# warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Correctly set _BaseFormat field when creating a gl_renderbuffer
with EGLImage storage.
Change-Id: I8c9f7302d18b617f54fa68304d8ffee087ed8a77
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Re-enable integer, now that we can handle flat varyings. Still, ofc,
conditional on FD_MESA_DEBUG=glsl130, until we can deprecate _old
compiler..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We may not need this for later a4xx patchlevels, but we do at least need
this for patchlevel 0. Bypass bary.f for fetching varyings when flat
shading is needed (rather than configure via cmdstream). This requires
a special dummy bary.f w/ (ei) flag to signal to scheduler when all
varyings are consumed. And requires shader variants based on rasterizer
flatshade state to handle TGSI_INTERPOLATE_COLOR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
I think there is at least one more sub-encoding, but these two should be
enough to cover the common load/store instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
To lower two sided color, tgsi_lowering creates additional BCOLOR inputs
(matching up to the BCOLOR outputs on the vert shader). These inputs
should copy the interpolation state of their matching COLOR input.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
fd3_emit.c: In function ‘fd3_emit_vertex_bufs’:
fd3_emit.c:377:11: warning: unused variable ‘semantic’ [-Wunused-variable]
uint8_t semantic = sem2name(vp->inputs[i].semantic);
and
fd4_emit.c: In function ‘fd4_emit_vertex_bufs’:
fd4_emit.c:304:11: warning: unused variable ‘semantic’ [-Wunused-variable]
uint8_t semantic = sem2name(vp->inputs[i].semantic);
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The main objective of this change is to enable Linux developers to use
more of C99 throughout Mesa, with confidence that the portions that need
to be built with MSVC -- and only those portions --, stay portable.
This is achieved by using the appropriate -Werror= options only on the
places they need to be used.
Unfortunately we still need MSVC 2008 on a few portions of the code
(namely llvmpipe and its dependencies). I hope to eventually eliminate
this so that we can use C99 everywhere, but there are technical/logistic
challenges (specifically, newer Windows SDKs no longer bundle MSVC,
instead require a full installation of Visual Studio, and that has
hindered adoption of newer MSVC versions on our build processes.)
Thankfully we have more directy control over our OpenGL driver, which is
why we're now able to migrate to MSVC 2013 for most of the tree.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While using various debugging features (optimization debug, instruction dumping,
etc) this function is called in order to get a readable letter for the type of
unit.
On GEN8, two new units were added, the Qword and the Unsigned Qword (Q, and UQ
respectively). The existing assertion tries to determine that the argument
passed in is within the correct boundary, however, it was using UQ as the upper
limit instead of Q.
To my knowledge you can only hit this case with the branch I am currently
working on, so it doesn't fix any known issues.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I'm not really sure of the origins of the existing flag names. Modern docs have
some slightly different names. Having the correct names makes it easier to
determine if existing PIPE_CONTROL flag settings are correct, as well as making
adding new PIPE_CONTROLs easier.
This originally came up while I was trying to implement workarounds and spotted
some things called, "flush" which should have been called "invalidate."
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a fix for a regression introduced in commit a9f8296d ("i965/fs:
Preserve the CFG in a few more places.").
The errata this code works around is described in a comment before the function:
"[DevBW, DevCL] Errata: A destination register from a send can not be
used as a destination register until after it has been sourced by an
instruction with a different destination register.
The framebuffer write's sources must be in message registers, which SEND
instructions cannot have as a destination. There's no way for this
errata to affect anything at the end of the program. Just remove the
code.
Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84613
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, there were bugs where if the app set a scissor it could affect
the area of the texture that was downloaded. There was also potential that
the framebuffer SRGB state could affect downloads. This ensures that those
will get saved/restored and can't affect the texture download.
Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89292
Reviewed-by: Neil Roberts <neil@linux.intel.com>
We've been using a mix of these two macros for a while now. Let's
just use the later everywhere. It seems to be the convention used
by other open-source projects.
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
In some cases, glheader.h is the right #include.
Also remove some instances of struct _glapi_table declarations.
Acked-by: Matt Turner <mattst88@gmail.com>
It's unmaintained, and most likely broken: I use trace driver every now
and then, and everytime I do I need to fix it up.
It's also unused: identity_screen_create is never called.
Above all, it's dead weight: if identity driver had the infrastructure
for other pass-through drivers (like trace and rbug), then it would make
sense on its own right. But as it is implemmented, it's just another
driver to (forget) to update whenever there is a gallium interface
change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's a wrapper around emit_buffer_surface_state with format=RAW, pitch=1,
rw=true and the remaining arguments ordered differently. There's no point in
having a separate vtbl pointer for that.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
According to the bspec for some reason the format of the maximum
number of threads field has changed from U8-2 to U8-1 for the PS.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, we compared our new GS-out VUE map to the existing *VS*-out
VUE map, which is bogus.
This would mostly manifest as redundant dirty flagging where the GS is
in use but the VS and GS output layouts differ; but there is a scary
case where we would fail to flag a GS-out layout change if it happened
to match the VS-out layout.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.5, 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88885
The use of the uninitialized_var() macro was to silence an uninitialized
variable warning that I assumed stemmed from gcc being unable to see
inside __get_cpuid() or understand its inline assembly.
In fact, it was because the __get_cpuid() function can fail, and not
initialize its arguments. Instead, check for failure and return early.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 79daa510c7.
I apparently hadn't done a clean build when testing this; it broke the
build for Tom, Ben, and myself. We like the idea; let's try a v2.
The length argument passed to sysctl was the size of the pointer
not the type. The result of this is sysctl calls would fail on
32 bit BSD/Mac OS X.
Additionally the wrong pointer was passed as an argument to store
the result of the sysctl call.
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
_mesa_choose_tex_format (texformat.c) tries I8_SNORM, L8_SNORM, and
either L8A8_SNORM or A8L8_SNORM, none of which are supported by our
driver. Failing that, it falls back to RGBX for luminance, and RGBA
intensity and luminance alpha. So, we need to use swizzle overrrides
to obtain the correct values.
Fixes Piglit's EXT_texture_snorm/fbo-blending-formats and
fbo-clear-formats.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
CMP.Z doesn't work on Gen4-5 because the boolean isn't guaranteed to be
0 or 0xFFFFFFFF - only the low bit is defined.
We can call emit_bool_to_cond_code to generate the condition in f0.0;
the last instruction will generate the flag value. We can patch it to
use f0.1, and negate the condition.
Fixes discard tests on Gen4-5.
Haswell shader-db stats:
total instructions in shared programs: 5770279 -> 5769112 (-0.02%)
instructions in affected programs: 64342 -> 63175 (-1.81%)
helped: 1069
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The main objective of this change is to enable Linux developers to use
more of C99 throughout Mesa, with confidence that the portions that need
to be built with MSVC -- and only those portions --, stay portable.
This is achieved by using the appropriate -Werror= options only on the
places they need to be used.
Unfortunately we still need MSVC 2008 on a few portions of the code
(namely llvmpipe and its dependencies). I hope to eventually eliminate
this so that we can use C99 everywhere, but there are technical/logistic
challenges (specifically, newer Windows SDKs no longer bundle MSVC,
instead require a full installation of Visual Studio, and that has
hindered adoption of newer MSVC versions on our build processes.)
Thankfully we have more directy control over our OpenGL driver, which is
why we're now able to migrate to MSVC 2013 for most of the tree.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is to enable the code to build with -Werror=vla in the short term,
and enable the code to build with MSVC2013 soon after.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix build error.
CC compiler/tests/r300_compiler_tests-radeon_compiler_regalloc_tests.o
compiler/tests/radeon_compiler_regalloc_tests.c: In function ‘test_runner_rc_regalloc’:
compiler/tests/radeon_compiler_regalloc_tests.c:57:3: error: implicit declaration of function ‘fprintf’ [-Werror=implicit-function-declaration]
fprintf(stderr, "Failed to load program\n");
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This fixes a dEQP test failure. In the test,
glCompressedTexSubImage2D was called with target = 0 and failed to throw
INVALID ENUM. This failure was caused by _mesa_get_current_tex_object(ctx,
target) being called before the target checking. To remedy this, target
checking was made into its own function and called prior to
_mesa_get_current_tex_object.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89311
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This fixes a dEQP test failure. In the test,
glCopyTexSubImage2D was called with target = 0 and failed to throw
INVALID ENUM. This failure was caused by _mesa_get_current_tex_object(ctx,
target) being called before the target checking. To remedy this, target
checking was separated from the main error-checking function and
called prior to _mesa_get_current_tex_object.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89312
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
MSVC 2008 (shipped with Windows SDK 7.0.7600) is the oldest we
need to support. At least on llvmpipe, gallium/auxiliary, and util
modules. For the remaining modules (particular all OpenGL specific
code) can be built with MSVC 2013.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Not needed by anything in that header. Include math.h or c99_math.h
where needed instead.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Don't include stuff we don't need. Fix a few #includes elsewhere to
keep thing building.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Should be defined in math.h. If not, we can add them to c99_math.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
A layered PBO image is now interpreted as a single tall 2D image so
the z argument in _mesa_meta_bind_fbo_image is ignored. Therefore this
was just redundantly rebinding the same image repeatedly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For 32-bit builds, floating point operations use x86 FPU registers,
not SSE registers. If we're actually storing an integer in a float
variable, the value might get modified when written to memory. This
patch changes the VBO code to use the fi_type (float/int union) to
store/copy vertex attributes.
Also, this can improve performance on x86 because moving floats with
integer registers instead of FP registers is faster.
Neil Roberts review:
- include changes on all places that are storing attribute values.
- check with and without -O3 compiler flag.
Brian Paul review:
- use fi_type type instead gl_constant_value type
- fix a bunch of nit-picks.
- fix compiler warnings
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82668
Signed-off-by: Marius Predut <marius.predut@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
before calling _mesa_meta_pbo_TexSubImage(). This will be used in
later patches and will be required in Skylake to get the tile
resource mode of miptree before calling _mesa_meta_pbo_TexSubImage().
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
No functional changes in the patch. Just makes the code look cleaner.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
create_texture_for_pbo() is shared by _mesa_meta_pbo_GetTexSubImage()
and _mesa_meta_pbo_TexSubImage() functions. So, we need to account
for both pack and unpack buffer objects.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
create_texture_for_pbo() is used by both _mesa_meta_pbo_GetTexSubImage()
and _mesa_meta_pbo_TexSubImage() functions with different PBO targets.
Use GL_STREAM_READ with GL_PIXEL_PACK_BUFFER and GL_STREAM_DRAW with
GL_PIXEL_UNPACK_BUFFER.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
There were some bugs, and the code was really difficult to follow. We
would optimize
min(max(x, b), 1.0) into max(sat(x), b)
but not pay attention to the order of min/max and also do
max(min(x, b), 1.0) into max(sat(x), b)
Corrects four shaders from Champions of Regnum that do
min(max(x, 1), 10)
and corrects rendering of Mass Effect under VMware Workstation.
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89180
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since now ARRAY_SIZE has been added to util/macros.h. Fixes a bunch of:
freedreno_util.h:79:0: warning: "ARRAY_SIZE" redefined
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
^
In file included from ../../../../src/gallium/include/pipe/p_compiler.h:36:0,
from ../../../../src/gallium/include/pipe/p_context.h:31,
from freedreno_context.h:32,
from freedreno_context.c:29:
../../../../src/util/macros.h:29:0: note: this is the location of the previous definition
# define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
^
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Sandybridge doesn't support y-tiling for surface formats with 16 or
more bpp. There was previously an override to explicitly allow this
for Gen7. However, this restriction is also removed in Gen8+ so we
should use y-tiling there too.
This is important to do for Skylake which doesn't support x-tiling for
3D surfaces.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
If the renderer supports the core profile the query returned incorrectly
0x8 as value, because it was using (1U << __DRI_API_OPENGL_CORE) for the
returned value.
The same happened with the compatibility profile. It returned 0x1
(1U << __DRI_API_OPENGL) instead of 0x2.
Internal DRI defines:
dri_interface.h: #define __DRI_API_OPENGL 0
dri_interface.h: #define __DRI_API_OPENGL_CORE 3
Those two bits are supposed for internal usage only and should be
translated to GLX_CONTEXT_CORE_PROFILE_BIT_ARB (0x1) for a preferred
core context profile and GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB (0x2)
for a preferred compatibility context profile.
This patch implements the above translation in the glx module.
v2: Fix the incorrect behavior in the glx module
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Doubles are always packed, but a single double will never cross a slot
boundary -- single slots can still be wasted in some situations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Corrects the way that _mesa_meta_pbo_TexSubImage and
_mesa_meta_pbo_GetTexSubImage handle 1D_ARRAY textures. Fixes a failure in
the Piglit arb_direct_state_access/gettextureimage-targets test.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Laura Ekstrand <laura@jlekstrand.net>
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Changes PBO uploads and downloads to use a tall (height * depth) 2D texture
for blitting. This fixes the bug where 2D_ARRAY, 3D, and CUBE_MAP_ARRAY
textures are not properly uploaded and downloaded.
Removes the option to use a 2D ARRAY texture for the PBO during upload and
download. This option didn't work because the miptree couldn't be set up
reliably.
v2: Review from Jason Ekstrand and Neil Roberts:
-Delete the depth parameter from create_texture_for_pbo
-Abandon the option to create a 2D ARRAY texture in create_texture_for_pbo
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
This moves the line setting immutability for the texture to after
_mesa_initialize_texture_object so that the initializer function will not
cancel it out. Moreover, because of the ARB_texture_view extension, immutable
textures must have NumLayers > 0, or depth will equal (0-1)=0xFFFFFFFF during
SURFACE_STATE setup, which triggers assertions.
v2: Review from Kenneth Graunke:
- Include more explanation in the commit message.
- Make texture setup bug fixes into a separate patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
With the previous optimization in place, some shaders wind up with
multiple discard jumps in a row, or jumps directly to the next
instruction. We can remove those.
Without NIR on Haswell:
total instructions in shared programs: 5777258 -> 5775872 (-0.02%)
instructions in affected programs: 20312 -> 18926 (-6.82%)
helped: 716
With NIR on Haswell:
total instructions in shared programs: 5773163 -> 5771785 (-0.02%)
instructions in affected programs: 21040 -> 19662 (-6.55%)
helped: 717
v2: Use the CFG rather than the old instructions list. Presumably
the placeholder halt will be in the last basic block.
v3: Make sure placeholder_halt->prev isn't the head sentinel (caught
twice by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
st_glsl_to_tgsi and ir_to_mesa have handled conditional discards for a
long time; the previous patch added that capability to i965.
i965 (Haswell) shader-db stats:
Without NIR:
total instructions in shared programs: 5792133 -> 5776360 (-0.27%)
instructions in affected programs: 737585 -> 721812 (-2.14%)
helped: 6300
HURT: 68
GAINED: 2
With NIR:
total instructions in shared programs: 5787538 -> 5769569 (-0.31%)
instructions in affected programs: 767843 -> 749874 (-2.34%)
helped: 6522
HURT: 35
GAINED: 6
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The discard condition tells us which channels we want killed. We want
to invert that condition to get the channels that should survive (remain
live) in f0.1. Emit a CMP to negate it.
Nothing generates these today, but that will change shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a conditional discard, which takes a boolean source.
Note that we don't generate ir_discard::condition today, so this
shouldn't break drivers (since none implement this intrinsic yet).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
opt_constant_folding() already detects conditional assignments where the
condition is constant, and either deletes the assignment or the
condition.
Make it handle discards in the same fashion.
Spotted happening in the wild in Tropico 5 shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This pass wasn't prepared to handle conditional discards.
Instead of initializing the "discarded" temporary to "true", set it to
the condition. Then, refer to the variable for the condition, to avoid
duplicating the expression tree.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen8+ support was just broken, since MUL now consumes 32-bits from both
sources. Fixes 986 piglit tests on my BDW.
total instructions in shared programs: 7753873 -> 7753522 (-0.00%)
instructions in affected programs: 28164 -> 27813 (-1.25%)
helped: 77
GAINED: 47
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds a parent_instr field similar to the one for ssa_def. The
difference here is that the parent_instr field on a nir_register can be
NULL if the register does not have a unique definition or if that
definition does not dominate all its uses. We set this field in the
out-of-SSA pass so that backends can get SSA-like information even after
they have gone out of SSA.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
EVENT_TYPE_PIPELINESTAT_STOP disables streamout queries too.
Luckily, pipeline stats are enabled by default, so we don't even have to
emit EVENT_TYPE_PIPELINESTAT_START.
Tested on Hawaii, Bonaire, Redwood, RV730.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- remove the last parameter of si_emit_rasterizer_prim_state
- remove the last unused parameter of si_emit_draw_registers
- use current_rast_prim in si_emit_draw_registers
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Mostly dead code or code that didn't do anything.
Computing gs_num_outputs at the end was also useless. It's already set
correctly.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Requires Evergreen/Cayman and radeon kernel module
2.41.0 or newer.
Expected piglit fails due to hardware limitations:
* arb_draw_indirect-draw-arrays-prim-restart
Restarts not applied for DrawArrays commands
* arb_draw_indirect-vertexid
Base vertex offset is not included in vertex id
Marek: bump vgt_state num_dw by 3 (= space needed for one register write)
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
poc counter should be reset with IDR frame,
otherwise there would be a re-order issue with
frames before and after IDR
v2: add commit message
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
With earlier commit (install-lib-links: don't depend on .libs directory)
we moved the location of the file from .libs/ to the current dir.
Although we did not attribute that in the former case autotools was
doing us a favour and removing the file. Explicitly remove the file at
clean-local time, otherwise we'll end up with dangling files.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
According to the spec when no device access mode is specified
clCreateBuffer and clCreateImage* should default to read/write, and
clCreateSubBuffer should default to the parent's device access flags.
clCreateSubBuffer is also required to inherit the host access and
host pointer flags from the parent.
Reviewed-and-tested-by: EdB <edb+mesa@sigluy.net>
Those flags have been introduced in OpenCL 1.2.
[ Francisco Jerez: Rebase. Throw CL_INVALID_VALUE from
clCreateSubBuffer if the subbuffer drops access flags from its
parent. Use single function taking the set of allowed host access
flags to validate memory transfer operands. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
New BO create and mmap ioctls are added. The submit ABI gains a flags
argument, and the pointers are fixed at 64-bit. Shaders are now fixed at
the start of their BOs.
The most common macros are defined there, no use to duplicate these
Clean up the already redefinded macros
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
"(...)Let w be the width rounded to the nearest integer (...). If the
line segment has endpoints given by (x0,y0) and (x1,y1) in window
coordinates, the segment with endpoints (x0,y0-(w-1)/2) and
(x1,y1-(w-1/2)) is rasterized, (...)"
The hardware it not rounding the line width, so we should do it.
Also, we should be careful not to go beyond the hardware limits
for the line width after it gets rounded. Gen6-7 define a maximum line
width slightly below 8.0, so we should advertise a maximum line
width lower than 7.5 to make sure that 7.0 is the maximum integer
line width that we can select. Since the line width granularity in these
platforms is 0.125, we choose 7.375. Other platforms advertise rounded
maximum line widths, so those are fine.
Fixes the following 3 dEQP tests:
dEQP-GLES3.functional.rasterization.primitives.lines_wide
dEQP-GLES3.functional.rasterization.fbo.texture_2d.primitives.lines_wide
dEQP-GLES3.functional.rasterization.fbo.rbo_singlesample.primitives.lines_wide
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The intel driver code, and apparently all other Mesa drivers, call
_mesa_initialize_context early in the CreateContext hook. That
function will end up calling _mesa_init_texture which will do:
ctx->Texture.CubeMapSeamless = _mesa_is_gles3(ctx);
But this won't work at this point, since _mesa_is_gles3 requires
ctx->Version to be set and that will not happen until late
in the CreateContext hook, when _mesa_compute_version is called.
We can't just move the call to _mesa_compute_version before
_mesa_initialize_context since it needs that available extensions
have been computed, which again requires other things to be
initialized, etc. Instead, we enable seamless cube maps since
GLES2, which should work for most implementations, and expect
drivers that don't support this to disable it manually as part
of their context initialization setup.
Fixes the following 192 dEQP tests:
dEQP-GLES3.functional.texture.filtering.cube.formats.*
dEQP-GLES3.functional.texture.filtering.cube.sizes.*
dEQP-GLES3.functional.texture.filtering.cube.combinations.*
dEQP-GLES3.functional.texture.mipmap.cube.*
dEQP-GLES3.functional.texture.vertex.cube.filtering.*
dEQP-GLES3.functional.texture.vertex.cube.wrap.*
dEQP-GLES3.functional.shaders.texture_functions.texturelod.samplercube_fixed_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 2.14 Asynchronous Queries, page 84 of the OpenGL ES 3.0.4
spec states:
"BeginQuery generates an INVALID_OPERATION error if any of the
following conditions hold: [...] id is the name of an
existing query object whose type does not match target; [...]
Similar wording exists in the OpenGL 4.5 spec, section 4.2. QUERY
OBJECTS AND ASYNCHRONOUS QUERIES, page 43.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.fragment.begin_query
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 2.14 Asynchronous Queries, page 84 of the OpenGL ES 3.0.4 states:
"The command void GenQueries( sizei n, uint *ids ); returns n previously unused
query object names in ids. These names are marked as used, for the purposes of
GenQueries only, but no object is associated with them until the first time they
are used by BeginQuery."
This means that any attempt to use or query a Query object id before it has ever
been bound by calling glBeginQuery, should be assume to be an invalid object.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.state.get_query_objectuiv
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The zoffset and depth values were not being considered when calling
error_check_subtexture_dimensions().
Fixes 2 dEQP tests:
* dEQP-GLES3.functional.negative_api.texture.texsubimage3d_neg_offset
* dEQP-GLES3.functional.negative_api.texture.texsubimage3d_invalid_offset
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedestkop.org>
Across the board of the various generations, the intial few atoms in
all of the atom lists are basically the same, (performing uploads for
the various programs). The only difference is that prior to gen6
there's an ff_gs upload in place of the later gs upload.
In this commit, instead of using the atom lists for this program state
upload, we add a new function brw_upload_programs that calls into the
per-stage upload functions which in turn check dirty bits and return
immediately if nothing needs to be done.
This commit is intended to have no functional change. The motivation
is that future code, (such as the shader cache), wants to have a
single function within which to perform various operations before and
after program upload, (with some local variables holding state across
the upload).
It may be worth looking at whether some of the other functionality
currently handled via atoms might also be more cleanly handled in a
similar fashion.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In current code, color format is always hardcoded to
__DRI_IMAGE_FORMAT_ARGB8888 when buffer or DRI image is
allocated in function calls, get_back_bo and dri2_get_buffers,
regardless of current target's color format. This problem
may leads to incorrect render pitch calculation, which
eventually ends up with wrong offset of pixels in
the frame buffer when the image is in different color format
from dri surf's, especially with different bpp. (e.g. RGB565-16bpp)
Attached code patch simply adds RGB565 and XRGB8888 cases to two
functions noted above to resolve the issue.
v2: added a case of XRGB8888, format and bpp selection is done
via switch-case (not "if-else" anymore)
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
The alternative would be to include math.h in c99_compat.h but that
seems heavy-handed.
This patch also replaces INLINE with inline in the c99 math function
wrappers.
Fixes MSVC build.
Acked-by: Matt Turner <mattst88@gmail.com>
We were already do this for ALU operations but we haven't for non-ALU
operations. This changes that.
total NIR instructions in shared programs: 2039883 -> 2022338 (-0.86%)
NIR instructions in affected programs: 1768850 -> 1751305 (-0.99%)
helped: 14244
HURT: 124
total FS instructions in shared programs: 4083960 -> 4084036 (0.00%)
FS instructions in affected programs: 7302 -> 7378 (1.04%)
helped: 12
HURT: 51
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The round-robin allocation strategy is expected to decrease the amount
of false dependencies created by the register allocator and give the
post-RA scheduling pass more freedom to move instructions around. On
the other hand it has the disadvantage of increasing fragmentation and
decreasing the number of equally-colored nearby nodes, what increases
the likelihood of failure in presence of optimistically colorable
nodes.
This patch disables the round-robin strategy for optimistically
colorable nodes. These typically arise in situations of high register
pressure or for registers with large live intervals, in both cases the
task of the instruction scheduler shouldn't be constrained excessively
by the dense packing of those nodes, and a spill (or on Intel hardware
a fall-back to SIMD8 mode) is invariably worse than a slightly less
optimal scheduling.
Shader-db results on the i965 driver:
total instructions in shared programs: 5488539 -> 5488489 (-0.00%)
instructions in affected programs: 1121 -> 1071 (-4.46%)
helped: 1
HURT: 0
GAINED: 49
LOST: 5
v2: Re-enable round-robin already for the lowest one of the nodes
pushed optimistically onto the sack (Connor).
v3: Use UINT_MAX instead of ~0, open-code MIN2 (Jason, Connor).
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Fixes metadata guess when instructions in the program specify a
destination register with non-zero reg_offset and when the payload of
a LOAD_PAYLOAD spans several registers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It's perfectly fine to read the second half of a register written with
force_writemask_all from a first half MOV instruction or vice versa, and
lower_load_payload shouldn't mark the whole MOV as belonging to the second
half in that case. Replicate the same metadata to both halves of the
destination when writemasking is disabled.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
There's some commentary about how it's defined by other "modules", and
maybe that was true in 2000 when the code was added.
Reviewed-by: Eric Anholt <eric@anholt.net>
... and util_le{16,32}_to_cpu. I think I've used the right ones for
describing the actual operation performed (even though they're both just
"byte-swap this if I'm on big-endian").
The Linux Kernel has typedefs __le32/__be32 and friends that static
analysis tools can use to check that byte-orderings are correct. It
might be interesting to apply that here as well.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This corrects a trivial error introduced in commit
19252fee46. That patch was merged recently
and omits one condition (that 'samples' is greater than zero) in one of
the error checks. That error will definitely cause regressions.
Also corrects the reference to the specification above the error check,
which was wrongly quoting OpenGL instead of OpenGL-ES.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above,
but we can generalize it and make it more potentially useful. In the
specific original case of a 0 for our new 'a' argument, it'll get further
algebraic optimization once the 0 is an argument to the new add.
No shader-db effects.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We have some useful optimizations to drop things like 'ine a, 0' on a
boolean argument, but if 'a' came from logical operations on bools, it
couldn't tell. These kinds of constructs appear as a result of TGSI->NIR
quite frequently (at least with if flattening), so being a little more
aggressive in detecting booleans can pay off.
v2: Add ixor as a booleanness-preserving op (Suggestion by Connor).
vc4 results:
total instructions in shared programs: 40207 -> 39881 (-0.81%)
instructions in affected programs: 6677 -> 6351 (-4.88%)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
vc4 was already cleaning these up, but it does shave 4 NIR instructions in
shader-db.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
If there is no pci-id, which is valid for vc4 and freedreno, just emit
an info msg. Keep malformed but existing pci-id's as a warning.
Mostly just to clean up a warning that confuses users for the non-pci
devices.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
I never actually implemented the stubbed out fence stuff back in the
early days. Fix that.
We'll need a few libdrm_freedreno changes to handle timeout properly,
so ignore that for now to avoid a libdrm_freedreno dependency bump.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The brw_imm_ud will yield a HW_REG which then will introduce a barrier
for certain optimization opportunities.
No piglit regressions seen with gen8 (simd8vs).
Suggested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For fragment programs, we pull this mask from the payload header. The same
mask doesn't exist for compute shaders, so we set all bits to enabled.
Previously we were setting 0xff to support SIMD8 VS, but with CS we
support SIMD16, and therefore we change this to 0xffff.
Related commits for SIMD8 VS:
commit d9cd982d55
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Sun Feb 15 20:06:59 2015 -0800
i965/simd8vs: Fix SIMD8 atomics
commit 4a95be9772
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Tue Feb 17 09:57:35 2015 -0800
i965/simd8vs: Fix SIMD8 atomics (read-only)
Note: this mask is ANDed with the execution mask, so some channels may not end
up issuing the atomic operation.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Because the TGSI interface creates merges for each instruction source
and then splits them back out, there are a lot of unnecessary
merge/split pairs which do essentially nothing. The various modifier/etc
propagation doesn't know how to walk though those, so just remove them
when they're unnecessary.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since dropping some NV_fragment_program opcodes (commits
868f95f1da, a3688d686f)
we can no longer parse all opcodes necessary for this extension, leading
to bugs (https://bugs.freedesktop.org/show_bug.cgi?id=86980).
Hence don't announce support for it in swrast (no other driver enabled it).
(Note that remnants of some NV_fp/vp extensions remain, they could be
dropped but are required as hacks for getting viewperf11 catia to run.)
mtypes.h had been defining NDEBUG (used by assert) if DEBUG was not
defined. Confusing and bizarre that you don't get NDEBUG if you don't
include mtypes.h.
... which is just what happened in commit bef38f62e.
Let's let configure define this for us if not using --enable-debug.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With -DDEBUG -UNDEBUG, this assert uses reg_state::stack_size, which
doesn't exist, breaking the build:
assert(state->states[index].index < state->states[index].stack_size);
Switch it to ifndef NDEBUG, so the field will exist if the assertion
actually generates code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
One less new directory necessary for gallium code that wants to interact
with NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Note that we can't use u_math.h's align() because it's a function instead
of a macro, while BITSET_DECLARE needs a constant expression for nouveau's
usage in global declarations.
v2: Stick some parens around the bits macro argument usage (review by Jose).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This avoids duplication of some macros and other definitions across the
tree.
Note that COPY_4FV switches from a memcpy-based implementation to an
assignment of 4 floats.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It introduces references to gallium util/ symbols which means we don't get
to include it from outside-of-gallium code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Missed a few drivers in the earlier changes, this should fix up all the
ones that print unknown caps or don't have a default statement.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In gen6 we need to compute the primitive count in the generated GS program.
The current implementation only counts full primitives, that is, if the
output primitive type is a triangle strip, it won't count individual
triangles in the strip, only complete strips.
If we want to count basic primitives instead we have two options: rework
the assembly code we generate for strip primitives or simply use
CL_INVOCATION_COUNT to resolve the query and let the hardware do that work
for us. This patch implements the latter approach.
Fixes the following piglit test:
bin/arb_pipeline_statistics_query-geom -auto
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89210
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Section 4.2 (Whole Framebuffer Operations) of the OpenGL 3.0 specification
says:
"Each buffer listed in bufs must be BACK, NONE, or one of the values from
table 4.3 (NONE, COLOR_ATTACHMENTi)".
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.buffer.draw_buffers
Reviewed-by: Matt Turner <mattst88@gmail.com>
GLSL 1.50 and GLSL 4.40 specs, they both say the same in
"Interface Blocks" section:
"If optional qualifiers are used, they can include interpolation qualifiers,
auxiliary storage qualifiers, and storage qualifiers and they must declare
an input, output, or uniform member consistent with the interface qualifier
of the block"
From GLSL ES 3.0, chapter 4.3.7 "Interface Blocks", page 38:
"GLSL ES 3.0 does not support interface blocks for shader inputs or outputs."
and from GLSL ES 3.0, chapter 4.6.1 "The invariant qualifier", page 52.
"Only variables output from a shader can be candidates for invariance."
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.invariant_uniform_block_2_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.invariant_uniform_block_2_fragment
No piglit regressions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
v2:
- Enable this check for GLSL.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This lets us more intelligently decide which uniform values should be put
into temporaries, by choosing the most reused values to push to temps
first.
total uniforms in shared programs: 13457 -> 13433 (-0.18%)
uniforms in affected programs: 1524 -> 1500 (-1.57%)
total instructions in shared programs: 40198 -> 40019 (-0.45%)
instructions in affected programs: 6027 -> 5848 (-2.97%)
I noticed this opportunity because with the NIR work, some programs were
happening to make different uniform copy propagation choices that
significantly increased instruction counts.
Each emit_cond_mov() emits a CMP of its first to arguments using the
specified conditional mod, followed by a predicated MOV of the fifth
argument into the fourth. In all four cases here, it was just
implementing MIN/MAX which we can do in a single SEL instruction.
Also reorder the instructions for a slightly better schedule.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things, SEL
with these conditional mods are commutative.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were special casing OPCODE_END but no other instructions that have no
destination, like OPCODE_KIL, leading us to emitting MOVs with null
destinations.
total instructions in shared programs: 5705243 -> 5701539 (-0.06%)
instructions in affected programs: 124104 -> 120400 (-2.98%)
helped: 904
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The saturate propagation pass recognizes that the second instruction
below does not interfere with an attempt to propagate the saturate
modifier from instruction 3 to 1.
1: add(8) dst0 src0 src1
2: mov.sat(8) dst1 dst0
3: mov.sat(8) dst2 dst0
Unfortunately, we did not consider the case of instruction 2 having a
source modifier on dst0. Take for instance:
1: add(8) dst0 src0 src1
2: mov.sat(8) dst1 -dst0
3: mov.sat(8) dst2 dst0
Consider such an instruction to interfere. Increase instruction counts
in Anomaly 2, which could be a bug fix depending on the values the first
instruction produces.
instructions in affected programs: 53228 -> 53934 (1.33%)
HURT: 360
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This lets us be slightly more efficient by not walking the CFG extra times.
Also, it may make it easier to ensure that GVN happens on only unpinned
instructions.
Reviewed-by: Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use nir_dominance_lca for computing least common anscestors
- Use the block index for comparing dominance tree depths
- Pin things that do partial derivatives
Reviewed-by: Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Right now, the nir_instr_prev function function blindly looks up the
previous element in the exec list and casts it to an instruction even if
it's the tail sentinel. The next commit will change this to return null if
it's the first instruction. Making this change first avoids getting a
segfault between commits. The only reason we never noticed is that, thanks
to the way things are laid out in nir_block, the casted instruction's type
was never parallal_copy.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, if you remved a CF node that still had instructions in it, none
of the use/def information from those instructions would get cleaned up.
Also, we weren't removing if statements from the if_uses of the
corresponding register or SSA def. This commit fixes both of these
problems
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is mostly thanks to Connor. The idea is to do a depth-first search
that computes pre and post indices for all the blocks. We can then figure
out if one block dominates another in constant time by two simple
comparison operations.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Being able to find the least common anscestor in the dominance tree is a
useful thing that we may want to do in other passes. In particular, we
need it for GCM.
v2: Handle NULL inputs by returning the other block
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This adds support to the state tracker for
ARB_gpu_shader_fp64.
The details are explained in comments
within the code.
v2 : add double to int/unsigned conversion
v3: handle fp64 consts better
v4: use DRSQ
v4.1: add d2b
v4.2: drop DDIV
v5: split out some prep patches.
v5.1: add some comments.
v5.2: more comments
v6: simplify down the double instruction
generation loop.
v7: Merge Ilia's two cleanup patches.
v7.1: minor fixups for Ilia patch + cleanups
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves stuff around a little to make the next patch
cleaner.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just prep work for fp64 support where we need
an array of 2 dst values.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just fills in some blanks to avoid warnings in the i965 driver.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is basically Ian's review feedback for my patch that added
_mesa_shader_stage_to_abbrev() - it just makes both consistent again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the vec4 backend labeled shaders as "vec4" - now it uses the
specific names "VS" and "GS".
The FS backend now correctly prints "VS" for vertex shaders (rather than
"fs"). It also prints "FS" instead of "fs" for fragment shaders;
preserving that behavior didn't seem essential.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We introduce three new fields in backend_visitor:
- debug_enabled: whether or not INTEL_DEBUG & DEBUG_<stage flag>
- stage_name: "vertex", "fragment", etc. for use in messages
- stage_abbrev: "VS", "FS", etc. for use in messages
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When compiling, we have a gl_shader_stage (MESA_SHADER_*) enum, and want
to know whether debugging is enabled for that stage. This allows us to
easily translate it into the corresponding debug flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Comparing the location field is equivalent and more efficient.
We'll also need this when we start using NIR for ARB programs, as our
NIR converter will set the location field correctly, but probably won't
use the GLSL names for these concepts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
It looks like no hw does div anyways, so we should just
lower at the GLSL level.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These act like flt32 except they take up two slots, and you
can only add 2 x flt64 constants in one slot.
The main reason they are different is we don't want to match half a flt64
constants against a flt32 constant in the matching code, we need to make
sure we treat both parts of the flt64 as an single structure.
Cleaned up printing/parsing by Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch adds support for a set of double opcodes
to TGSI. It is an update of work done originally
by Michal Krol on the gallium-double-opcodes branch.
The opcodes have a hint where they came from in the
header file.
v2: add unsigned/int <-> double
v2.1: update docs.
v3: add DRSQ (Glenn), fix review comments (Glenn).
v4: drop DDIV
v4.1: cleanups, fix some docs bugs, (Ilia)
rework store_dest and fetch_source fns. (Ilia)
4.2: fixup float comparisons (Ilia)
This is based on code by Michael Krol <michal@vmware.com>
Roland and Glenn also reviewed earlier versions.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
To silence compiler warnings about unhandled switch cases.
v2: move GSL_TYPE_DOUBLE case to the "Invalid type in type_size" section,
per Ilia.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
To silence compiler warning about unhandled switch case.
v2: move GLSL_TYPE_DOUBLE to the "not reached" section, per Ilia.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use pipe_sampler_view_reference() instead of ordinary assignment.
Also add a new sanity check assertion.
Fixes piglit gl-1.0-drawpixels-color-index test crash. But note
that the test still fails.
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This snippet can be included in Makefiles that may, depending on the
project configuration, not actually build any installable libraries.
In that case we don't have anything to depend on and this part of
the makefile may be executed before the .libs directory is created,
so do not depend on it being there.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This fixes a regression in the running time of Piglit introduced by
commit 78e9043475, which increased the
number of register allocation classes set up by the VEC4 back-end
from 2 to 16. The algorithm used by ra_set_finalize() to calculate
them is unnecessarily expensive, do it manually like the FS back-end
does.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some instruction bits don't have a mapping defined to any compacted
instruction field. If they're ever set and we end up compacting the
instruction they will be forced to zero. Avoid using compaction in such
cases.
v2: Align multiple lines of an expression to the same column. Change
conditional compaction of 3-source instructions to an
assertion. (Matt)
v3: The 3-source instruction bit 105 is part of SourceIndex on CHV.
Add assertion that reserved bit 7 is not set. (Matt)
Document overlap with UIP and 64-bit immediate fields.
v4: Make some more unmapped bit checks assertions. (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Due to the way it's implemented in hardware, the F16TO32/F32TO16
instructions require the source/destination register to be of some
16-bit type in Align1 mode, while they require it to be some 32-bit
type in Align16 mode (and as an undocumented feature the high 16 bits
of the destination register are zeroed out in the case of the F32TO16
instruction on Gen7). Make their behaviour consistent so you can
specify a 32 bit register type as source or destination and get
predictable results in the most significant bits no matter what access
mode is being used.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We cannot zero out the destination register if it overlaps with the
source. Use an Align1 instruction instead to zero out the high 16
bits after the conversion to half float.
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the source type differs from the original type of the constant we
need to bit-cast it before propagating, otherwise the original type
information will be lost. If the constant was a vector float there
isn't much we can do, because the result of bit-casting the component
values of a vector float cannot itself be represented as an immediate.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Create a new search function to look for matching built-in functions by name
and use it for built-in function redefinition or overload in GLSL ES 3.00.
GLSL ES 3.0 spec, chapter 6.1 "Function Definitions", page 71
"A shader cannot redefine or overload built-in functions."
While in GLSL ES 1.0 specification, chapter 8 "Built-in Functions"
"User code can overload the built-in functions but cannot redefine them."
So this check is specific to GLSL ES 3.00.
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.functions.invalid.overload_builtin_function_vertex
dEQP-GLES3.functional.shaders.functions.invalid.overload_builtin_function_fragment
dEQP-GLES3.functional.shaders.functions.invalid.redefine_builtin_function_vertex
dEQP-GLES3.functional.shaders.functions.invalid.redefine_builtin_function_fragment
No piglit regressions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Per GLES3 specification, section 4.4 Framebuffer objects page 198, "If
internalformat is a signed or unsigned integer format and samples is greater
than zero, then the error INVALID_OPERATION is generated.".
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.buffer.renderbuffer_storage_multisample
Reviewed-by: Matt Turner <mattst88@gmail.com>
'3.8.2 Sampler Objects' section of the GL-ES 3.0 specification states:
"An INVALID_OPERATION error is generated if sampler is not the name
of a sampler object previously returned from a call to GenSamplers."
In desktop GL, an GL_INVALID_VALUE is returned instead.
Fixes 6 dEQP failing tests:
* dEQP-GLES3.functional.negative_api.shader.get_sampler_parameteriv
* dEQP-GLES3.functional.negative_api.shader.get_sampler_parameterfv
* dEQP-GLES3.functional.negative_api.shader.sampler_parameteri
* dEQP-GLES3.functional.negative_api.shader.sampler_parameteriv
* dEQP-GLES3.functional.negative_api.shader.sampler_parameterf
* dEQP-GLES3.functional.negative_api.shader.sampler_parameterfv
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Rebase on the nir_opcodes.h python code generation support.
v3: Use SSA values, and set an appropriate writemask on dot products.
v4: Make the arguments be SSA references as well. This lets you stack up
expressions in the arguments of other expressions, at the cost of
having to insert a fmov/imov if you want to swizzle. Also, add
the generated file to NIR_GENERATED_FILES.
v5: Use more pythonish style for iterating the list.
v6: Infer the size of the dest from the size of the srcs, and auto-swizzle
a single small src out to the appropriate size.
v7: Add little helpers for initializing the struct, add a typedef for the
struct like other nir types have.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v6)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v7)
These lowering passes are optional for the backend to request, currently
the TGSI softpipe backend most likely the r600g backend would want to use
these passes as is. They aim to hit the gallium opcodes from the standard
rounding/truncation functions.
v2: also lower floor in mod_to_floor
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This implements the bulk of the builtin functions for fp64 support.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We want to restrict some lowering passes to floats only,
and enable other for doubles.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Patch fixes Piglit test:
arb_gpu_shader_fp64/preprocessor/fs-output-double.frag
and adds additional validation for shader outputs.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add a enter/leave record callback so that the offset may be aligned to
the proper value. Otherwise only leaf fields are called, and the first
field needs to be aligned to the outer struct's base alignment while the
last field needs to be aligned to the inner struct's base alignment.
This removes most usage of the last field/record type values passed into
visit_field.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
These functions are about to be used more aggressively for determining
uniform layout. Samplers may be inside of structs, and it's easier to
reuse the existing base alignment logic.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
v2: add define bit (Tapani Pälli)
Patch makes following Piglit tests pass:
arb_gpu_shader_fp64/preprocessor/define.vert
arb_gpu_shader_fp64/preprocessor/define.frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This adds support for the new uniform interfaces
from ARB_gpu_shader_fp64.
v2:
support ARB_separate_shader_objects ProgramUniform*d* (Ian)
don't allow boolean uniforms to be updated (issue 15) (Ian)
v3: fix size_mul
v4: Teach uniform update to take into account double precision (Topi)
v5: add transpose for double case (Ilia)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Just add the xml file covering this extension,
and dummy interface files in mesa, and fix up
sanity tests.
v2:
Enable ProgramUniform*d* from ARB_separate_shader_objects (Ian)
use 40 instead of 43 for dispatch_sanity.cpp (Chris)
uncomment PU sanity tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the driver actually supports ETC2, don't decode it in software.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
No actual decoding is added, similar faking mechanism to bptc.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This could be done in a separate pass like we do in GLSL IR, but it seems
to me like having the definitions of the transformations in the two
directions next to each other makes a lot of sense.
v2: Reorder the comment about the transformation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa has a shader compiler struct flagging whether GLSL IR's opt_algebraic
and other passes should try and generate certain types of opcodes or
patterns. Extend that to NIR by defining our own struct, which is
automatically generated from the Mesa struct in glsl_to_nir and provided
directly by the driver in TGSI-to-NIR.
v2: Split out the previous two prep patches.
v3: Rebase to master (no TGSI->NIR present)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
This will be used so that we can customize the transforms for the target
GPU, so we don't un-lower expressions that had already been lowered (or
introduce new lowering transformations that not all GPUs want)
v2: Drop the complication of having the condition->index dictionary, since
we don't actually expect there to be many different conditions (change
by Kenneth).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to give the optimization passes a chance to customize
behavior for the particular target device.
v2: Rebase to master (no TGSI->NIR present)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
An update for d9cd982d55.
A similar change was needed for CS to allow the piglit test
tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
to pass.
The previous change (d9cd982d) should fix cases that write atomics,
such as atomicCounterIncrement, and this change will fix cases than
only read atomics, such as atomicCounter.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With commit c39dbfdd0f7(auxiliary/vl: bring back the VL code for the dri
targets) we did not fully consider users of dri-swrast alone. Thus we
ended up trying to compile the dri2 specific code on platform which lack
it - Cygwin for example.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reported-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Currently we use DISTCHECK_CONFIGURE_FLAGS, which is reserved for
the user. As with other variables, one should use the AM_ variable
within the makefile.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The XExtensionInfo is allocated dynamically (if the pointer is NULL)
in the XEXT_GENERATE_FIND_DISPLAY macro. On the other hand the
macro XEXT_GENERATE_CLOSE_DISPLAY does not check/free the memory.
Follow the example set by dri1 and appledri, and use a static variable.
Spotted while hunting "still reachable" leaks in Waffle.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
NOTE: The implementation was initially one patch, this. All the history is kept
here, even though all the core mesa changes were moved to the parent of this
patch.
This patch implements ARB_pipeline_statistics_query. This addition to GL does
not add a new API. Instead, it adds new tokens to the existing query APIs. The
work to hook up the new tokens is trivial due to it's similarity to the previous
work done for the query APIs. I've implemented all the new tokens to some
degree, but have stubbed out the untested ones at the entry point for Begin().
Doing this should allow the remainder of the code to be left in.
The new tokens give GL clients a way to obtain stats about the GL pipeline.
Generally, you get the number of things going in, invocations, and number of
things coming out, primitives, of the various stages. There are two immediate
uses for this, performance information, and debugging various types of
misrendering. I doubt one can use these for debugging very complex applications,
but for piglit tests, it should be quite useful.
Tessellation shaders, and compute shaders are not addressed in this patch
because there is no upstream implementation. I've implemented how I believe
tessellation shader stats will work for Intel hardware (though there is a bit of
ambiguity). Compute shaders are a bit more interesting though, and I don't yet
know what we'll do there.
For the lazy, here is a link to the relevant part of the spec:
https://www.opengl.org/registry/specs/ARB/pipeline_statistics_query.txt
Running the piglit tests
http://lists.freedesktop.org/archives/piglit/2014-November/013321.html
(http://cgit.freedesktop.org/~bwidawsk/piglit/log/?h=pipe_stats)
yield the following results:
> piglit-run.py -t stats tests/all.py output/pipeline_stats
> [5/5] pass: 5 Running Test(s): 5
v2:
- Don't allow pipeline_stats to be per stream (Ilia). This may (not sure) be
needed for AMD_transform_feedback4, which we do not support.
> If AMD_transform_feedback4 is supported then GEOMETRY_SHADER_PRIMITIVES_-
> EMITTED_ARB counts primitives emitted to any of the vertex streams for
> which STREAM_RASTERIZATION_AMD is enabled.
- Remove comment from GL3.txt because it is only used for extensions that are
part of required versions (Ilia)
- Move the new tokens to a new XML doc instead of using the main GL4x.xml (Ilia)
- Add a fallthrough comment (Ilia)
- Only divide PS invocations by 4 on HSW+ (Ben)
v3:
- Add ARB_pipeline_statistics_query to relnotes.html
- Add ARB_pipeline_statistics_query.xml to the Makefile.am, and master XML (Ilia)
- Correct extension number (Ilia)
- Add link to xml in the main GL API xml (Ilia)
- remove special GS case from gen6_end_query (Ian)
- Make lookup table static so gcc doesn't initialized it on every call (Ian)
- Use if (_mesa_has_geometry_shaders(ctx)) instead of explicit checks (Ian)
- Core mesa parts moved into a prep patch (Ilia)
v4:
- Change to 10.6 relnotes since we missed 10.5 window
- Moved compute shader stuff into the switch statement (Jordan)
- Jordan: Add compute shader support
v5:
- Fixed relnote style (Ilia)
v6:
- Rebased on master which beat me to adding the first relnotes - essentially
this undoes v5 (which had a typo anyway)
- Some code style fixes (Ken)
- Remove some excess comments (Ken)
- Unify tessellation failure style - unreachable (Ken)
- Fix workaround comment for PS invocations (Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was originally part of a single patch which added the extension, and
implemented it for i965 classic. For information about the evolution of the
patch, please see the subsequent commit.
One difference here as compared to the original mega patch is this does build
support for the compute shader query. Since it cannot be tested on any platform,
it will always return NULL for now. Jordan has already written a patch to
address this, and when that patch lands, this logic can be modified.
v2: Fix typo in subject (Brian Paul)
Add checks for desktop gl (Ilia)
Fail for any callers for now (Ilia)
Update QueryCounterBits for new tokens (Ilia)
Jordan: Use _mesa_has_compute_shaders
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v3: Rebased on patch which adds the proper information to unstub tessellation
shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's some debate about whether we should use Meta or BLORP,
but either should run circles around the BLT engine.
In particular, this means that Gen8+ will use the 3D engine for blits,
like we do on Gen6-7.
Improves performance in "copypixrate -blit -back" (from Mesa demos)
by 232.037% +/- 3.15795% (n=10) on Broadwell GT3e.
v2: Rebase on Laura's changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Previously we didn't emit MAD instructions since they cannot take
immediate arguments, but with the opt_combine_constants() pass we can
handle this properly.
total instructions in shared programs: 5920017 -> 5733278 (-3.15%)
instructions in affected programs: 3625153 -> 3438414 (-5.15%)
helped: 22017
HURT: 870
GAINED: 91
LOST: 49
Without constant pooling, this patch is a complete loss:
total instructions in shared programs: 5912589 -> 5987888 (1.27%)
instructions in affected programs: 3190050 -> 3265349 (2.36%)
helped: 1564
HURT: 17827
GAINED: 27
LOST: 101
And since the constant pooling patch by itself hurt a bunch of things,
from before constant pooling to this patch the results are:
total instructions in shared programs: 5895414 -> 5747946 (-2.50%)
instructions in affected programs: 3617993 -> 3470525 (-4.08%)
helped: 20478
HURT: 4469
GAINED: 54
LOST: 146
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And then the opt_combine_constants() pass will pull them out into
registers. This will allow us to do some algebraic optimizations on MAD
and LRP.
total instructions in shared programs: 5946656 -> 5931320 (-0.26%)
instructions in affected programs: 778247 -> 762911 (-1.97%)
helped: 3780
HURT: 6
GAINED: 12
LOST: 12
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The fs_visitor's dump_instruction() implementation calls cfg_t()
indirectly through calculate_live_intervals, so if you have an infinite
loop in the CFG code, you can't call cfg::dump(fs_visitor *) to debug
it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To insert an instruction at the end of a basic block, we typically do
something like
inst = block->last_non_control_flow_inst();
inst->insert_after(block, new_inst);
But blocks can consist of a single control flow instruction, so inst
will actually be the exec_list's head sentinel. We shouldn't use it as
if it were a regular instruction, but it is safe to insert something after
it.
This patch avoids assert-failing because an exec_list sentinel wasn't in
the basic block's instruction list.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The macro is defined to provide a trailing ; so this caused the expansion
to end in ";;" which made the Solaris Studio compilers issue warnings for
every line of:
"builtin_type_macros.h", line 113: Warning: extra ";" ignored.
for every file that included the header, filling build logs with thousands
of useless warnings.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
opt_copy_propagation and opt_copy_propagation_elements create new ACP
and Kill sets each time they enter a new control flow block. For if
blocks, they also copy the entire existing ACP set contents into the
new set.
When we exit the control flow block, we discard the new sets. However,
we weren't freeing them - so they lived on until the pass finished.
This can waste a lot of memory (57MB on one pessimal shader).
This patch makes the pass allocate ACP entries using this->acp as the
memory context, and Kill entries out of this->kill. It also steals
kill entries when moving them from the inner kill list to the parent.
It then frees the lists, including their contents.
v2: Move ralloc_free(this->acp) just before this->acp = orig_acp
(suggested by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.5 10.4" <mesa-stable@lists.freedesktop.org>
When we schedule an instructions with undefined value, we
eventually will use 0, which is a constant, however sb wasn't
taking this into account and creating ops with illegal scalar
swizzles.
this replaces my fix for op3 in t slots.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Matt Turner noticed that the hardware has always had a MIN
instruction, but the driver always used MAX+MOV for no
apparent reason.
This should cut an instruction, and a temporary, allowing
more programs to run in hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt Turner noticed that the hardware has always had a MIN
instruction, but the driver always used MAX+MOV for no
apparent reason.
This should cut an instruction, and a temporary, allowing
more programs to run in hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I used this a while back when debugging GPU hangs, and it seems like it
could be useful, so I figured I'd add it so people can use it in the
debugger.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Sandybridge requires the post-sync non-zero workaround in a ton of
places, and if you ever miss one, the GPU usually hangs.
Currently, we try to track exactly when a workaround flush is
necessary (via the brw->batch.need_workaround_flush flag). This is
tricky to get right, and we've botched it several times in the past.
This patch unconditionally performs the post-sync non-zero flush at the
start of each primitive's state upload (including BLORP). We drop the
needs_workaround_flush flag, and drop all the other callers, as the
flush has already been performed.
We have no data to indicate that simply flushing all the time will
hurt performance, and it has the potential to help stability.
v2: Add post-sync workaround to initial GPU state upload to be extra
cautious (suggested by Chad Versace).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously array textures were not working with GetCompressedTextureImage,
leading to failures in the test
arb_direct_state_access/getcompressedtextureimage.c.
Tested-by: Laura Ekstrand <laura@jlekstrand.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Just remove the _mesa_free_lighting_data function. The body has been
empty since the shine table was moved into the tnl module (commit
ba1d921).
main/light.c:1216:46: warning: unused parameter 'ctx' [-Wunused-parameter]
_mesa_free_lighting_data( struct gl_context *ctx )
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
delete_management.c:56:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size; i++) {
^
delete_management.c:69:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = size - 100; i < size; i++) {
^
delete_management.c:79:31: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
assert(key_value(entry->key) >= size - 100 &&
^
delete_management.c:79:70: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
assert(key_value(entry->key) >= size - 100 &&
^
insert_many.c:56:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size; i++) {
^
insert_many.c:62:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size; i++) {
^
insert_many.c:67:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
assert(ht->entries == size);
^
random_entry.c:62:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size; i++) {
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
glcpp/glcpp.c:124:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
const static struct option
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
+ minor indentation fixes
Discovered by Axel Davy.
This can't be reproduced with any app, because all state trackers set a DSA
state first.
Cc: 10.5 10.4 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
It's not possible to query the current buffer binding, because the extension
doesn't define GL_..._BUFFER__BINDING_AMD.
Drivers should check the target parameter of Drivers.BufferData. If it's
equal to GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, the memory should be pinned.
That's all there is to it.
A piglit test is on the piglit mailing list.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
everytime I open this file in emacs with show trailing whitespace
or git add from it my screen flares with red.
Just do a general cleanup, makes working on fp64 support not as
jarring.
I'm not saying this is perfect, its just better than before.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The short version: we need to set bits in R0.7 which provide a mask to be used
for PS kill samples/pixels. Since the VS has no such concept, we just need to
set all 1.
The longer version...
Execution for SIMD8 atomics is defined as follows:
SIMD8: The low 8 bits of the execution mask are ANDed with 8 bits of the
Pixel/Sample Mask from the message header. For the typed messages, the Slot
Group in the message descriptor selects either the low or high 8 bits. For the
untyped messages, the low 8 bits are always selected. The resulting mask is used
to determine which slots are read into the destination GRF register (for read),
or which slots are written to the surface (for write). If the header is not
present, only the low 8 bits of the execution mask are used.
The message header for untyped messages is defined in R0.7 "This field contains
the 16-bit pixel/sample mask to be used for SIMD16 and SIMD8 messages. All 16
bits are used for SIMD16 messages. For typed SIMD8 messages, Slot Group selects
which 8 bits of this field are used. For untyped SIMD8 messages, the low 8 bits
of this field are used." Furthermore, "The message header for the untyped
messages only needs to be delivered for pixel shader threads, where the
execution mask may indicate pixels/samples that are enabled only due to
derivative (LOD) calculations, but the corresponding slot on the surface must
not be accessed." We're not using a pixel shader here, but AFAICT, this mask is
used for all stages.
This leaves two options, Remove the header, or make the VS code emit the correct
thing for the header. I believe one of the goals of using SIMD8 VS was to get as
much code reuse as possible, and so I chose the latter. Since the VS has no such
thing as kill instructions, the mask is derived simple as all 1's.
v2:
Add a comment to the code (stolen from Curro on the mailing list)
Change the control flow style (Curro + Jason)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87258
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When restoring the current state in _mesa_meta_end it was previously trying to
copy the on-going sample count of the current occlusion query into the new
query after restarting it so that the driver will continue adding to the
previous value. This wouldn't work for two reasons. Firstly, the query might
not be ready yet so the Result member will usually be zero. Secondly the saved
query is stored as a pointer to the query object, not a copy of the struct, so
it is actually restarting the exact same object. Copying the result value is
just copying between identical addresses with no effect. The call to
_mesa_BeginQuery will have always reset it back to zero.
This patch fixes it by making it actually wait for the query object to be
ready before grabbing the previous result. The downside of doing this is that
it could introduce a stall but I think this situation is unlikely so it might
not matter too much. A better solution might be to introduce a real
suspend/resume mechanism to the driver interface. This could be implemented in
the i965 driver by saving the depth count multiple times like it does in the
i945 driver.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88248
Reviewed-by: Carl Worth <cworth@cworth.org>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
This line was removed by accident in commit
16b9112574 causing a regression in the
ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert Khronos conformance
test. It's necessary because the swizzle_result() code below expects
all four components of the vector to be valid.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89094
Tested-by: Lu Hua <huax.lu@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We need to swizzle the rhs to match the number of components in the writemask,
otherwise we'll hit an assertion in ir_assignment.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some old format conversion code in pack.c implemented byte-swapping like this:
GLint comps = _mesa_components_in_format(dstFormat);
GLint swapSize = _mesa_sizeof_packed_type(dstType);
if (swapSize == 2)
_mesa_swap2((GLushort *) dstAddr, n * comps);
else if (swapSize == 4)
_mesa_swap4((GLuint *) dstAddr, n * comps);
where n is the pixel count. But this is incorrect for packed formats,
where _mesa_sizeof_packed_type is already returning the size of a pixel
instead of the size of a single component, so multiplying this by the
number of components in the format results in a larger element count
for _mesa_swap than we want.
Unfortunately, we followed the same implementation for byte-swapping
in the rewrite of the format conversion code for texstore, readpixels
and texgetimage.
This patch computes the correct element counts for _mesa_swap calls
by computing the bytes per pixel in the image and dividing that by the
swap size to obtain the number of swaps required per pixel. Then multiplies
that by the number of pixels in the image to obtain the swap count that
we need to use.
Also, when handling byte-swapping in texstore_rgba, we were ignoring
the image's depth. This patch fixes this too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
... instead of emit(BRW_OPCODE_CMP, ...). In commit 6b3a301f I changed
vec4_visitor::CMP to set the destination's type to that of src0. In the
following commit (2335153f) I removed an apparently now unnecessary work
around for Gen8 that did the same thing.
But there was a single place that emitted a CMP instruction without
using the vec4_visitor::CMP function. Use it there.
And change dst_null_d to dst_null_f for good measure, since ARB vp
doesn't have integers.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89032
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
3DSTATE_CC_STATE_POINTERS seems to be ignored when bit 0 of DW1 is not set.
Follow i965 and set the bit for 3DSTATE_CC_STATE_POINTERS and
3DSTATE_BLEND_STATE_POINTERS. Add gen checks for all state pointer commands.
If a transform feedback buffer's size is 0, st_bufferobj_data doesn't
end up creating a buffer for it. There's no point in trying to write to
such a buffer, so just pretend as if it's not really there.
This fixes arb_gpu_shader5-xfb-streams-without-invocations on nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
GLSL IR labels gl_FrontFacing as an input variable and not a system value.
This commit makes NIR silently translate gl_FrontFacing to a system value
so that it properly gets translated into a load_system_value intrinsic.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
To help identify llvmpipe rasterizer threads -- especially when there
can be so many.
We can eventually generalize this to other OSes, but for that we must
restrict the function to be called from the current thread. See also
http://stackoverflow.com/a/7989973
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Current implementation allowed usage of unsized type texture GL_FLOAT
and GL_HALF_FLOAT as a render target as this was 'expected behavior' by
WEBGL_oes_texture_float and is also allowed by the oes-texture-float
WebGL test. However this broke some ES3 conformance tests that do not
accept such behavior. Patch sets such an fbo incomplete as expected by
the ES3 conformance tests. Textures with sized types like RGBA32F will
still continue to work as render targets.
v2: code style cleanups (Ian Romanick, Matt Turner)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88905
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Right now the places that used to emit a mov.sf just put the SF on the
previous instruction when it generated the source of the SF value. Even
without optimization to push the sf up further (and kill thus potentially
kill more MOVs), this gets us:
total uniforms in shared programs: 13455 -> 13457 (0.01%)
uniforms in affected programs: 3 -> 5 (66.67%)
total instructions in shared programs: 40296 -> 40198 (-0.24%)
instructions in affected programs: 12595 -> 12497 (-0.78%)
The compiler can't tell that we're always going to hit the first if block
on the first time through the loop.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If execution was supposed to be supported in this case, we'd run into
trouble from completely uninitialized sat_imm values.
v2: Drop the '!' before the string.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We always pass this argument, even if it won't be used by the particular
texture op.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit f82f2fb3dc added use of the Mesa
IR optimizer for both ARB_fragment_program and ARB_vertex_program, but
only justified the vertex-program portions with measured performance
improvements.
Meanwhile, the optimizer was seen to generate hundreds of unused
immediates without discarding them, causing failures.
Discard the use of the optimizer for now to fix the regression. (In
the future, we anticpate things moving from Mesa IR to NIR for better
optimization anyway.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82477
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
CC: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
We need to build certain parts of Mesa (namely gallium, llvmpipe, and
therefore util) with Windows SDK 7.0.7600, which includes MSVC 2008.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To fix build when libdrm is not found,
commit a594cec7e3 did put several
parts of egl code under #ifdef HAVE_DRM_PLATFORM.
HAVE_DRM_PLATFORM means the egl drm platform is being built.
What should have been used instead is HAVE_LIBDRM.
At a few locations, the HAVE_DRM_PLATFORM introduced
have already been replaced by HAVE_LIBDRM, this patch
replaces the remaining occurences.
This patch makes for example EGL_EXT_image_dma_buf_import
be advertised by egl under x11 when the drm egl platform
is not built, whereas previously it required the drm egl
platform to be built.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
With commit c642e87d9f4(auxiliary/vl: rework the build of the VL code)
we split out the VL code into a separate static library that was meant
to be used by the VL targets alone - va, vdpau, xvmc.
The commit failed to consider the way we handle vdpau-gl interop and
broke it. Bring back the functionality by keeping the vl <> vl_stub
separation as requrested by Christian.
v2: Update the omx target as well. Update mesa-stable email address.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86837
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Currently having the wayland-scanner is optional, which causes problems
when autotools parses through the makefiles, and tries to generate all
the BUILT_SOURCES.
As the config option --with-egl-platform=wayland is not the default, we
won't end up setting the WAYLAND_SCANNER variable, which in turn will
cause some files to not get generated.
There has been a wayland-scanner package as of wayland 1.2 which
provides a variable for the scanner binary, so let's use that one and
fall back to manually searching via AC_PATH_PROG when needed.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use nir/nir_opcodes.h as is (w/o the absolute path), as it is the target
name used to generate the actual file. Otherwise the target is missing,
the file won't get generated and the build will fail.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're using a SIMD4x2 sampler message, which has execsize 4, and so the
register width must be <= 4. Use <4,4,1> regioning instead of <8,8,1>
regioning to access the same data but avoid tripping the assert.
Fixes the following piglit tests:
spec/glsl-1.20/compiler/structure-and-array-operations/array-selection.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-basic.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-field-clashes-with-struct.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-field-clashes-with-function.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-array.vert
glslparsertest/glsl2/condition-07.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-field-clashes-with-variable.vert
v2: Better commit message courtesy of Ken.
I had a discussion with Ken, and we both question how we end up with a mov and
execsize 4. For now though, this fixes the piglit tests, so we can worry about
it later.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Accumulated changes for various renames and additions, including Gen8
definitions. Some of the dynamic state __SIZE no longer means the size of an
element, but the size of an array of elements. The changes can be seen in
ilo_render_dynamic.c.
LINTERP is implemented as a PLN instruction or a LINE+MAC. PLN and MAC
can do conditional mod. CINTERP is just a MOV.
total instructions in shared programs: 5952103 -> 5950284 (-0.03%)
instructions in affected programs: 324573 -> 322754 (-0.56%)
helped: 1819
We lose the SIMD16 in one Unigine Heaven shader which appears six times
in shader-db.
Dead since
commit 284ce20901
Author: Eric Anholt <eric@anholt.net>
Date: Fri Aug 20 10:52:14 2010 -0700
Remove remnants of the old glsl compiler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And unfortunately other shaders do the same thing but with >=/<= which
we can't apply this optimization to because of NaNs.
instructions in affected programs: 23309 -> 22938 (-1.59%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No change on shader-db on i965.
v2: Reword the comment due to feedback from Erik Faye-Lund
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> (v1)
We want the size of a float per component, not the size of a whole vec4.
NIR instructions on i965:
total instructions in shared programs: 1261937 -> 1261929 (-0.00%)
instructions in affected programs: 114 -> 106 (-7.02%)
Looking at one of these examples (tesseract), it's from vec4 load_consts
for a MRT solid fill, which do get CSEed now that we don't memcmp off the
end of the const value and into the SSA def. For the 1-component loads
that are common in i965, we were only memcmping off into the rest of the
usually zero-filled const_value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
xfont.c:237:14: error: implicit declaration of function 'GetGLXDRIDrawable' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
glxdraw = GetGLXDRIDrawable(CC->currentDpy, CC->currentDrawable);
^
Fixes regression from 291be28476
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Lots of shaders divide by exp2(...) which we turn into a multiplication
by the reciprocal. We can avoid the reciprocal by simply negating exp2's
argument.
total instructions in shared programs: 5947154 -> 5946695 (-0.01%)
instructions in affected programs: 118661 -> 118202 (-0.39%)
helped: 380
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Same as commit 3654b6d4 to the fs backend.
total instructions in shared programs: 5945788 -> 5945787 (-0.00%)
instructions in affected programs: 36 -> 35 (-2.78%)
helped: 1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Same as commit c4fab711 to the fs backend.
total instructions in shared programs: 5945998 -> 5945788 (-0.00%)
instructions in affected programs: 74665 -> 74455 (-0.28%)
helped: 399
HURT: 180
It hurts some programs because we make no attempts in the vec4 backend
to avoid MADs if they have constant (or vector uniform) arguments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Skylake+ doesn't support setting a depth buffer to a 1D surface but it
does allow pretending it's a 2D texture with a height of 1 instead.
This fixes the GL_DEPTH_COMPONENT_* tests of the copyteximage piglit
test (and also seems to avoid a subsequent GPU hang).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89037
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Null surfaces are going to be useful to have something to point
unbound image units to, as the ARB_shader_image_load_store extension
requires us to behave deterministically in cases where some shader
tries to access an unbound image unit: Invalid stores and atomics are
supposed to be discarded and invalid loads are supposed to return
zero, which is precisely what the null surface does.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It doesn't really improve locality of texture fetches, quite the
opposite it's a waste of memory bandwidth and space due to tile
alignment.
v2: Check mt->logical_height0 instead of mt->target (Ken). Add short
comment explaining why they shouldn't be tiled.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Negation of UD/UW sources behaves the same as for D/W sources, taking
the two's complement of the source, except for bitwise logical
operations on Gen8 and up which take the one's complement. Fixes
crash in a GLSL shader with subtraction of two unsigned values.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Scalar registers are required to have zero stride, fix the
regs_written calculation not to assume that the instruction writes
zero registers in that case.
v2: Rename CEILING() to DIV_ROUND_UP(). (Matt, Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using 'ralloc*(this, ...)' is wrong if the object has automatic
storage or was allocated through any other means. Use normal dynamic
memory instead.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The only reason why you need a vec4_visitor to construct a
vec4_instruction is to initialize vec4_instruction::ir and
::annotation. Instead set them from vec4_visitor::emit() just like
fs_visitor does.
Reviewed-by: Matt Turner <mattst88@gmail.com>
One should be able to manipulate i965 IR without pulling the whole
FS/VEC4 visitor classes -- Optimization passes and other
transformations would ideally be visitor-agnostic. Among other issues
this avoids a circular dependency between the header file where such
visitor-agnostic code will be defined and the main FS/VEC4 header
where both IR (layer below) and visitor (layer above) happen to be
defined.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Right now virtual GRF book-keeping and allocation is performed in each
visitor class separately (among other hundred different things),
leading to duplicated logic in each visitor and preventing layering as
it forces any code that manipulates i965 IR and needs to allocate
virtual registers to depend on the specific visitor that happens to be
used to translate from GLSL IR.
v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cubemap array images are unlike cubemap array samplers in that they don't need
an additional coordinate to index individual cubemaps in the array, instead
they behave like a 2D array of 6n layers, with n the number of cubemaps in the
array. Take this exception into account.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some people have complained that code using the CEILING() macro is
difficult to understand because it's not immediately obvious what it
is supposed to do until you go and look up its definition. Use a more
descriptive name that matches the similar utility macro in the Linux
kernel.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Without this when an application issues that query, it would try to
wait the result from the gpu, and since no query has been actually
issued, it will wait forever.
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add a specific optimisation pass for NV50 to check whether SRC0 or SRC1 is
a MOV dst, IMM. If so: fold the IMM in and try to drop the MOV. Must be
done post-RA because it requires that SDST == SSRC2.
V2: improve readability and add comments to clarify decisions
V3: Remove redundant code... compiler already attempts to put the IMM in
SSRC1
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
But don't enable generation of it in the opProperties, because we can't
guarantee the SDST==SRC2 constraint until after register assignment. We'll
add a post-RA folding pass to utilise this.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add emission rules for negative and saturate flags for MAD 4-byte opcodes,
and get rid of some of the constraints. Obviously tested with a wide variety
of shaders.
V2: Document MAD as supported short form
V3: Split up IMM from short-form modifiers
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The old way made it impossible for the optimizer to reason about what
was going on. The new way is the same number of instructions (the neg
gets folded into the cvt) but enables the optimizer to be cleverer if
comparing to a constant (most common case). [The optimizer is presently
not sufficiently clever to work this out, but it could relatively easily
be made to be. The old way would have required significant complexity to
work out.]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Printing instructions doesn't modify them, so we can mark the parameter
const.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The hardware's integer luminance formats are completely unusable;
currently we fall back to RGBA. This means we need to override
the texture swizzle to obtain the XXX1 values expected for luminance
formats.
Fixes spec/EXT_texture_integer/texwrap formats bordercolor [swizzled]
on Broadwell - 100% of border color tests now pass on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
This provides for atomic addition, which will be used by an upcoming
shader-cache patch. A simple test is added to "make check" as well.
Note: The various O/S functions differ on whether they return the
original value or the value after the addition, so I did not provide
an add_return() macro which would be sensitive to that difference.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, the set_insert function would bail early if it found a deleted
slot that it could re-use. However, this is a problem if the key being
inserted is already in the set but further down the list. If this happens,
the element ends up getting inserted in the set twice. This commit makes
it so that we walk over all of the possible entries for the given key and
then, if we don't find the key, place it in the available free entry we
found.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the hash_table_insert function would bail early if it found a
deleted slot that it could re-use. However, this is a problem if the key
being inserted is already in the hash table but further down the list. If
this happens, the element ends up getting inserted in the hash table twice.
This commit makes it so that we walk over all of the possible entries for
the given key and then, if we don't find the key, place it in the available
free entry we found.
Reviewed-by: Eric Anholt <eric@anholt.net>
For framebuffer completeness checks, consider renderbuffers as not
layered. Previously, they would have counted as layered if a layered
textured had previously been bound to the same attachment point. This
could cause framebuffer completeness checks to incorrectly fail with
GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS, even if no layered attachments
were present.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89026
An upcoming patch is going to introduce some code here, and having this code
organized as the patch does makes it a bit easier to read later.
There should be no functional change here.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
As it turns out, we were over-thinking the cause of the hang on
Cherryview. It's simply errata for Cherryview.
commit 88fea85f09
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Fri Nov 21 10:47:41 2014 -0800
i965/vec4/gen8: Handle the MUL dest hazard exception
This is an explanation to why we never saw the hang on BDW.
NOTE: The problem the original patch was trying to fix does still exist. It will
have to be fixed at some point.
v2: Modify commit message, s/CHV/BDW
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've probably never seen this ridiculous pattern in the wild, so it
didn't matter.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This expects (0,0,0,0), though it can be changed to something else or allow
more than one set of values to be considered correct.
This is currently the radeonsi behavior.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Since alu does not support abs() modifier on source operands, spill
and apply the modifiers to a temp register when needed.
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
v2 (Ian Romanick)
- Move the check to the lexer before rallocing a copy of the large string.
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_vertex
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were incorrectly attributing VS time to FS8 on Gen8+, which now use
fs_visitor for vertex shaders.
We don't hit this for geometry shaders yet, but we may as well add
support now - the fix is obvious, and we'll just forget later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, we special cased FB writes and URB writes in the register
allocation code. What we really wanted was to handle any message with
EOT set.
This saves us from extending the list with new opcodes in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This helper function basically just checks inst->eot, but also asserts
that only opcodes we expect to terminate threads have EOT set. As far
as I'm aware, we've never had such a bug.
Removing it means that we don't have to extend the list for new opcodes.
Cherryview and Skylake introduce an optimization where sampler messages
can have EOT set; scalar GS/HS/DS will likely introduce new opcodes as
well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The latter currently implies CPU read access, so only PIPE_USAGE_STAGING
can be expected to be fast.
Mesa demos src/tests/streaming_rect on Kaveri (radeonsi):
Unpatched: 42 frames in 1.023 seconds = 41.056 FPS
Patched: 615 frames in 1.000 seconds = 615.000 FPS
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88658
Cc: "10.3 10.4" <mesa-stable@lists.freedestkop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use a dummy vertex buffer object when vs inputs have no corresponding
entries in the vertex declaration. This dummy buffer will give to the
shader float4(0,0,0,0).
This fixes several artifacts on some games.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
The drm fd wasn't released, causing a crash
for wine tests on nouveau, which seems to have
a bug when a lot of device descriptors are open.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
We weren't releasing hal and ref, causing some issues (threads not released, etc)
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When on a render node the unique ioctl doesn't work.
This patch drops the code to detect the device, which relied
on an ioctl, and replaces it by the mesa loader function.
The mesa loader function is more complete and won't fail for render-nodes.
Alternatively we could also have used the pipe cap to
determine the vendor and device id from the driver.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This seems to be the behaviour on Win. Previous behaviour led
to different issues depending on the driver.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
If has_present_buffers was false at first,
but after a device reset, it turns true (for
example if we begin to render to a multisampled
back buffer), there was a crash due to present_buffers
being uninitialised.
This patch fixes it.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previous code wasn't checking against the correct limit: 224
for sm3 hardware, but 256.
Fixes wine test test_pixel_shader_constant()
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Wine tests, and probably some apps, check for errors by checking for NULL
instead of error codes.
Fixes wine test test_surface_blocks()
Reviewed-by: Axel davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Add returncode E_FAIL.
Return E_FAIL for any vertexdeclaration element with type unused.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
ATOC is an hack for Alpha to coverage
that is supported by NV and Intel.
You need to check the support for it
with CheckDeviceFormat.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This D3D hack is supposed to be supported
by all AMD SM2+ cards. Apps use it without
checking if they are on AMD.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This depth buffer format, like D3DFMT_INTZ, can be used to read
the depth buffer values when bound to a shader.
Some apps may use this format to get better performance when
they don't need the precision of INTZ (24 bits for depth, 8 for
stencil, whereas DF16 is just 16 bits for depth)
We don't add support for DF24 yet, because it implies support
for FETCH4, which we don't support for now.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Some drivers support PIPE_FORMAT_S8_UINT_Z24_UNORM,
some others PIPE_FORMAT_Z24_UNORM_S8_UINT, some both.
It doesn't matter which one we use, since the d3d formats
they map to aren't lockable (app can read it directly).
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Move the checks of whether the format is supported
into a common place.
The advantage is that allows to handle when a d3d9
format can be mapped to several formats, and that
cards don't support all of them.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The order of the format is changed to have
an increasing ordering of the d3d9 format values.
Some missing formats are added and matched to PIPE_FORMAT_NONE
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Because the debug output of this function was cut in two parts,
sometimes the second part wasn't print when we would return earlier,
whereas we would like to get it.
The reason of the separation was that it's only at the end of the function
we can print what we map to the d3d9 arguments, but we can always retrieve
that info by hand.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This D3D hack allows to resolve a multisampled
depth buffer into a single sampled one.
Note that the implementation is slightly incorrect.
When querying the content of D3DRS_POINTSIZE,
it should return the resz code if it has been set.
This behaviour will be implemented when state changes
will be reworked. For now the current behaviour is ok,
since apps use the D3DCREATE_PUREDEVICE flag when creating
the device, which means they won't read states and in exchange
get better performance.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This fixes a wine test and some minor visual issues on some games.
The patch is not optimal, there is probably a more efficient way to
fix this issue, but the code there already has some innefficiencies.
There is plans to rewrite that part of the code to make it more
efficient.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
D3DSP_NOSWIZZLE already contains the shift.
Detected with Clang.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This removes unneeded hack for Anno 1404.
This app is not checking the number of supporting
constants, and rely on the shader compilation to fail
if it puts too many constants.
This patch also checks for the correct number of constants for ps.
Note that we don't check the official limitations for old vs and ps
versions. The restrictions were fixed, unlike for the number of vertex
shader constants for later versions. Likely apps use the correct number,
and it's not a problem for us if it wants use more.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Instead of crashing on buggy shaders, we should return an error.
This patch introduces this behaviour in the case of invalid constant
access
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
r500 hasn't enough float constants for vs to fill all needs.
Overlapping issues can happen with complex shaders.
The fix would be to recompile shaders to include the integer
and boolean constants, instead of reserving slots for them.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previously 276 constants were declared everytime.
This patch makes shaders declare constants up to the maximum
constant needed and moves the moment we print the TGSI
shader after the moment we declare the constants.
This is needed for r500, since when indirect addressing is used,
it cannot reduce the amount of constants needed, and that it is
restricted to 256 constant slots.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Count explicitly the slots for float, int and bool constants,
and deduce the constbuf size in nine_shader.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This patch raises nine requirements and disables nine for old
hw that don't match them.
Currently for these cards only games that don't have tight requirements
would work well with nine. However nine is missing several checks
regarding these limitations.
To make code and future patches less heavy, dropping support for these old
card seems a good solution.
That makes r500 the only dx9 generation cards supported by nine. It seems the one
with the less limitations for nine. Still not everything is ok, and we'll have
for example to implement shader recompilation for these cards to include
integer and boolean constants in the shader.
Eventually when this is done, we can reintroduce support for older cards.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Resolving a multisampled depth texture into
a single sampled texture is supported on >= SM4.1
hw. It is possible some previous hw support it.
The ability was tested on radeonsi and nvc0.
Apparently is is also supported for radeon >= r700.
This patch adds the MULTISAMPLE_Z_RESOLVE cap and
add it to the drivers. It is advertised for drivers
for which it is sure the ability is supported.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Khronos modified glext.h to get rid of GL_TEXTURE_BINDING, a special enum
added for ARB_direct_state_access. This enum was ruled unimplementable.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Laura Ekstrand <laura@jlekstrand.net>
Nothing special needs to be done.
Even though llvmpipe copies constant (ie uniform) buffers internally, the
application is supposed to flush and sync, so all should work.
All bufferstorage piglit tests pass.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
As ffsll doesn't exist in MSVC yet, and u_bit_scan64 is only used by
radeonsi which is never built with MSVC.
This is just a stop-gap fix to unbreak MSVC build until we refactor these
mathematical portability wrappers into src/util.
Trivial.
We need a slot for the stipple texture and the pixel shader already uses
32 textures (16 API slots + 16 FMASK slots).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The hardware obeys swizzles even if the resource is NULL.
This will be used by set_polygon_stipple.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The FILLED_SIZE counter is uninitialized at the beginning, so we can't use it.
Instead, use offset = 0, which is what we always do when not appending.
This unexpectedly fixes spec/ARB_texture_multisample/sample-position/*.
Yes, the test does use transform feedback.
Cc: 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For drivers that use higher slots not to crash in tgsi_shader_info.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When a rebase swizzle is provided and we call _mesa_swizzle_and_convert
after unpacking the source format we were always passing normalized=false.
We should pass true or false depending on the formats involved in the
conversion for the byte and float paths (the integer path cannot ever be
normalized).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Originally, get_alu_src was supposed to handle resolving swizzles and
things like that. However, now that basically every instruction we have
only takes scalar sources, we don't really need it anymore. The only case
where it's still marginally useful is for the mov and vecN operations that
are left over from SSA form. We can handle those cases as a special case
easily enough. As a side-effect, we don't need the vec_to_movs pass
anymore.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we detect if we need an extra copy for swizzling. The
old code involved a pile of confusing switch fall-throughs; we now use a
loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we can scalarize with NIR, there's no need for all this code
anymore. Let's get rid of it and just do scalar operations.
v2: run copy prop before lowering phi nodes
v3: Get rid of the "emit(...)->saturate = foo" pattern
v4: Run alu_to_scalar as an optimization pass
total instructions in shared programs: 5998321 -> 5974070 (-0.40%)
instructions in affected programs: 732075 -> 707824 (-3.31%)
helped: 3137
HURT: 191
GAINED: 18
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add better comments
- Use nir_ssa_dest_init and nir_src_for_ssa more places
- Fix some void * casts
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we determine whether or not to sccalarize a phi node to
make the recursion non-bogus
- Treat load_const instructions as scalarizable
v4 Jason Ekstrand <jason.ekstrand@intel.com>:
- Allow uniform and input loads to be scalarizable
v5 Jason Ekstrand <jason.ekstrand@intel.com>:
- Also consider loads of inputs (varying, uniform, or ubo) to be
scalarizable. We were already doing this for load_var on uniforms and
inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All but 16 of the programs helped were ARB fp programs.
total instructions in shared programs: 5949286 -> 5945470 (-0.06%)
instructions in affected programs: 275162 -> 271346 (-1.39%)
helped: 1197
GAINED: 1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Section 2.3.1 (Errors) of the OpenGL 4.5 spec says:
"If a negative number is provided where an argument of type sizei or
sizeiptr is specified, an INVALID_VALUE error is generated.
This patch adds checks for negative buffer size values passed to different APIs.
It also moves up the check on other APIs that already had it, making it the first
error check performed in the function, for consistency.
While there may be other APIs throughtout the code lacking this check (or at least
not at the beginning of the function), this patch focuses on the cases that break
the dEQP tests listed below. It could be a good excersize for the future to check
all other cases, and improve consistency in the order of the checks throughout the
whole Mesa code base.
This fixes 5 dEQP test:
* dEQP-GLES3.functional.negative_api.state.get_attached_shaders
* dEQP-GLES3.functional.negative_api.state.get_shader_source
* dEQP-GLES3.functional.negative_api.state.get_active_uniform
* dEQP-GLES3.functional.negative_api.state.get_active_attrib
* dEQP-GLES3.functional.negative_api.shader.program_binary
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 6.1.13 "Framebuffer Object Queries" of OpenGL ES 3.0 spec:
"If the default framebuffer is bound to target, then attachment must be
BACK, identifying the color buffer; DEPTH, identifying the depth buffer; or
STENCIL, identifying the stencil buffer."
OpenGL ES 3.0, section 2.5 (GL Errors):
"If a command that requires an enumerated value is passed a
symbolic constant that is not one of those specified as allowable
for that command, an INVALID_ENUM error is generated."
Then change the returned error to INVALID_ENUM.
Fixes:
dEQP-GLES3.functional.fbo.api.attachment_query_default_fbo
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
mod(x,y) as y * fract(x/y). This implementation has a down side though:
it introduces precision errors due to the fract() operation. Even worse,
since the result of fract() is multiplied by y, the larger y gets the
larger the precision error we produce, so for large enough numbers the
precision loss is significant. Some examples on i965:
Operation Precision error
-----------------------------------------------------
mod(-1.951171875, 1.9980468750) 0.0000000447
mod(121.57, 13.29) 0.0000023842
mod(3769.12, 321.99) 0.0000762939
mod(3769.12, 1321.99) 0.0001220703
mod(-987654.125, 123456.984375) 0.0160663128
mod( 987654.125, 123456.984375) 0.0312500000
This patch replaces the current lowering pass with a different one
(MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
man pages:
mod(x,y) = x - y * floor(x/y)
This implementation eliminates the precision errors at the expense of
an additional add instruction on some systems. On systems that can do
negate with multiply-add in a single operation this new implementation
would come at no additional cost.
v2 (Ian Romanick)
- Do not clone operands because when they are expressions we would be
duplicating them and that can lead to suboptimal code.
Fixes the following 16 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLES 3.0.0 spec introduces context state PRIMITIVE_RESTART_FIXED_INDEX
(2.8.1 Transferring Array Elements, page 26) which is not currently
possible to query using glGet*() funcs.
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getboolean
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger64
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For code such as:
uint tmp1 = uint(in0);
uint tmp2 = -tmp1;
float out0 = float(tmp2);
We produce code like:
mov(8) g5<1>.xF -g9<4,4,1>.xUD
which does not produce correct results. This code produces the
results we would expect if tmp1 and tmp2 were signed integers
instead.
It seems that a similar problem was detected and addressed when
using negations with unsigned integers as part of condionals, but
it looks like the problem has a wider impact than that.
This patch fixes the problem by preventing copy-propagation of
negated UD registers in all scenarios, not only in conditionals.
Fixes the following 24 dEQP tests:
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_*
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Nothing enables the extension yet, but the values are now available.
The spec calls for it to only be exposed for GL 3.3+, which is core-only
in mesa. Instead we allow any driver to enable it, including in a compat
context for any GL version.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
If the ?: operator's condition is a constant value, and both branches
were pure expressions, we can just make the resulting value one or the
other.
Previously, we only did this if op[1] and op[2] were also constant
values - but there's no actual reason for that restriction.
No changes in shader-db, probably because we usually optimize this later
anyway. But it does make us generate less stupid code up front.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Paul originally had to reverse engineer these formulas based on the
description about how the sampler works. The description here is not
the easiest to follow - especially given that it's from the Sandybridge
era, when the hardware only did 4x multisampling.
Jordan and I recently found another part of the documentation where they
simply state that IMS dimensions must be adjusted by a set of formulas.
Quoting this section provides an easy to follow explanation for the
code, including 2x/4x/8x/16x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
In preparation for glBlitNamedFramebuffer, the DD table function
BlitFramebuffer needs to accept two arbitrary framebuffer objects rather
than assuming ctx->ReadBuffer and ctx->DrawBuffer.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Khronos Revision 29537 fixes ARB_direct_state_access function prototypes that
had GLsizei where they should have had GLsizeiptr. The mainly affects
functions related to buffer objects.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.
v2: Apply to all the .dir-locals.el files, not just the one in the
root directory.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
For GL_TEXTURE_1D_ARRAY targets we store the depth of the array
in the Height field and leave Depth=1 in the underlying texture
object. When we call intel_miptree_copy_teximage in the process
of re-creating a miptree (possibily because the number of miplevels
has changed) we didn't account for this, so we where only copying
texture images for the first slice.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I could have done this in the bit that generates the ANDs and ORs, but
it's probably generally useful. Sadly, I still need this even if I move
to NIR, because I can't yet express my read of the destination color in
NIR, which I would need to move my blend/logicop/colormask handling into
NIR.
total uniforms in shared programs: 13497 -> 13455 (-0.31%)
uniforms in affected programs: 101 -> 59 (-41.58%)
total instructions in shared programs: 40797 -> 40296 (-1.23%)
instructions in affected programs: 1639 -> 1138 (-30.57%)
The GL spec guarantees that glGetTexImage will never get a multisampled
texture, but this is not true for glReadPixels. If we get a multisampled
buffer, we have to do a multisample resolve on it before we can pull the
data down for the user. Since this isn't practical to handle in
tiled_memcpy, we just fall back to the other paths that can handle this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And remove the mocs argument of the emit_buffer_surface_state vtbl hook. Its
semantics vary greatly from one generation to another, so it kind of
encourages the caller to pass 0 which is the only valid setting across
generations. After this commit the hardware-specific code decides what the
best cacheability settings are for buffer surfaces, just like we do for
textures.
This together with some additional changes coming is expected to improve
performance of pull constants, buffer textures, atomic counters and image
objects on Gen7 and up.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes a bug on BDW when our meta-based stencil blit path assert-fails
due to an invalid internal format even though we do support the
ARB_stencil_texturing extension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The _mesa_dlist_alloc() function is only guaranteed to return a pointer
with 4-byte alignment. On 64-bit systems which don't support unaligned
loads (e.g. SPARC or MIPS) this could lead to a bus error in the VBO code.
The solution is to add a new _mesa_dlist_alloc_aligned() function which
will return a pointer to an 8-byte aligned address on 64-bit systems.
This is accomplished by inserting a 4-byte NOP instruction in the display
list when needed.
The only place this actually matters is the VBO code where we need to
allocate a 'struct vbo_save_vertex_list' which needs to be 8-byte
aligned (just as if it were malloc'd).
The gears demo and others hit this bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88662
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The intrinsics are universally available, whereas older Windows SDKs (e.g.
7.0.7600) don't have the non-intrisic entrypoint.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
According to the SKL bspec the 3DSTATE_CONSTANT_* commands only take
effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_*
command. This patch just makes it set the BRW_NEW_SURFACES state when
uploading the push constants to ensure the binding tables will be
updated.
This fixes the fbo-blending-formats Piglit test and possibly others.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When color buffers alone are concerned the depth is not needed.
No regression on BDW where meta blit is used instead of blorp. I
also disabled blorp temporarily for fbo-blits on IVB and saw no
regressions there either.
I also compared several graphics benchmarks on BDW and saw neither
regressions or improvements.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows you to match on an unknown value but only if it is of a given
type. 90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.
nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are some algebraic transformations that we want to do but only if
certain things are constants. For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.
nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
since the address reg holds integer values, ARL/ARR do an implicit float-to-int
conversion, so clarify that. Thus it is also incorrect to say that FLR really
does the same as ARL.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not. vc4
results:
total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs: 4253 -> 3960 (-6.89%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.
v2: Add a missing space.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch adds needed support for accepting HALF_FLOAT_OES as valid type
for TexImage*D and TexSubImage*D when Texture FLoat extensions are supported.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This patch series adds support for following GLES2 Texture Float extensions:
1)GL_OES_texture_float,
2)GL_OES_texture_half_float,
3)GL_OES_texture_float_linear,
4)GL_OES_texture_half_float_linear.
This patch adds basic infrastructure and needed boolean flags to advertise
support for these extensions, by default the support is disabled. Next patch
in the series introduces support for HALF_FLOAT_OES token.
v4: take assert away and make valid_filter_for_float conditional (Tapani),
fix the alphabetical order (Emil)
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends. This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.
Fixes fs-swap-problem with my scalarizing patches.
v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
The idea is that after a remove_from_list(), you might want to be able to
do a remove_from_list() on it again or an is_empty_list(). This is
apparently relied on by r300g.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2:
- Only emit write SPI_TMPRING_SIZE once per packet.
- Use context global scratch buffer.
v3:
- Patch shaders using WRITE_DATA packet instead of map/unmap.
- Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
VS_PARTIAL_FLUSH when patching shaders.
v4:
- Code cleanups.
- Remove unnecessary multiplies.
v5:
- Patch shaders in system memory and re-upload to vram.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state(). This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The problem is that the fallbacks we have at the moment don't work in C++.
While we could theoretically fix the fallbacks it would also raise the
issue of correctly detecting the fpclassify function. So, for now, we'll
just disable it until we actually have a C++ user.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: EdB <edb+mesa@sigluy.net>
I haven't actually seen this bug in the wild, but it's possible that
someone could ask to do a S3TC PBO download or something. This protects us
from accidentally creating a render target with a compressed or otherwise
non-renderable format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Patch adds 2 error messages that point user directly to fix
mispelled or impossible swizzle field for a format.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
[ Francisco Jerez: As discussed on the mailing list, this is intended
to produce more useful debug output in cases where the compilation
terminates unexpectedly. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
[ Francisco Jerez: As we're at it make debug_options[] local to its
only user and remove temporary. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.
On r600 hw there is a special GS scenario for this, you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.
This is a first pass attempt at this, and passes the piglit
tests that test for this.
v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.
v2: use new helper name
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.
v2: use new name nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.
On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
amount of time spent in glReadPixels by more than half and reduces the time
of the entire test by 10%.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- Refactor to make the functions look more like the old
intel_tex_subimage_tiled_memcpy
- Don't export the readpixels_tiled_memcpy function
- Fix some pointer arithmatic bugs in partial image downloads (using
ReadPixels with a non-zero x or y offset)
- Fix a bug when ReadPixels is performed on an FBO wrapping a texture
miplevel other than zero.
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Better documentation fot the *_tiled_memcpy functions
- Add target restrictions for renderbuffers wrapping textures
v4: Jason Ekstrand <jason.ekstrand@intel.com>
- Only check the return value of brw_bo_map for error and not bo->virtual
v5: Jason Ekstrand <jason.ekstrand@intel.com>
- Don't unnecessarily repeat a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This commit addes tiled copy functions for coping from tiled memory to
linear memory. These are very similar to the existing linear-to-tiled
paths.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- New commit message
- Various whitespace fixes
- Added ptrdiff_t casts as done in commit 225a09790
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Fixed a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively. The *_faster functions are similarly renamed.
There was also a bit of logic to select between the the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- Better commit message
- Fix up the copyright on the intel_tiled_memcpy files
- Various whitespace fixes
- Moved a bunch of stuff that did not need to be exposed from
intel_tiled_memcpy.h to intel_tiled_memcpy.c
- Added proper documentation for intel_get_memcpy
- Incorperated the ptrdiff_t tweaks from commit 225a09790
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Fixed a comment
- Move the tile size constants into the .c file
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
There's no reason why we should be doing this for 2D textures and not
rectangles. Just a matter of adding another hunk to the condition.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Previously, we called the abs() function in math.h. However, this involves
unnecessarily going through double. This commit changes it to use integers
directly with a ternary.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it use an implicit
assignment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.
Previously, we deleted MOV.nz instructions when the instruction
generating the MOV's source also wrote the flag register (as the flag
register already contains the desired value). However, we wouldn't
delete CMP.nz instructions that served the same purpose.
We also didn't attempt true cmod propagation on MOV.nz instructions,
while we would for the equivalent CMP.nz form.
This patch fixes both limitations, treating both forms equally.
CMP.nz instructions will now be deleted (helping the NIR backend),
and MOV.nz instructions will have their .nz propagated.
No changes in shader-db without NIR. With NIR,
total instructions in shared programs: 6006153 -> 5969364 (-0.61%)
instructions in affected programs: 2087139 -> 2050350 (-1.76%)
helped: 10704
HURT: 0
GAINED: 2
LOST: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.
The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.
v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
- Use mako to generate one function per opcode instead of doing piles of
string splicing
v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
- More comments and better indentation in the mako
- Add a description of the constant expression language in nir_opcodes.py
- Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.
v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3 Jason Ekstrand <jason.ekstrand@intel.com>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration
For some reason, we occasionally write the flag register with a MOV.NZ
instruction:
add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F
cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F
mov.nz.f0(8) null g26<8,8,1>D
A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:
add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F
total instructions in shared programs: 5955701 -> 5951657 (-0.07%)
instructions in affected programs: 302910 -> 298866 (-1.34%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:
add(8) temp a b
cmp.l.f0(8) null -temp 0.0
into
add.g.f0(8) temp a b
total instructions in shared programs: 5958360 -> 5955701 (-0.04%)
instructions in affected programs: 466880 -> 464221 (-0.57%)
GAINED: 0
LOST: 1
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise we'll apply the conditional mod to only one of SIMD8
instructions and trigger an assertion.
NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
3-src instructions can only have GRF/MRF destinations. It's really
difficult to deal with that restriction in dead code elimination (that
wants to give instructions null destinations to show that their result
isn't used) while allowing 3-src instructions to have conditional mod,
so don't, and just give then a destination before register allocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
the integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
GAINED: 103
LOST: 43
Causes one program to spill (643 -> 1076 instructions).
I committed this patch last year (commit 42a26cb5) but reverted it
(commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program
(bug 78648). Tapani reapplied this patch and could not reproduce the bug
with current Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.
No piglit regressions on any i965 platform in Jenkins.
total instructions in shared programs: 4382707 -> 4382707 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 2089
LOST: 0
No other platforms affected in shader-db.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.
v2: Just pass a NULL state, now that it won't crash when you do.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.
v2: Use the nir_src_for_ssa() helper, and another instance of
nir_alu_src_copy().
v3: Drop the non-SSA support. All intended callers will have SSA-only ALU
ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
about weird input_sizes[].
Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
There aren't many users yet, but I wanted to do this from my scalarizing
pass.
v2: Constify the src arguments.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Most of these exist in the GLSL IR algebraic pass already. However,
SSA allows us to find more instances of the patterns.
total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
NIR instructions in affected programs: 124189 -> 120026 (-3.35%)
helped: 604
total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
i965 instructions in affected programs: 261295 -> 254507 (-2.60%)
helped: 1295
HURT: 3
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.
The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.
total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
NIR instructions in affected programs: 411143 -> 405922 (-1.27%)
helped: 2233
HURT: 214
A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x. We can always clean that up later.
total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
i965 instructions in affected programs: 784980 -> 775093 (-1.26%)
helped: 4508
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GLSL IR optimization pass contained these; we may as well include
them too.
v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).
No change in the number of NIR instructions on a shader-db run.
total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
i965 instructions in affected programs: 542 -> 537 (-0.92%)
helped: 2 (in glamor)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that. While there,
I decided to add a bunch more of them.
v2: Delete bogus fand/for optimizations (caught by Jason).
total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
NIR instructions in affected programs: 149634 -> 146937 (-1.80%)
helped: 1032
total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
i965 instructions in affected programs: 537 -> 542 (0.93%)
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value. This
made ALU operations on those values appear distinct as well.
Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.
Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered. I didn't implement
support for variables for the time being.
v2: Assert that info->num_variables == 0 (requested by Jason).
total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
NIR instructions in affected programs: 2413496 -> 2001071 (-17.09%)
helped: 16872
total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
i965 instructions in affected programs: 640654 -> 620094 (-3.21%)
helped: 2071
HURT: 585
GAINED: 14
LOST: 25
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.
The next patch will introduce another case that requires SSA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows us to count NIR instructions via shader-db.
Use "run" as normal. The results file will contain both NIR and
assembly.
Then, to generate a NIR report:
./report.py <(grep NIR results/foo) <(grep NIR results/bar)
Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is useful for debugging and looking for optimization opportunities.
It will need to be expanded when we add support for other scalar stages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.
total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
instructions in affected programs: 22406 -> 21984 (-1.88%)
helped: 47
HURT: 0
GAINED: 0
LOST: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Apparently $(top_srcdir) is not expanded in a source list when using
subdir-objects, so remove that. It's not clear to me why we were going
to such lengths to prefix each source file anyway.
Change max_wm_threads to match the spec on CHV. The max number of
threads in 3DSTATE_PS is always programmed to 64 and the hardware
internally scales that depending on the GT SKU. So this doesn't
change the max number of threads actually used, but it does affect
the scratch space calculation.
On CHV the old value was too small, so the amount of scratch space
allocated wasn't sufficient to satisfy the actual max number of
threads used.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
When emitting texturing from indirect texture units, we need to be able to
scratch around in the header message. Since we only do this for >= HSW,
this is ok since there are no MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj phogat <anuj.phogat@gmail.com>
Prior to this commit, the adjust_sampler_state_pointer function took an
extra register that it could use as scratch space. The usual candidate was
the destination of the sampler instruction. However, if that register ever
aliased anything important such as the sampler index, this would scratch
all over important data. Fortunately, the calculation is such that we can
just do it in place and we don't need the scratch space at all.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previous code semantic was:
. if ff ps will not run a ff stage, then do not output texture coords for this stage
for vs
. if XYZRHW is used (position_t), use only the mode where input coordinates are copied
to the outputs.
Problem is when apps don't give texture inputs. When apps precise PASSTHRU, it means
copy texture coord input to texture coord output if there is such input. The case
where there is no texture coord input wasn't handled correctly.
Drivers like r300 dislike when vs has inputs that are not fed.
Moreover if the app uses ff vs with a programmable ps, we shouldn't look at
what are the parameters of the ff ps to decide to output or not texture
coordinates.
The new code semantic is:
. if XYZRHW is used, restrict to PASSTHRU
. if PASSTHRU is used and no texture input is declared, then do not output
texture coords for this stage
The case where ff ps needs a texture coord input and ff vs doesn't output
it is not handled, and should probably be a runtime error.
This fixes 3Dmark05, which uses ff vs with programmable ps.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When the shader does indirect addressing on the constants,
we allocate a temporary constant buffer to which we copy
the constants from the app given user constants and
the constants filled in the shader.
This patch makes this buffer be allocated once.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Relative addressing needs the constant buffer to get all
the correct constants, even those defined by the shader.
The code to copy the shader constants to the constant buffer
was enabled only for debug build. Enable it always.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The fix is that this line:
"src[s] = tx->regs.vT[s];" is wrong if s doesn't start from 0.
Instead access tx->regs.vT directly when needed.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
texcoord for ps < 1_4 should clamp between 0 and 1 the values.
texcrd (texcoord ps 1_4) does not clamp and can be used with
two modifiers _dw and _dz that means the channels are divided
by w or z.
Implement those in shared code, since the same modifiers can be used
for texld ps 1_4.
v2: replace DIV by RCP + MUL
v3: Remove an useless MOV
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Nothing seems to indicates the negation modifier would be stored in the
instruction flags instead of the source modifier. tx_src_param has
already handled it if it is in the source modifier.
In addition,
when the card supports native integers, the boolean
are stored in 32 bits int and are equal to
0 or 0xFFFFFFFF.
Given 0xFFFFFFFF is NaN if it was a float, better use
UIF than IF.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previous implementation was behaving fine, but improve it by:
. Improved documentation
. Decreasing counter (comparing to 0 is likely to be faster than to constant)
. Move the counter update at the end for better performance for shaders that
break the loop earlier than when the count is done.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previous implementation didn't work well with nested loops.
Instead of using several address registers, put a0 and aL
into normal registers, and copy them to one address register when
we need to use them.
Wine tests loop_index_test() and nested_loop_test() now pass correctly.
Fixes r600g crash while loading Bioshock -
bug https://bugs.freedesktop.org/show_bug.cgi?id=85696
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
When the input's xyz are 0.0, the output
should be 0.0. This is due to the fact that
Inf * 0 = 0 for dx9. To handle this case,
cap the result of RSQ to FLT_MAX. We have
FLT_MAX * 0 = 0.
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
We should use the absolute value of the input as input to ureg_RSQ.
Moreover, an input of 0.0 should return FLT_MAX.
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Let's say we have c1 and c2 declared in the shader and c0 given by the app
Then here we would have read c0, c1 and c2 given by the app, instead
of the correct c0, c1, c2.
This correction fixes several issues in some games.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Convert them to shader booleans at earlier stage.
Previous code is fine, but later patch will make
integers being converted at earlier stage, so do
the same for booleans
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Buffers in the MANAGED pool are supposed to have the content in a ram buffer,
a copy in VRAM if there is enough memory (driver manages memory and decide when
to delete the buffer in VRAM).
This is not implemented properly in nine, and a VRAM copy is going to be created
when the RAM memory is filled, and the VRAM copy will get synced with the RAM
memory updates.
Due to some issues (in the implementation or in app logic), it can happen
we try to create a sampler view of the resource while we haven't created the
VRAM resource. This hack creates the resource when we hit this case, which prevents
crashing, but doesn't help with the resource content.
This fixes several games crashing at launch.
Acked-by: Axel Davy <axel.davy@ens.fr>
Acked-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Stanislaw Halik <sthalik@misaki.pl>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
While previous code was having the correct behaviour in general,
this new code is more readable (without checking all gallium formats
manually) and has a more defined behaviour for depth stencil resources.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The implicit swapchains are destroyed when the device instance is
destroyed. However for non-implicit swapchains, it is not the case,
and the application can have kept an reference on the swapchain
buffers to reuse them.
Fixes problems with battle.net launcher.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This->surfaces contains the surfaces associated to the levels
and faces. This->surfaces[6*Level] is what we want here,
since it gives us a face descriptor for the level 'Level'.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The cap means D3DFVF_XYZRHW vertices will see clipping.
This is not the case when
PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION is supported, since
it'll disable clipping.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The clip state was reset everytime, incurring an overhead.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
brw_fs_nir has only seen scalar bools so far, thanks to vector splitting,
and the ralloc of in glsl_to_nir.cpp will *usually* get you a 0-filled
chunk of memory, so reading too large of a value will usually get you the
right bool value. But once we start doing vector bools in a few commits,
we end up getting bad values.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Almost all instructions we nir_ssa_def_init() for are nir_dests, and you
have to keep from forgetting to set is_ssa when you do. Just provide the
simpler helper, instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Mac OS X XQuartz places X11 headers at /opt/X11/include.
This patch fixes this Mac OS X SCons build error.
Compiling src/gallium/state_trackers/glx/xlib/glx_api.c ...
In file included from src/gallium/state_trackers/glx/xlib/glx_api.c:34:
include/GL/glx.h:30:10: fatal error: 'X11/Xlib.h' file not found
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Since the meta path can do strictly more than the blitter path, we just
remove the blitter path entirely.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
This meta path, designed for use with PBO's, creates a temporary texture
out of the PBO and uses BlitFramebuffers to do the actual texture upload.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add support for handling simple packing options
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Refactor to split out the texture-from-pbo code
- Rename to _mesa_meta_pbo_TexSubImage
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Going through the for loop every time has noticable overhead. This fixes
things up so we only do that once ever and then just do a hash table lookup
which should be much cheaper.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use once_flag and call_once from c11/threads.h instead of pthreads
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Previously, we were completely ignoring the mt->offset field for
renderbuffers. While it does have some alignment constraints, it is valid
to use it. This patch adds the code to each of the 4 surface state setup
functions to handle it.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Fixes currently failing Piglit case
interface-blocks-name-reused-globally.vert
v2: combine var declaration with assignment (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Designated initializers with anonymous unions don't work in MSVC or
GCC < 4.6. With a couple of constructor methods, we don't need them any
more and the code is actually cleaner.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88467
Reviewed-by: Connor Abbot <cwabbott0@gmail.com>
This fixes two problems reported by osc:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/format_utils.c:180
E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/glformats.c:2714
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
The below code crashes when vector_elements <= 0
Fixes Warray-bounds warnings
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are currently 2 users of this functionality. I have 2 more users coming
up, and having a simple function makes the results much cleaner. The existing
interface semantics was proposed by Matt.
v2 (Ken): Rename to region_matches()/has_scalar_region().
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GEN8 added the QWORD as a valid type for certain operations on the EU.
In order to calculate the number of registers used one must have the type
size as part of the equation. Quoting the formula in the code:
regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
Adding this separately for bisection since there is no simple way to add
an assert in the type_sz function.
NOTE: As a side note, I was confused for a while because it's impossible
to calculate the region, ie. registers needed, without vstride. However,
at this point these are all part of the IR, and so no vstride must exist.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of passing a pointer to the scratch buffer via user sgprs, we
now patch the shader with the buffer address using reloc information
from the LLVM generated ELF.
v2:
- Make sure not to break older LLVM.
Gen4 hardware appears to GPU hang frequently when using Chromium, and
also when running 'glmark2 -b ideas'. Most of the error states contain
3DPRIMITIVE commands in quick succession, with very few state packets
between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.
I trimmed an apitrace of the glmark2 hang down to two draw calls with a
glUniformMatrix4fv call between the two. Either draw by itself works
fine, but together, they hang the GPU. Removing the glUniform call
makes the hangs disappear. In the hardware state, this translates to
removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.
Flushing before emitting CONSTANT_BUFFER packets also appears to make
the hangs disappear. I observed a slowdown in glxgears by doing it all
the time, so I've chosen to only do it when BRW_NEW_BATCH and
BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
already flushed the whole pipeline).
I'd much rather understand the problem, but at this point, I don't see
how we'd ever be able to track it down further. We have no real tools,
and the hardware people moved on years ago. I've analyzed 20+ error
states and read every scrap of documentation I could find.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
offset() properly handles reg_width, so it'll work for SIMD16.
While we're in the area, simplify a few cases, and use retype() to cut a
few more lines of code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_fs_nir.cpp creates almost all of its registers via:
fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components));
When we add SIMD16 support, we'll need to set reg->width = 16 and
double the VGRF size...on pretty much every VGRF it allocates.
This patch replaces that pattern with a new "vgrf" helper method:
fs_reg reg = vgrf(num_components);
The new function correctly takes reg_width into account. For now,
reg_width is always 1, so this should have no functional change.
v2: Just make vgrf() account for reg_width right away, rather than
changing the behavior in the next patch.
v3: Replace one last virtual_grf_alloc I missed. It's used in code
that only runs for dispatch_width == 8, so it doesn't matter,
but consistency is nice.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I dislike how fs_reg has a constructor that knows about fs_visitor.
Apart from that, it stands alone, with no need to interact with the
rest of the compiler. Which is sensible - a class that represents
a register should do just that. Allocating virtual register numbers
should be left up to the compiler (fs_visitor).
This patch replaces the constructor with a new fs_visitor::vgrf method,
eliminating fs_reg's dependency on fs_visitor. It ends up being no
more code.
v2: Rebase from May 2014 -> January 2015.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The filename of sha1.h was conflicting with the system-provided
sha1.h, (and in some confiurations, our sha1.c was unsuccessfully
attemping to include "sha1.h" and <sha1.h> as two different files).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88523
Commit 8ec6534 changed texture upload path and the way how texture
format is being checked, this commit adds support for GL_RGB with
GL_UNSIGNED_INT_2_10_10_10_REV as specified by the extension
EXT_texture_type_2_10_10_10_REV specification.
This fixes regression in ES3 conformance test
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
v2: add MESA_FORMAT_R10G10B10X2_UNORM format (Iago Toral)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88385
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We hit an assertion that the destination of the FB write should not be
an immediate. (I don't know what we were thinking.) Use ARF null.
Trying to substitute real shaders with the dummy shader would crash
when trying to upload non-existent uniforms. Say there are none.
It also wouldn't generate any code because we didn't compute the CFG,
and code generation now requires it. Compute it.
Gen4-5 also require a message header to be present.
On Gen6+, there were assertion failures in SF/SBE state because
urb_setup was memset to 0 instad of -1, causing it to think there were
attributes when nothing was set up right. Set to no attributes.
Finally, you have to ensure "Setup URB Entry Read Length" is non-zero
or you get GPU hangs, at least on Crestline.
It now works on at least Crestline and Haswell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
glibc 2.19 introduced _DEFUAULT_SOURCE as a replacement for _BSD_SOURCE,
and deprecates _BSD_SOURCE with an annoying warning. Defining both is
how you're supposed to transition so let's do that. It gets rid of the
warning and we can figure out when/if we can drop _BSD_SOURCE later.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Fix build error.
CC libmesautil_la-sha1.lo
sha1.c: In function '_mesa_sha1_final':
sha1.c:210:22: error: 'grcy_md_hd_t' undeclared (first use in this function)
gcry_md_hd_t h = (grcy_md_hd_t) ctx;
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88519
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
The driver name is no longer const, it's always allocated dynamically
one way or another. Drop const from dri_screen_create_dri2
driver_name argument to avoid warning.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We don't actually have the code for the shader cache just yet, but
this configure machinery puts everything in place so that the shader
cache can be optionally compiled in.
Specifically, if the user passes no option (neither
--disable-shader-cache, nor --enable-shader-cache), then this feature
will be automatically detected based on the presence of a usable SHA-1
library. If no suitable library can be found, then the shader cache
will be automatically disabled, (and reported in the final output from
configure).
The user can force the shader-cache feature to not be compiled, (even
if a SHA-1 library is detected), by passing
--disable-shader-cache. This will prevent the compiled Mesa libraries
from depending on any library for SHA-1 implementation.
Finally, the user can also force the shader cache on with
--enable-shader-cache. This will cause configure to trigger a fatal
error if no sutiable SHA-1 implementation can be found for the
shader-cache feature.
Bug fix by José Fonseca <jfonseca@vmware.com>: Fix to put conditional
assignment in Makefile.am, not Makefile.sources to avoid breaking
scons build.
Note: As recommended by José, with this commit the scons build will
not compile any of the SHA-1-using code. This is waiting for someone
to write SConstruct detection of the available SHA-1 libraries, (and
set the appropriate HAVE_SHA1_* variables).
Reviewed-by: Matt Turner <mattst88@gmail.com>
The upcoming shader cache uses the SHA-1 algorithm for cryptographic
naming. These new mesa_sha1 functions are implemented with any one of
several differeny cryptographics libraries.
This code was copied from the xserver repository, (where it has
apparently been functioning well on a variety of operating systems),
and comes licensed with a license identical to that of Mesa.
Bug fixes by José Fonseca <jfonseca@vmware.com>: Fix to put
conditional assignment in Makefile.am, not Makefile.sources to avoid
breaking scons build. Fix include file for CryptoAPI section. Fix
missing cast in openssl section.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Prior to copying in code from the xserver configure.ac file, it makes
sense to have the license of this file clearly marked, (to show that
it's licensed identically to the configure.ac file from the xserver
repository).
And since the text of the license refers to "the above copyright
notice" it also makes sense to have an actual copyright attribution in
place.
I generated this list of names by looking at the output of:
git shortlog -n --format=%aD -- configure.ac
(and arbitrarily stopping for contributors with fewer than 15
commits). Then for each name, I looked for existing Copyright
attributions in the mesa source tree with the same name, (and using
"Intel Corporation" as the copyright holder where I knew that was
appropriate).
In addition to exercising all of the functions in blob.h, this
includes a stress test that forces some reallocing, and also tests to
verify the alignment and overrun-detection code in blob.c.
These functions are useful when serializing an unknown number of items
to a blob. The caller can first save the current offset, write a
placeholder uint32, write out (and count) the items, then use
blob_overwrite_uint32 with the saved offset to replace the placeholder
value.
Then, when deserializing, the reader will first read the count and
know how many subsequent items to expect.
(I wrote this code after reading a very similar patch written by
Tapani when he wrote serialization code for IR. Since I re-used the
idea of his code so directly, I've credited him as the author of this
code. --Carl)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This new interface allows for writing a series of objects to a chunk
of memory (a "blob").. The allocated memory is maintained within the
blob itself, (and re-allocated by doubling when necessary).
There are also functions for reading objects from a blob as well. If
code attempts to read beyond the available memory, the read functions
return 0 values (or its moral equivalent) without reading past the
allocated memory. Once the caller is done with the reads, it can check
blob->overrun to ensure whether any invalid values were previously
returned due to attempts to read too far.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The upcoming shader cache needs this to be able to cache hash data
from the gl_shader_program structure.
Edited-by: Carl Worth <cworth@cworth.org>:
There is an internal implementation detail that the hash table
underlying the struct string_to_uint_map stores each value internally
as (value+1). The user needn't be very concerned with this (other than
knowing that a value of UINT_MAX cannot be stored) since put() adds 1
and get() subtracts 1.
So in this commit, rather than call the user's function directly with
hash_table_call_foreach, we call through a wrapper that fixes up the
off-by-one values before the caller's callback sees them.
And with this wrapper in place, we also give a better signature to the
callback function being passed to iterate(), so that this callback
function can actually expect a char* and an unsigned argument, (rather
than a couple of void* ).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously, if __builtin_unreachable() was unavailable, the
unreachable macro was defined to do nothing. We do better here, by at
least still making it an assert.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is similar to the existing functions get_instance,
get_array_instance, etc. for getting a type singleton. The new
get_sampler_instance() function will be used by the upcoming shader
cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we generated this for FB writes in SIMD16 mode:
load_payload(16) vgrf5@8+0.0:F, vgrf1:F, vgrf2:F, vgrf3:F, vgrf4:F
fb_write(8) (null):UD, vgrf5@8+0.0:F 1sthalf
The LOAD_PAYLOAD's destination had its register width set to 8, and the
FB_WRITE had its execution size set to 8. This seems wrong, and while
it probably doesn't affect anything, we should fix it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When converting to a format that has fewer bits the previous code was just
shifting off the bits. This doesn't provide very accurate results. For example
when converting from 8 bits to 5 bits it is equivalent to doing this:
x * 32 / 256
This works as if it's taking a value from a range where 256 represents 1.0 and
scaling it down to a range where 32 represents 1.0. However this is not
correct because it is actually 255 and 31 that represent 1.0.
We can do better with a formula like this:
(x * 31 + 127) / 255
The +127 is to make it round correctly.
The new code has a special case to use uint64_t when the result of the
multiplication would overflow an unsigned int. This function is inline and
only ever called with constant values so hopefully the if statements will be
folded.
The main incentive to do this is to make the CPU conversion path pick the same
values as the hardware would if it did the conversion. This fixes failures
with the ‘texsubimage pbo’ test when using the patches from here:
http://lists.freedesktop.org/archives/mesa-dev/2015-January/074312.html
v2: Use 64-bit arithmetic when src_bits+dst_bits > 32
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Rendering with a GS and then using transform feedback with a program that does
not have a GS can crash in gen6. The reason for this is that
brw_begin_transform_feedback checks brw->geometry_program to decide if there
is a GS program, but this is not correct: brw->geometry_program is updated when
issuing drawing commands, so after rendering with a GS it will be non-NULL
until we draw again with a program that does not have a GS. If the next
program uses TF, we will call glBegintransformFeedback before issuing
the drawing command and hence brw->geometry_program will be non-NULL if
the previous rendering used a GS. The right thing to do here is to check
ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] instead. This is what the
gen7 code path does too.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87694
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is a rework of the liveness algorithm using a worklist as suggested by
Connor. Doing so reduces the number of times we walk over the instructions
because we don't have to do an entire pointless walk over the instructions
just to figure out it's time to stop. Also, the stuff after the last loop
in the funciton will only ever get visited once.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
A worklist is a common concept in optimizations. This adds a structure
that we can reuse for many different types of optimizations.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Changes the initial internal format of a render buffer
to GL_RGBA4 in GLES 3. This fixes a failure in the following
DrawElements test:
dEQP-GLES3.functional.state_query.rbo.renderbuffer_internal_format
Reviewed-by: Chad Versace <chad.versace@intel.com>
Previously, the set API required the user to do all of the hashing of keys
as it passed them in. Since the hashing function is intrinsically tied to
the comparison function, it makes sense for the hash set to know about
it. Also, it makes for a somewhat clumsy API as the user is constantly
calling hashing functions many of which have long names. This is
especially bad when the standard call looks something like
_mesa_set_add(ht, _mesa_pointer_hash(key), key);
In the above case, there is no reason why the hash set shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash set will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
This is analygous to 94303a0750 where we did this for hash_table
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We already have search_pre_hashed. This makes the APIs match better.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When performing common subexpression elimination on instructions with
non-null destinations we emit a MOV to copy the result to a new
register that must have no other uses. In the case of:
cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f
...
cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f
we put the first instruction in the AEB and decided that we could reuse
its result when we found the second. Unfortunately, that meant that we'd
emit a MOV from the first's destination, which is null.
Don't do anything if the entry's destination is null and the
instruction's destination is non-null.
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Just use the abs source modifier on both of the multiplicand
arguments.
instructions in affected programs: 300 -> 296 (-1.33%)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Just use the negation source modifier on one of the multiplicand
arguments.
total instructions in shared programs: 5889529 -> 5880016 (-0.16%)
instructions in affected programs: 600846 -> 591333 (-1.58%)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Without the break, it was possible that an instruction would match multiple
expressions. If this happened, you could end up trying to replace it
multiple times and get a segfault. This makes it so that, after a
successful replacement, it moves on to the next instruction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This refactor allows you to more easily get the deref node associated with
a given variable. We then use that new functionality in the
deref_may_be_aliased function instead of creating a 1-element deref chain.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The original name wasn't particularly descriptive. This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This solves a number of problems. First is the ability to change the
number of sources that a texture instruction has. Second, it solves the
delema that may occur if a texture instruction has more than 4 sources.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Additional description was added to a variety of places. Also, we no
longer use the term "leaf" to describe fully-qualified direct derefs.
Instead, we simply use the term "direct" or spell it out completely.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This should be much better for debugging as GDB will pick up on the fact
that it's an enum and actually tell you what you're looking at instead of
giving you some arbitrary hex value you have to go look up.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This should make debugging a lot easier as GDB handles static inlines much
better than macros. Also, static inlines are typesafe.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This was a left-over relic of GLSL IR that we aren't using for anything.
If we ever want that value again, we can add it back, but NIR constant
folding should be just as good as GLSL IR's if not better pretty soon, so
I'm not worried about it.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, our variable renaming algorithm, while similar to the one in
the Cytron paper, was not the same. While I'm pretty sure it was correct,
it will be easier for readers of the code in the variable renaming pass if
it follows more closely. This commit removes the automatic stack popping
we were doing and replaces it with explicit popping like Cytron does.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit seeks to make the lower_variables pass much more clear by
adding a pile of comments and re-arranging a few things. There are no
functional or algorithmic changes.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
parallel_copy_copy was a silly name. Also, things were getting long and
annoying, so I added a foreach macro. For historical reasons, several of
the original iterations over parallel copy entries in from_ssa used the
_safe variants of the loop. However, all of these no longer ever remove an
entry so it's ok to make them all use the normal iterator.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we were doing a lazy creation of the parallel copy
instructions. This is confusing, hard to get right, and involves some
extra state tracking of the copies. This commit adds an extra walk over
the basic blocks to add the block-end parallel copies up front. This
should be much less confusing and, consequently, easier to get right. This
commit also adds more comments about parallel copies to help explain what
all is going on.
As a consequence of these changes, we can now remove the at_end parameter
from nir_parallel_copy_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we were emitting the full pile of setup instructions for sample_id
and sample_pos every time they were used. With this commit, we emit them
in their own pass once at the beginning of the shader and simply emit uses
later on. When it comes time for setting up VS, we can put setup for its
special values in the same pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Originally, this field was intended for determining if the given
instruction acted per-component or if it had mismatching source and
destination sizes that would have to be interpreted specially. However, we
can easily derive this from output_size == 0, so it's not really that
useful. Also, the values we were setting in nir_opcodes.h for this field
were completely bogus and it was never used.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Prior to this commit, we had a big switch statement for this. Now it's
baked into the opcode metadata so we can just use that.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit adds some algebraic properties to the metadata of each opcode
in NIR. In particular, you now know, just from the metadata, if a given
opcode is commutative or associative. This will be useful for algebraic
transformation passes that want to be able to match a + b as well as b + a
in one go.
v2: Make algebraic properties all caps. This was more consistent with the
intrinsics flags and seems better for flags in general.
Also, the enums are now declared with (1 << n) rather then hex values.
v3: fmin and fmax technically aren't commutative or associative. Things
get funny when one of the arguments is a NaN.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
As it was, we weren't ever using load_const in a non-SSA way. This allows
us to substantially simplify the load_const instruction. If we ever need a
non-SSA constant load, we can do a load_const and an imov.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, lower_atomics was non-SSA only. We assert-failed if the
destination of an atomic operation intrinsic was an SSA def and we used
temporary registers for computing offsets. This commit changes both of
these behaviors. We now use SSA values for computing offsets (so we can
optimize them) and we handle SSA destinations. We also move the pass to
run before we go out of SSA on i965 as it now generates SSA values.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we were using foreach_dest and switching on whether the destination
was an SSA value. This works, except not all destinations are SSA values
so we have to special-case ssa_undef instructions. Now that we have a
foreach_ssa_def function, we can iterate over all of the register
destinations in one pass and iterate over the SSA destinations in a second.
This way, if we add other ssa-only instructions, we won't have to worry
about adding them to the special case we have for ssa_undef.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
There are some functions whose destinations are SSA-only and so aren't a
nir_dest. This provides a function that is capable of iterating over the
SSA definitions defined by those functions. If you want registers, you
should use the old iterator.
v2: Kenneth Graunke <kenneth@whitecape.org>:
- Fix nir_foreach_ssa_def's return value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we were just iterating over the program "in order" which
kind-of approximates a DFS, but not really. In particular, we got the
following case wrong:
loop {
a = 3;
if (foo) {
a = 5;
} else {
break;
}
use(a);
}
where use(a) would get 3 instead of 5 because of premature popping of the
SSA def stack. Now, since we do an actaul DFS, we should evaluate use(a)
immediately after a = 5 and we should be ok.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We stopped generating predicates in glsl_to_nir some time ago. Right now,
it's all dead untested code that I'm not convinced always worked in the
first place. If we decide we want them back, we can revert this patch.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, the condition was a scalar that applied to all components
simultaneously. As of this commit, the condition is a vector and each
component is switched seperately.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
nir_metadata_dirty was a terrible name because the parameter it takes is
the metadata to be preserved. This is really confusing because it looks
like it's doing the opposite of what it is actually doing. Now it's named
sensibly.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
In particular, we rename nir_tex_src_sampler_index to _sampler_offset and
add a sampler_array_size field to nir_tex_instr. This way we can pass the
size of sampler arrays through to backends even after removing the variable
information and, with it, the type.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In GLSL-to-NIR we were just setting the base index to 0 whenever there was
an indirect so having it expressed as a sum makes no sense. Also, while a
base offset may make sense for the memory location (first element in the
array, etc.) it makes less sense for the actual uniform buffer index. This
may change later, but it seems to make more sense for now.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit renames nir_instr_as_texture to nir_instr_as_tex and renames
nir_instr_type_texture to nir_instr_type_tex to be consistent with
nir_tex_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass uses the previously built algebraic transformations framework and
should act as an example for anyone else wanting to make an algebraic
transformation pass for NIR.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit builds on the nir_search.h infastructure by adding a bit of
python code that makes it stupid easy to write an algebraic transformation
pass. The nir_algebraic.py file contains four python classes that
correspond directly to the datastructures in nir_search.c and allow you to
easily generate the C code to represent them. Given a list of
search-and-replace operations, it can then generate a function that applies
those transformations to a shader.
The transformations can be specified manually, or they can be specified
using nested tuples. The nested tuples make a neat little language for
specifying expression trees and search-and-replace operations in a very
readable and easy-to-edit fasion.
The generated code is also fairly efficient. Insteady of blindly calling
nir_replace_instr with every single transformation and on every single
instruction, it uses a switch statement on the instruction opcode to do a
first-order culling and only calls nir_replace_instr if the opcode is known
to match the first opcode in the search expression.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This framework provides a simple way to do simple search-and-replace
operations on NIR code. The nir_search.h header provides four simple data
structures for representing expressions: nir_value and four subtypes:
nir_variable, nir_constant, and nir_expression. An expression tree can
then be represented by nesting these data structures as needed. The
nir_replace_instr function takes an instruction, an expression, and a
value; if the instruction matches the expression, it is replaced with a new
chain of instructions to generate the given replacement value. The
framework keeps track of swizzles on sources and automatically generates
the currect swizzles for the replacement value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, the casting operations were macros. While this is usually
fine, the casting macro used the input parameter twice leading to strange
behavior when you passed the result of another function into it. Since we
know the source and destination types explicitly, we don't loose anything
by making it a function.
Also, this gives us a nice little macro for creating cast function that
will hopefully prevent mistyping.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We used to have the number of components built into the intrinsic. This
meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4
variants. This lead to piles of switch statements to generate the correct
intrinsic names, and introspection to figure out the number of components.
We can make things much nicer by allowing "vectorized" intrinsics.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit switches us over to the new variable lowering code which is
capable of properly handling lowering indirects as we go.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
With this commit, the GLSL IR -> NIR pass generates NIR in more-or-less SSA
form. It's SSA in the sense that it doesn't have any registers, but it
isn't really useful SSA because it still has a pile of load/store
intrinsics that we will need to get rid of.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass analizes all of the load/store operations and, when a variable is
never aliased (potentially used by an indirect operation), it is lowered
directly to an SSA value. This pass translates to SSA directly and does
not require any fixup by the original to-SSA pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead, we give SSA definitions a temporary index of 0xFFFFFFFF if the
instruction does not have a block and a proper index when it actually gets
added to the list.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we used a string name. It was nice for translating out of GLSL
IR (which also does that) but cumbersome the rest of the time.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Backends want to be able to do special things with constant values such as
put them into immediates or make decisions based on whether or not a value
is constant. Before, constants always got lowered to a load_const into a
register and then a register use. Now we leave constants as SSA values so
backends can special-case them if they want. Since handling constant SSA
values is trivial, this shouldn't be a problem for backends.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass is still fairly basic. It only handles ALU operations, constant
loads, and phi nodes. No texture ops or intrinsics yet.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The unlink_blocks function moves successors around to make sure that, if
there is a remaining successor, it is in the first successors slot and not
the second. To fix this, we simply get both successors up front.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We also make the return types match GLSL. The GLSL spec specifies that
findMSB and findLSB return a signed integer. Previously, nir had them
return unsigned. This updates nir's behavior to match what GLSL expects.
We also update the nir-to-fs generator to take the new instructions. While
we're at it, we fix the case where the input to findMSB is zero.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
These indices should now be reasonably stable/consistent. Redoing the
indices in the print functions makes it harder to debug problems.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some time while refactoring things to make it look nicer before pushing to
master, I completely broke the function. This fixes it to be correct.
Just goes to show you why you souldn't push code that has no users yet...
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit rewrites the out-of-SSA pass to not be nearly as naieve. It's
based on "Revisiting Out-of-SSA Translation for Correctness, Code Quality,
and Efficiency" by Boissinot et. al. It should be fairly close to
state-of-the art.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since we don't actually have an "if" instruction, this is a very common
pattern when iterating over instructions. This adds a helper function for
it to make things a little less painful.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass is kind of stupidly implemented but it should be enough to get us
up and going. We probably want something better that doesn't generate all
of the redundant moves eventually. However, the i965 backend should be
able to handle the movs, so I'm not too worried about it in the short term.
Previously, emit_general_interpolation took an ir_variable and pulled the
information it needed from that. This meant that in fs_fp, we were
constructing a dummy ir_variable just to pass into it. This commit makes
emit_general_interpolation take only the information it needs and gets rid
of the fs_fp cruft.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is similar to the GLSL IR frontend, except consuming NIR. This lets
us test NIR as part of an actual compiler.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Make brw_fs_nir build again
Only use NIR of INTEL_USE_NIR is set
whitespace fixes
These include functions for adding and removing various bits of IR and
helpers for iterating over all the sources and destinations of an
instruction. This is similar to ir.cpp.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
whitespace and automake fixes
This includes all the instructions, ifs, loops, functions, etc. This is
similar to the information in ir.h.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Include ralloc and hash_table from the util directory
whitespace fixes
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-By glenn.kennard <glenn.kennard@gmail.com>
It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z. The result was
that you'd get big chunks of your rendering missing.
This reverts commit 0543630d0b.
It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.
We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.
Sorry I didn't remember this when reviewing the reverted change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
You would not believe the mess GCC 4.8.3 generated for the old
switch-statement.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40)
64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40)
The regression on 32-bit is odd. Callgrind says the caller,
_mesa_is_valid_prim_mode is faster. Before it says 2,293,760
cycles, and after it says 917,504.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Multithread:
32-bit: Difference at 95.0% confidence 0.416027% +/- 0.163529% (n=40)
64-bit: Difference at 95.0% confidence 0.494771% +/- 0.259985% (n=40)
Gl32Batch7 had no difference proven at 95.0% confidence (n=120) on
32-bit or 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous check was insufficient (as it did not take 'indices' into
consideration), and DX10 hardware does not need this check anyway.
Since index_bytes is no longer used, remove it.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 1.66929% +/- 0.230107% (n=40)
64-bit: Difference at 95.0% confidence -1.40848% +/- 0.288038% (n=40)
The regression on 64-bit is odd. Callgrind says the caller,
validate_DrawElements_common is faster. Before it says 10,321,920
cycles, and after it says 8,945,664.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This doesn't affect performance, but it feels more correct.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: No difference proven at 95.0% confidence (n=120)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of having an extra pointer indirection in one of the hottest
loops in the driver.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
v2 (Ken): Cut size of array from 64 to 57 to save memory.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the switch-statement, GCC 4.8.3 produces a small pile of code with
a branch.
00000000 <brw_get_index_type>:
000000: 8b 54 24 04 mov 0x4(%esp),%edx
000004: b8 01 00 00 00 mov $0x1,%eax
000009: 81 fa 03 14 00 00 cmp $0x1403,%edx
00000f: 74 0d je 00001e <brw_get_index_type+0x1e>
000011: 31 c0 xor %eax,%eax
000013: 81 fa 05 14 00 00 cmp $0x1405,%edx
000019: 0f 94 c0 sete %al
00001c: 01 c0 add %eax,%eax
00001e: c3 ret
However, this could be two instructions.
00000000 <brw_get_index_type>:
000000: 2d 01 14 00 00 sub $0x1401,%eax
000005: d1 e8 shr %eax
000007: 90 nop
000008: 90 nop
000009: 90 nop
00000a: 90 nop
00000b: c3 ret
The function was also moved to the header so that it could be inlined at
the two call sites. Without this, 32-bit also needs to pull the
parameter from the stack. This means there is a push, a call, a move,
and a ret added to a two instruction function. The above code shows the
function with __attribute__((regparm=1)), but even this adds several
extra instructions. There is also an extra instruction on 64-bit to
move the parameter to %eax for the subtract.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40)
64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
...so that it can be inlined in the two places that call it.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were happily printing "Native code for unnamed vertex shader" and
"VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output,
as well as the KHR_debug output used by shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A lot of messages hardcoded the string "FS", which is confusing on
Broadwell, where we use this code for VS support as well.
shader-db particularly got confused, as it reported two "FS SIMD8"
shaders, and no vertex shaders at all. Craziness ensued.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only GNU indent is supported when indenting autogenerated format_pack.c
and format_unpack.c files. Some non-GNU indent (Mac OS X and FreeBSD)
add extra whitespaces than break the build of those files.
Fallback to 'cat' if a non-GNU indent is found.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=88335
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The 8888 suggests 8-bit components which is not correct, so
replace that with the actual size of the components in each
format.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Before we were always coping from the buffer being mapped into the
temporary buffer. However, if INVALIDATE_RANGE is set, then we know that
the data is going to be junk after we unmap so there's no point in doing
the blit. This is important because doing the blit will cause a stall 3
lines later when we map the buffer.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Patch enables ES2 extension that utilizes existing ES3 functionality.
Changes make all the subtests to run and pass in WebGL conformance
test 'webgl-draw-buffers' when running Chrome on OpenGL ES, also
Piglit test 'draw_buffers_gles2' passes.
v2: remove unused boolean (Ilia Mirkin)
v3: proper error checking for invalid values (Chad Versace)
v4: run error check explicitly for ES2 and ES3 (Kenneth Graunke)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This will be needed for NIR because it is typeless and treats all constants
as uint32 values and reinterprets them when they are used later. This
commit allows those values to be properly propagated.
Also, this helps some synmark shaders because it allows us to copy
propagate a 0x00000000UD into a 0.0F in a load_payload, which then lets us
combine 4 load_payloads.
instructions in affected programs: 2288 -> 2144 (-6.29%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Removes commit 7894278 changes and moves fix to _mesa_GetInternalformativ().
The original commit enabled the GL_RGB and GL_RGBA unsized internal formats
as valid for render buffers in GLES3, but this is incorrect. They should
have only been enabled for GetInternalformativ()
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88079
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If, for example, only the x/y/w components of in.xyzw are actually used,
we still need to have a group of four registers and assign all four
components. The hardware can't write in.xy and in.w to discontiguous
registers. To handle this, pad with a dummy NOP instruction, to keep
the neighbor chain contiguous.
This fixes a problem noticed with firefox OMTC.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
According to the OpenGL and OpenGL ES specs (sections
"FRAMEBUFFER COMPLETENESS" and "Whole Framebuffer Completeness"),
the image for color, depth or stencil attachments must be renderable,
otherwise the attachment is considered incomplete and we should report
GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT. Currently, we detect this
situation properly but report a different error.
This fixes the following 3 piglit tests:
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb_unsigned_int_2_10_10_10_rev
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgba_unsigned_int_2_10_10_10_rev
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb16f
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From GLES3 specification (page 123), "The currently bound sampler may be
queried by calling GetIntegerv with pname set to
SAMPLER_BINDINGGL_SAMPLER_BINDING".
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getboolean
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getinteger
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getinteger64
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, a cast was done to convert from float to int but there
were rounding errors.
The spec specificies in Data Conversion chapter that Floating-point values are
rounded to the nearest integer.
This patch fixes the following 2 dEQP tests:
dEQP-GLES3.functional.state_query.sampler.sampler_texture_min_lod_getsamplerparameteri
dEQP-GLES3.functional.state_query.sampler.sampler_texture_max_lod_getsamplerparameteri
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, a cast was done to convert from float to int but there
were rounding errors.
The spec specificies in Data Conversion chapter that Floating-point values are
rounded to the nearest integer.
This patch fixes the following 8 dEQP tests:
dEQP-GLES3.functional.state_query.texture.texture_2d_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_3d_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_3d_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_array_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_array_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_cube_map_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_cube_map_texture_max_lod_gettexparameteri
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Return the proper value for two-dimensional array texture and three-dimensional
textures.
From OpenGL ES 3.0 spec, chapter 6.1.13 "Framebuffer Object Queries",
page 234:
"If pname is FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER and the texture
object named FRAMEBUFFER_ATTACHMENT_OBJECT_NAME is a layer of a
three-dimensional texture or a two-dimensional array texture, then params
will contain the number of the texture layer which contains the attached im-
age. Otherwise params will contain the value zero."
Furthermore, FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER is an alias of
FRAMEBUFFER_ATTACHMENT_TEXTURE_3D_ZOFFSET_EXT.
This patch fixes dEQP test:
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_texture_layer
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 0ae9ca12a8 put source modifiers out of the bitcast operations
by adding a MOV operation that would handle them separately. It missed
the case of ceil though: the implementation negates both its source and
destination operands. The source operand will be used for RNDD, which
we can handle normally, but we need to fix the modifier for the
negated result.
v2:
- RNDD can handle the source modifier so no need to put that one
in a separate MOV.
Fixes the following 42 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.common.ceil.*_vertex
dEQP-GLES3.functional.shaders.builtin_functions.common.ceil.*_fragment
dEQP-GLES3.functional.shaders.builtin_functions.precision.ceil._*vertex.*
dEQP-GLES3.functional.shaders.builtin_functions.precision.ceil._*fragment.*
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
"9.4. FRAMEBUFFER COMPLETENESS
...
Depth and stencil attachments, if present, are the same image."
Notice that this restriction is not included in the OpenGL ES2 spec.
Fixes 18 dEQP tests in:
dEQP-GLES3.functional.fbo.completeness.attachment_combinations.*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
'4.1.4 Stencil Test' section of the GL-ES 3.0 specification says:
"In the initial state, [...] the front and back stencil mask are both set
to the value 2^s − 1, where s is greater than or equal to the number of
bits in the deepest stencil buffer* supported by the GL implementation."
Since the maximum supported precision for stencil buffers is 8 bits, mask
values should be initialized to 2^8 - 1 = 0xFF.
Currently, these masks are initialized to max unsigned integer (~0u), because
in OpenGL 3.0 and before, the initial mask values were:
"In the initial state, stenciling is disabled, the front and back
stencil reference value are both zero, the front and back stencil
comparison functions are both ALWAYS, and the front and back
stencil mask are both all ones."
The problem is that it causes the mask values to overflow to -1 when converted
to signed integer by glGet* APIs.
Fixes 6 dEQP failing tests:
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_separate_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_separate_both_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_separate_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_separate_both_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The range's min and max, and the precision value are not set correctly for the
vertex shader constants.
Fixes 1 dEQP test: dEQP-GLES3.functional.state_query.shader.precision_vertex_highp_int
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Check returned texObj is not null. If texObj is null there is already
GL_INVALID_OPERATION error set.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
GL_UNSIGNED_SHORT_5_5_5_1, GL_UNSIGNED_SHORT_1_5_5_5_REV,
GL_UNSIGNED_INT_10_10_10_2, GL_UNSIGNED_INT_2_10_10_10_REV data types
are not explicitly allowed to work with GL_ABGR_EXT format neither
in GL nor GL_EXT_abgr specs.
Removed the corresponding mesa formats as there are no other functions
using them inside Mesa anymore.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_mesa_pack_rgba_span_float was the last of the color span functions
and we have replaced all calls to it with calls to _mesa_format_convert,
so we can remove it together with tmp_pack.h which was used to
generate the pack functions for multiple types that were used from
the various color span functions that have been removed.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Now that we have _mesa_format_convert we don't need this.
This was only used to create temporary RGBA float images in the process
of storing some compressed formats. These can call _mesa_texstore
with a RGBA/float dst to achieve the same goal.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Now that we have _mesa_format_convert we don't need this.
texstore_rgba will use the GL_COLOR_INDEX to RGBA conversion
helpers instead and compressed formats that used
_mesa_make_temp_ubyte_image to create an ubyte RGBA temporary
image can call _mesa_texstore with a RGBA/ubyte dst to
achieve the same goal.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_mesa_unpack_bitmap() was introduced by commit 02b801c to handle the case
when data is stored in PBO by display lists, in the context of this bug:
Incorrect pixels read back if draw bitmap texture through Display list
https://bugs.freedesktop.org/show_bug.cgi?id=10370
Since _mesa_unpack_image() already handles the case of GL_BITMAP, this patch
removes _mesa_unpack_bitmap() and makes affected calls go through
_mesa_unapck_image() instead.
The sample test attached to the original bug report passes with this change
and there are no piglit regressions.
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In the future we would like to have a format conversion library that is
independent of GL so we can share it with Gallium. This is a step in that
direction.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Instead of using _mesa_pack_rgba_span_float. This should allow us to remove
that function in a later patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is the only place that uses _mesa_unpack_color_span_float so after
this we should be able to remove that function.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Notice that _mesa_format_convert does not handle byte-swapping scenarios,
GL_COLOR_INDEX or MESA_FORMAT_YCBCR(_REV), so these must be handled
separately.
Also, remove all the code that goes unused after using _mesa_format_convert.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We only use _mesa_make_temp_ubyte_image in texstore.c to convert
GL_COLOR_INDEX to RGBA, but this helper does more stuff than this.
All uses of this helper can be replaced with calls to
_mesa_format_convert except for this GL_COLOR_INDEX conversion.
This patch extracts the GL_COLOR_INDEX to RGBA logic to a separate
helper so we can use that instead from texstore.c.
In future patches we will replace all remaining calls to
_mesa_make_temp_ubyte_image in the repository (related to compressed
formats) with calls to _mesa_format_convert so we can remove
_mesa_make_temp_ubyte_image and related functions.
v2:
- Remove ‘for’ loop initial declaration. They are only allowed in C99 or C11
mode.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For glReadPixels with a Luminance destination format we compute luminance
values from RGBA as L=R+G+B. This, however, requires ad-hoc implementation,
since pack/unpack functions or _mesa_swizzle_and_convert won't do this
(and thus, neither will _mesa_format_convert). This patch adds helpers
to do this computation so they can be used to support conversion to luminance
formats.
The current implementation of glReadPixels does this computation as part
of the span functions in pack.c (see _mesa_pack_rgba_span_float), that do
this together with other things like type conversion, etc. We do not want
to use these functions but use _mesa_format_convert instead (later patches
will remove the color span functions), so we need to extract this functionality
as helpers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We have _mesa_swap{2,4} but these do in-place byte-swapping only. The new
functions receive an extra parameter so we can swap bytes on a source
input array and store the results in a (possibly different) destination
array.
This is useful to implement byte-swapping in pixel uploads, since in this
case we need to swap bytes on the src data which is owned by the
application so we can't do an in-place byte swap.
v2:
- Include compiler.h in image.h, which is necessary to build in MSCV as
indicated by Brian Paul.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We had previously added the needed mesa formats, so we can simplify
the code further.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 after review by Jason Ekstrand:
- Move _mesa_format_from_format_and_type to glformats
- Return a mesa_format for GL_UNSIGNED_INT_8_8_8_8(_REV)
v3:
- Adapted to the new implementation of mesa_array_format as a plain uint32_t
bitfield.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will come in handy when callers of _mesa_format_convert need
to compute the rebase swizzle parameter to use.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The new parameter allows callers to provide a rebase swizzle that
the function needs to use to match the requirements of the base
internal format involved. This is necessary when the source or
destination internal formats (depending on whether we are doing
the conversion for a pixel download or a pixel upload respectively)
do not match the base formats of the source or destination
formats of the conversion. This can happen when the driver does not
support the internal formats and uses a different format to store
pixel data internally.
For example, a texture upload from RGB to Luminance in a driver
that does not support textures with a Luminance format may decide
to store the Luminance data as RGBA. In this case we want to store
the RGBA values as (R,R,R,1). Following the same example, when we
download from that texture to RGBA we want to read (R,0,0,1). The
rebase_swizzle parameter allows these transforms to happen.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is necessary to handle conversions between array types where
the driver does not support the dst format requested by the client and
chooses a different format instead.
We will need this in _mesa_format_convert, so move it to format_utils.c,
prefix it with '_mesa_' and make it available to other files.
v2:
- Move _mesa_compute_component_mapping to glformats
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Iago Toral <itoral@igalia.com>:
- When testing if we can directly pack we should use the src format to check
if we are packing from an RGBA format. The original code used the dst format
for the ubyte case by mistake.
- Fixed incorrect number of bits for dst, it was computed using the src format
instead of the dst format.
- If the dst format is an array format, check if it is signed. We were only
checking this for the case where it was not an array format, but we need
to know this in both scenarios.
- Fixed incorrect swizzle transform for the cases where we convert between
array formats.
- Compute is_signed and bits only once and for the dst format. We were
computing these for the src format too but they were overwritten by the
dst values immediately after.
- Be more careful when selecting the integer path. Specifically, check that
both src and dst are integer types. Checking only one of them should suffice
since OpenGL does not allow conversions between normalized and integer types,
but putting extra care here makes sense and also makes the actual requirements
for this path more clear.
- The format argument for pack functions is the destination format we are
packing to, not the source format (which has to be RGBA).
- Expose RGBA8888_* to other files. These will come in handy when in need to
test if a given array format is RGBA or in need to pass RGBA formats to
mesa_format_convert.
v3 by Samuel Iglesias <siglesias@igalia.com>:
- Add an RGBA8888_INT definition.
v4 by Iago Toral <itoral@igalia.com> after review by Jason Ekstrand:
- Added documentation for _mesa_format_convert.
- Added additional explanatory comments for integer conversions.
- Ensure that we use _messa_swizzle_and_convert for all signed source formats.
- Squashed: do not directly (un)pack to RGBA UINT if the source is not unsigned.
v5 by Iago Toral <itoral@igalia.com>:
- Adapted to the new implementation of mesa_array_format as a plain uint32_t
bitfield.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Use autogenerated format pack functions and take advantage of some
macros to reduce source code, facilitating its maintenance.
Unfortunately, dstType == GL_UNSIGNED_SHORT cannot simplified like
the others, so keep it as it is.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We will use this in a later patch to refactor _mesa_pack_rgba_span_float.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Take advantage of new mesa formats and new format_pack functions to
reduce source code in _mesa_pack_rgba_span_from_ints() and
_mesa_pack_rgba_span_from_uints().
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This commit adds a macro to facilitate the task of using
format conversions functions but keeps the same API.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will be used to refactor code in pack.c and support conversion
to/from these types in a master convert function that will be added
later.
v2:
- Fix autogeneration of MESA_FORMAT_A2R10G10B10_UNORM pack/unpack
functions
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will be used to unify code in pack.c.
v2:
- Modify pack_int_*() function generator to use c.datatype() and
f.datatype()
v3:
- Only autogenerate pack_int_*() functions for non-normalized integer
formats.
v4:
- Use _mesa_unsigned_to_unsigned() in pack_int_*() because, in order
to be able to pack both signed and unsigned formats, we need to
sign-extend.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We will use this later on to handle uint conversion scenarios in a master
convert function.
v2:
- Modify pack_uint_*() function generation to use c.datatype() and
f.datatype().
- Remove UINT_TO_FLOAT() macro usage from pack_uint*()
- Remove "if not f.is_normalized()" conditional as pack_uint*()
functions are only autogenerated for non normalized formats.
v3:
- Add clamping for non-normalized integer formats in pack_uint*()
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Add usage of INDENT_FLAGS in Makefile.am
v3 by Samuel Iglesias <siglesias@igalia.com>:
- Modify unpack_float_*() and unpack_ubyte_*() function generation
to use c.datatype() and f.datatype()
- Fix out-of-tree build
v4 by Samuel Iglesias <siglesias@igalia.com>:
- format_unpack.c.mako is now format_unpack.py, with the template code
inlined. It now auto-generates format_unpack.c
- Add format_unpack.c to gitignore.
- Simplify Makefile.am change
- Modify SConscript to build format_unpack.c with scons
v5 by Samuel Iglesias <siglesias@igalia.com>:
- Don't allow float to non-normalized integer format conversions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We were auto-generating it before. The problem was that the autogeneration
tool we were using was called "copy, paste, and edit". Let's use a more
sensible solution.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Samuel Iglesias <siglesias@igalia.com>
- Remove format_pack.c as it is now autogenerated
- Add usage of INDENT_FLAGS in Makefile.am
- Remove trailing blank line
v3 by Samuel Iglesias <siglesias@igalia.com>
- Merge format_convert.py into format_parser.py
- Adapt pack_*_* function generations
- Fix out-of-tree build
v4 by Samuel Iglesias <siglesias@igalia.com>
- _get_datatype() is now a helper function
v5 by Samuel Iglesias <siglesias@igalia.com>
- format_pack.c.mako is now format_pack.py, with the template code
inlined. It now auto-generates format_pack.c
- Simplify Makefile.am change.
- Modify SConscript to build format_pack.c with scons.
- Remove run_mako.py
- Add format_pack.c to gitignore
v6 by Samuel Iglesias <siglesias@igalia.com>:
- Don't allow float to non-normalized integer format conversions.
- Add non-normalized formats support for ubyte packing functions. Merge
the previously separated patch.
- Add clamping for non-normalized integer formats in pack_ubyte*()
v7 by Samuel Iglesias <siglesias@igalia.com>:
- Add assert to check that sRGB formats are 8-bit size.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It is now a hard dependency because of the autogeneration of
format pack and unpack functions.
Update the documentation to reflect this change.
v2:
- Inline python script in m4 file and use PYTHON2
v3:
- Remove semicolons and quotes and change coding style
- Add Ilia Mirkin suggestion to use Python's split functionality.
- Use AX_CHECK_PYTHON_MAKO_MODULE name.
- Change to MIT license
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If we need the base format for a mesa_array_format we have to find the
matching mesa_format first. This is expensive because it requires
to loop through all existing mesa formats until we find the right match.
We can resolve the base format of an array format directly by looking
at its swizzle information. Also, we can have _mesa_get_format_base_format
accept an uint32_t which can pack either a mesa_format or a mesa_array_format
and resolve the base format for either type. This way clients do not need to
check if they have a mesa_format or a mesa_array_format and call different
functions depending on the case.
Another reason to resolve the base format for array formats directly is that
we don't have matching mesa_format enums for every possible array format, so
for some GL format/type combinations we can produce array formats that don't
have a corresponding mesa format, in which case we would not be able to
find the base format. Example format=GL_RGB, type=GL_UNSIGNED_SHORT. This type
would map to something like MESA_FORMAT_RGB_UNORM16, but we don't have that.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
An array format is a 32-bit integer format identifier that can represent
any format that can be represented as an array of standard GL datatypes.
Whie the MESA_FORMAT enums provide several of these, they don't account for
all of them.
v2 by Iago Toral Quiroga <itoral@igalia.com>:
- Implement mesa_array_format as a plain bitfiled uint32_t type instead of
using a struct inside a union to access the various components packed in
it. This is necessary to support bigendian properly, as pointed out by
Ian.
- Squashed: Make float types normalized
v3 by Iago Toral Quiroga <itoral@igalia.com>:
- Include compiler.h in formats.h, which is necessary to build in MSVC as
indicated by Brian Paul.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix various conversion paths that involved integer data types of different
sizes (uint16_t to uint8_t, int16_t to uint8_t, etc) that were not
being clamped properly.
Also, one of the paths was incorrectly assigning the value 12, instead of 1,
to the constant "one".
v2:
- Create auxiliary clamping functions and use them in all paths that
required clamp because of different source and destination sizes
and signed-unsigned conversions.
v3:
- Create MIN_INT macro and use it.
v4:
- Add _mesa_float_to_[un]signed() and mesa_half_to_[un]signed() auxiliary
functions.
- Add clamp for float-to-integer conversions in _mesa_swizzle_and_convert()
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_BaseFormat is a GLenum (unsigned int) so testing if its value is
greater than 0 to detect the cases where _mesa_base_tex_format
returns -1 doesn't work.
Fixing the assertion breaks the arb_texture_view-lifetime-format
piglit test on nouveau, since that test calls
_mesa_base_tex_format with GL_R16F with a context that does not
have ARB_texture_float, so it returns -1 for the BaseFormat, which
was not being caught properly by the ASSERT in init_teximage_fields_ms
until now.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We were returning incorrect mesa formats for GL_LUMINANCE_ALPHA16I_EXT
and GL_LUMINANCE_ALPHA32I_EXT.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The PACK_565_REV macro is no longer used. It was also extremely confusing
because it's actually a byteswapped 565 not reversed 565.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Aparently, the packing/unpacking functions for these formats have differed
from the format description in formats.h. Instead of fixing this, people
simply left a comment saying it was broken. Let's actually fix it for
real.
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Fix comment in formats.h
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch fixes the return of a wrong value when x is lower than
-MAX_INT(src_bits) as the result would not be between [-1.0 1.0].
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Modify snorm_to_float() to avoid doing the division when
x == -MAX_INT(src_bits)
Cc: 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No need to recheck the FS compile when the VS source has changed, but
there *is* a need to recheck the VS compile when the compiled VS has
changed (since the live inputs may change).
Fixes es3conform's blend test.
The util_pack_color() thing only sets up the low bits of the union, so
only return them, too. Fixes intermittent failure on
fbo-alphatest-formats and es3conform's framebuffer-objects test under
simulation.
Turns out this was harmful in code quality:
total instructions in shared programs: 39487 -> 38845 (-1.63%)
instructions in affected programs: 22522 -> 21880 (-2.85%)
This costs us yet another register, which is painful since it means more
programs might fail to compile). However, the alternative was causing us
trouble where we'd save/restore r3 while it contained a MIN-ed direct
texture offset, causing the kernel to fail to validate our shaders (such
as in GLB2.7).
This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.
total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs: 4721 -> 4455 (-5.63%)
This will give the compiler the chance to dead-code eliminate unused VPM
reads. This is particularly a big deal in the CS where a bunch of vattrs
are just not going to be used.
I'm using this in some WIP commits for doing blending in 8888 instead of
vec4. But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:
total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs: 4381 -> 4314 (-1.53%)
If you had a conditional assignment of an array or struct (say, from the
if-lowering pass), we'd try doing swizzle_for_size() on the aggregate
type, and it would assertion fail due to vector_elements==0. Instead,
extend emit_block_mov() to handle emitting the conditional operations,
which also means we'll have appropriate writemasks/swizzles on the CMPs
within a struct containing various-sized members.
Fixes 20 testcases in es3conform on vc4.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is part of a potential solution to a spec bug. Cube completeness
is a concept from glGenerateMipmap, but it seems reasonable to check for it in
TextureSubImage when target=GL_TEXTURE_CUBE_MAP.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This is part of a potential solution to a spec bug. Cube completeness
is a concept from glGenerateMipmap, but it seems reasonable to check for it in
GetTextureImage when the target is GL_TEXTURE_CUBE_MAP.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In implementing ARB_DIRECT_STATE_ACCESS functions, it is often necessary to
abstract the functionality of a traditional GL API function into a backend
that both the traditional and dsa API functions can share. For instance,
glTexParameteri and glTextureParameteri both call _mesa_texture_parameteri,
which takes a context object and a texture object as arguments.
The existance of such backend functions provides the opportunity for
driver internals (such as meta) to pass around the actual texture object
rather than its ID or target, saving on texture object storage and look-up
overhead.
This patch provides nameless texture creation and deletion for meta. This
will be used in an upcoming refactor of meta.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Beginning in the OpenGL 4.3 core specification, certain error handling has
changed. One example shown here is that INVALID_ENUM is thrown instead of
INVALID_OPERATION when a user attempts to set sampler parameters for a
multisample target.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Beginning in the OpenGL 4.3 core specification, some error handling has
changed (see OpenGL 4.5 core spec, 30.10.2014, Section 8.10 Texture
Parameters, pages 228-29). As an example, changing sampler states with a
multisample target throws INVALID_ENUM rather than INVALID_OPERATION.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The following preparations were made in texstate.c and texstate.h to
better facilitate the BindTextureUnit function:
Dylan Noblesmith:
mesa: add _mesa_get_tex_unit()
mesa: factor out _mesa_max_tex_unit()
This is about to appear in a lot more places, so
reduce boilerplate copy paste.
add _mesa_get_tex_unit_err() checking getter function
Reduce boilerplate across files.
Laura Ekstrand:
Made note of why BindTextureUnit should throw GL_INVALID_OPERATION if the unit is out of range.
Added assert(unit > 0) to _mesa_get_tex_unit.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This reflects the new naming convention for software fallbacks. To avoid
confusion with ARB_DIRECT_STATE_ACCESS backend functions, software fallbacks
now have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This reflects the new naming convention for software fallbacks. To avoid
confusion with ARB_DIRECT_STATE_ACCESS backend functions, software fallbacks
now have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In order to implement ARB_DIRECT_STATE_ACCESS, many GL API functions must now
rely on a backend that both traditional and DSA functions can use. For
instance, _mesa_TexStorage2D and _mesa_TextureStorage2D both call a backend
function _mesa_texture_storage that takes a context and a texture object as
arguments. The backend is named _mesa_texture_storage so that Meta can call
it and avoid looking up the context and the texture object. However, backend
names often look very close to the names of software fallbacks (ie.
_mesa_alloc_texture_storage). For this reason, software fallbacks have been
renamed for clarity to have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Most ARB_DIRECT_STATE_ACCESS functions take an object's ID and use it to look
up the object in its hash table. If the user passes a fake object ID (ie. a
non-generated name), the implementation should throw INVALID_OPERATION.
This is a convenience function for texture objects.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We never used ulVersion for proper version checks.
Most 3rd party drivers use version 1, but recently NVIDIA OpenGL driver
started using a different version number, so the handy trick of renaming
Mesa's ICDs as nvoglv32.dll on Windows machines with NVIDIA hardware for
quick testing of Mesa software renderers stopped working.
Reviewed-by: Brian Paul <brianp@vmware.com>
SKL+ overloads the SIMD4x2 SIMD mode to mean either SIMD8D or SIMD4x2
depending on bit 22 in the message header. If the bit is 0 or there is
no header we get SIMD8D. We always wand SIMD4x2 in vec4 and for fs pull
constants, so use a message header in those cases and set bit 22 there.
Based on an initial patch from Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We can't (or don't know how to) turn this off. But it can end up being
stored to a higher reg # than what the shader uses, leading to
corruption.
Also we currently aren't clever enough to turn off frag_coord/frag_face
if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org
this a bit so both vp and fp reg footprint fixup are called by a common
fxn used also by ir3_cmdline. Also add a few more output lines for
ir3_cmdline to make it easier to see what is going on.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Handle TEMP[ADDR[]] src registers by generating a fanin to group array
elements, similarly to how texture fetch instructions work.
NOTE:
For all the scalar instructions generated for a single tgsi vector
operation which uses an array src (or possibly even uses the same array
as multiple srcs), re-use the same fanin node. Since a vector operation
operates on all components at the same time, it should never see more
than one version of the same array.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
To use fanin's to group registers in an array, we can potentially have a
much larger array of registers. Rather than continuing to bump up the
array size, just make it dynamically allocated when the instruction is
created.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers. This lets us simplify RA by
working in terms of neighbor groups.
NOTE: has the slight problem that it can't optimize out mov's for things
like:
MOV OUT[n], IN[m]
To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them). This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't
need to support an arbitrary mask. So for this case use a simple size
field rather than a bitmask.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We probably could be more clever elsewhere and mask out components that
are not used. But either way, legalize should realize that there is
also a write-after-write hazard with texture sample instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Old compiler doesn't have ir3_block's.. so we need a special path. This
hack can be dropped when ir3_compiler_old is retired.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can
optionally have them. So we implicitly assume that ArrayID==0 always
exists for each file. This is why array_max[file] is never less than
zero.
You can tell from indirect_files(_read/written) if the legacy array-
id zero was actually used.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
At least temporarily, I need to fallback to old compiler still for
relative dest (for freedreno), but I can do relative src temp. Only
a temporary situation, but seems easy/reasonable for tgsi-scan to
track this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We were invalidating si_screen:tm by calling
r600_destroy_common_screen() which frees the si_screen object. This
caused the driver to crash in LLVMDisposeTargetMachine() since we
were passing it an invalid pointer.
https://bugs.freedesktop.org/show_bug.cgi?id=88170
It doesn't work on Windows because of STDCALL calling convention -- it's
the callee responsibility to pop the arguments, and the number of
arguments vary with the prototype --, so the stack pointer ends up getting
corrupted.
This is just a non-invasive stop-gap fix. A proper fix would be more
elaborate, and require either:
- a variation of __glapi_noop_table which sets GL_INVALID_OPERATION
error
- stop using APIENTRY on all internal _mesa_* functions.
Tested with piglit gl-1.0-beginend-coverage (it now fails instead of
crashing).
VMware PR1350505
Reviewed-by: Brian Paul <brianp@vmware.com>
To catch mismatches in cdecl vs stdcall calling convention. See code
comment for more detailed explanation.
Tested with piglit gl-1.0-beginend-coverage (it now also crashes on
debug builds.)
VMware PR1350505.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a case where a transform feedback buffer is fed back as an index
buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- we don't usually need to flush TC L2
- we should flush KCACHE
(not really an issue now since we always flush KCACHE when updating
descriptors, but it could be a problem if we used CE, which doesn't
require flushing KCACHE)
- add an explicit VS_PARTIAL_FLUSH flag
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
So that TC L2 doesn't need to be flushed.
The only problem is with index buffers, which don't use TC.
A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA
when updating the same memory.
The solution for SI is to use uncached access here, because CP DMA doesn't
support cached access.
CIK will be handled in the next patch.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
That's either framebuffer caches or caches for shader resources.
The motivation is that framebuffer caches need to be flushed very rarely
here.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It doesn't do anything useful. And colors are floating-point, so we can use
fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE
state only (in the next patch).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Only done for completeness. Not used by anything yet.
Tested by advertising PIPE_CAP_VERTEXID_NOBASE.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Ordered compares are what you have in C. Unordered compares are the result
of negating ordered compares (they return true if either argument is NaN).
That special NaN behavior is completely useless here, and unordered
compares produce horrible code with all stable LLVM versions.
(I think that has been fixed in LLVM git)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- the relocs array is unused, remove it
- ndw is at most 115 (init), set 140 as the maximum
- compute needs 4 buffers per state, graphics only needs 1; set 4 as the maximum
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
From GL 4.4 Core profile:
If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
performed for array elements transferred by any drawing command not taking a
type parameter, including all of the *Draw* commands other than *DrawEle-
ments*.
Cc: 10.2 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Instead of telling the driver that the window system ancillaries have
been invalidated (when the driver doesn't know which of its buffers
are the window system's!), introduce a method for invalidating
specific surfaces.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is part of the EGL spec, and is useful for a tiled renderer to avoid
the memory bandwidth cost of storing the depth/stencil buffers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Merge the following upstream autoconf-archive patches.
ax_prog_flex: change grep syntax to accept e.g. "flex.real" in case a wrapper or symlink is used.
AX_PROG_FLEX: avoid use of grep empty string escape extension (fix for OpenBSD)
AX_PROG_FLEX: Also accept gflex.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jonathan Gray <jsg@openbsd.org>
Rather than building a new one every compile. This should reduce some
of the overhead of compiling shaders.
One consequence of this change is that we lose the MachineInstrs dumps
when dumping the shaders via R600_DEBUG. The LLVM IR and assembly is
still dumped, and if you still want to see the MachineInstr dump, you
can run the dumped LLVM IR through llc.
The "normal" detection (querying clflush size) already made sure it is
non-zero, however another method did not. This lead to crashes if this
value happened to be zero (apparently can happen in virtualized environments
at least).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87913
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The code used PIPE_ALIGN_VAR for the variable used by fxsave, however this
does not work if the stack isn't aligned. Hence use PIPE_ALIGN_STACK function
decoration to fix the segfault which can happen if stack alignment is only
4 bytes.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87658.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The headers hadn't been regenerated in a long time and had seen a number
of manual modifications. A few changes:
- remove nvc0_2d entirely, use the nv50 header which has the nvc0
values too
- remove 3ddefs, it's identical to the nv50 file
- move macros out into a separate file
Also the upstream rnndb changed the overall chip naming convention; this
was fixed up manually in the generated files until a better solution is
determined.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The headers hadn't been regenerated in a long time, and there were a few
minor divergences. Among other things, rnndb has changed naming to
G80/etc, for now I've not tackled switching that over and manually
replaced the nvidia codenames back to the chip ids. However no other
modifications of the headergen'd headers was done.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Compression seems to be supported for only some formats. Enable it for
those. Previously this was disabled for everything despite the code
looking like it was actually enabled.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
assert is compiled out in release builds - don't put logic into it. Note
that this particular instance is only used for vp debugging and is
normally compiled out.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
brw_swizzle_to_scs has been showing up in my CPU profiling, which is
rather silly - it's a tiny amount of code. It really should be inlined,
and can easily be implemented with fewer instructions.
The enum translation is as follows:
SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_ZERO, SWIZZLE_ONE
0 1 2 3 4 5
4 5 6 7 0 1
SCS_RED, SCS_GREEN, SCS_BLUE, SCS_ALPHA, SCS_ZERO, SCS_ONE
which is simply (swizzle + 4) & 7.
Haswell needs extra textureGather workarounds to remap GREEN to BLUE,
but Broadwell and later do not.
This patch replicates swizzle_to_scs in gen7_wm_surface_state.c and
gen8_surface_state.c, since the Gen8+ code can be simplified to a mere
two instructions. Both copies can be marked static for easy inlining.
v2: Put the commit message in the code as comments (requested by
Jason Ekstrand). Also fix a typo.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Valve games use GL_SRGB8 textures. Instead of supporting that properly,
we fell back to MESA_FORMAT_R8G8B8A8_SRGB (with an alpha channel), which
meant that we had to use texture swizzling to override the alpha to 1.0
when sampling. This meant shader recompiles on Gen < 7.5 platforms.
By supporting MESA_FORMAT_R8G8B8X8_SRGB, the hardware just returns 1.0
for us, so we can just use SWIZZLE_XYZW, and avoid any recompiles. All
generations of hardware have supported the format for sampling and
filtering; we can easily support rendering by using the R8G8B8A8_SRGB
format and writing garbage to the X channel. (We do this already for
the non-SRGB version of this format.)
This removes all remaining shader recompiles in a time demo of "Counter
Strike: Global Offensive" (32 -> 0) on Sandybridge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The logic in brw_blorp_surface_info::set uses brw_format_for_mesa_format
for source surfaces, and brw->render_target_format[] for destination
surfaces. We should do the same in the sRGB MSAA overrides.
Currently, this isn't a problem, since SRGB MSAA buffers are all RGBA.
The next commit will introduce RGBX SRGB MSAA buffers, at which point
we need to get the RGBX -> RGBA format overrides for rendering right.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
ir_to_mesa does this - apparently we just forgot or something.
Without this, we'll guess the wrong texture swizzle (XYZW for color
instead of XXX1 for depth) when doing precompiles.
This cuts 26 shader recompiles in a time demo of "Counter Strike:
Global Offensive" (58 -> 32) on Sandybridge. Haswell still has 0
recompiles.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Gen7.5+ platforms that support the "Shader Channel Select" feature leave
key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE
is GL_ALPHA (which is really uncommon). So, the precompile should leave
them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well.
We didn't notice this because prog->ShadowSamplers is not set correctly.
The next patch will fix that problem.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
According to the documentation, we need to do a CS stall on every fourth
PIPE_CONTROL command to avoid GPU hangs. The kernel does a CS stall
between batches, so we only need to count the PIPE_CONTROLs in our batches.
v2: Get the generation check right (caught by Chris Wilson),
combine the ++ with the check (suggested by Daniel Vetter).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are too many state flags to fit in one terminal screen, even with
a very tall terminal. Everything is flagged once, so a value of 1 means
that it hasn't ever happened again, and thus isn't terribly interesting.
Skipping those makes it easier to see the interesting values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The mad instruction emitter already supported the saturate modifier,
but the ModifierFolding pass never tried folding cvt sat operations
in for NV50.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is a partial revert of c89306983c.
It split the {start,base}_vertex_location handling into several steps:
1. Set brw->draw.start_vertex_location = prim[i].start
and brw->draw.base_vertex_location = prim[i].basevertex.
(This happened once per _mesa_prim, in the main drawing loop.)
2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset
appropriately. (This happened in brw_prepare_shader_draw_parameters,
which was called just after brw_prepare_vertices, as part of state
upload, and only happened when BRW_NEW_VERTICES was flagged.)
3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim).
If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on
the second (or later) primitives, we would do step #1, but not #2.
The first _mesa_prim would get correct values, but subsequent ones
would only get the first half of the summation.
The reason I originally did this was because I needed the value of
gl_BaseVertexARB to exist in a buffer object prior to uploading
3DSTATE_VERTEX_BUFFERS. I believed I wanted to upload the value
of 3DPRIMITIVE's "Base Vertex Location" field, which was computed
as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) +
brw->vb.start_vertex_bias. The latter value wasn't available until
after brw_prepare_vertices, and the former weren't available in the
state upload code at all. Hence the awkward split.
However, I believe that including brw->vb.start_vertex_bias was a
mistake. It's an extra bias we apply when uploading vertex data into
VBOs, to move [min_index, max_index] to [0, max_index - min_index].
>From the GL_ARB_shader_draw_parameters specification:
"<gl_BaseVertexARB> holds the integer value passed to the <baseVertex>
parameter to the command that resulted in the current shader
invocation. In the case where the command has no <baseVertex>
parameter, the value of <gl_BaseVertexARB> is zero."
I conclude that gl_BaseVertexARB should only include the baseVertex
parameter from glDraw*Elements*, not any internal biases we add for
optimization purposes.
With that in mind, gl_BaseVertexARB only needs prim[i].start or
prim[i].basevertex. We can simply store that, and go back to computing
start_vertex_location and base_vertex_location in brw_emit_prim(), like
we used to. This is much simpler, and should actually fix two bugs.
Fixes missing geometry in Unvanquished.
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Conditionalize it on having done any uploads (Turns out
u_upload_destroy() isn't safe with a NULL arg).
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
Fixes the piglits which check that gl_VertexID includes the base vertex
offset:
arb_draw_indirect-vertexid elements
gl-3.2-basevertex-vertexid
Note that this leaves out the original G80, for which this will continue
to fail. It could be fixed by passing a driver constbuf value in, but
that's beyond the scope of this change.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
LAST_LINE_PIXEL has actually been renamed to PIXEL_CENTER_INTEGER in
rnndb; use that method to implement the rasterizer setting, used for
st/nine.
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
total instructions in shared programs: 5877012 -> 5876617 (-0.01%)
instructions in affected programs: 33140 -> 32745 (-1.19%)
From before the commit that allows VF constant propagation (which hurt
some programs) to here, the results are:
total instructions in shared programs: 5877951 -> 5876617 (-0.02%)
instructions in affected programs: 123444 -> 122110 (-1.08%)
with no programs hurt.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
total instructions in shared programs: 5877951 -> 5877012 (-0.02%)
instructions in affected programs: 155923 -> 154984 (-0.60%)
Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the
next commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
After CSEing some MOV ..., VF instructions we have code like
mov tmp, [1F, 2F, 3F, 4F]VF
mov r10, tmp
mov r11, tmp
...
use r10
use r11
We want to copy propagate tmp into the uses of r10 and r11, but *not*
constant propagate the VF immediate into the uses of tmp.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Port of commit a28ad9d4 from the fs backend.
No shader-db changes since we don't emit MOV ..., VF instructions yet.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently only handles consecutive instructions with the same
destination that collectively write all channels.
total instructions in shared programs: 5879798 -> 5869011 (-0.18%)
instructions in affected programs: 465236 -> 454449 (-2.32%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I don't feel great about assert(!"unimplemented: ...") but these
cases do only seem possible under some currently impossible circumstances.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Sometimes it's easier to generate 4x values into an array, and the
memcpy is 1 instruction, rather than 11 to piece 4 arguments together.
I'd forgotten to remove the prototype from fs_reg from a previous patch,
so it's already there for us here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
As of 229bf4475f we started getting SIBGUS
from unaligned accesses on the hardware, for reasons I haven't figured
out. However, we should be avoiding unaligned accesses anyway, and our CL
setup certainly would have produced them.
E.g. this could happen on older kernels which don't support the
RADEON_INFO_SI_BACKEND_ENABLED_MASK query yet. The code in
si_write_harvested_raster_configs() doesn't deal with this correctly and
would probably mangle the value badly.
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The optimizer obviously doesn't have the ability to rewrite these to skip
the size checks per call, so we have to do it manually.
Improves a norast benchmark on simulation by 0.779706% +/- 0.405838%
(n=6087).
Our ability to perform register writes depends on the hardware and
kernel version. It shouldn't ever change on a per-context basis,
so we only need to check once.
Checking introduces a synchronization point between the CPU and GPU:
even though we submit very few GPU commands, the GPU might be busy doing
other work, which could cause us to stall for a while.
On an idle i7 4750HQ, this improves performance in OglDrvCtx (a context
creation microbenchmark) by 6.14748% +/- 1.6837% (n=20). With Unigine
Valley running in the background (to keep the GPU busy), it improves
performance in OglDrvCtx by 2290.92% +/- 29.5274% (n=5).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* This is the cleaned up work of the Haiku GCI student
Adrián Arroyo Calle adrian.arroyocalle@gmail.com
* Several patches were consolidated to prevent
unnecessary touching of non-related code
This patch reduces the likelihood of pointer arithmetic overflow bugs in
gather_oa_results(), like the one fixed by b69c7c5dac.
I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I get nervous when I see code patterns like this:
(void*) + (int) * (int)
I smell 32-bit overflow all over this code.
This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any
potential overflow.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch reduces the likelihood of pointer arithmetic overflow bugs in
intel_texsubimage_tiled_memcpy() , like the one fixed by b69c7c5dac.
I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I recently solved, in commit b69c7c5dac, an overflow
bug in a line of code that looks very similar to pointer arithmetic in
this function.
This patch conceptually applies the same fix as in b69c7c5dac. Instead
of retyping the variables, though, this patch adds some casts. (I tried
to retype the variables as ptrdiff_t, but it quickly got very messy. The
casts are cleaner).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch should diminish the likelihood of pointer arithmetic overflow
bugs, like the one fixed by b69c7c5dac.
Change the type of parameter 'out_stride' from int to ptrdiff_t. The
logic is that if you call intel_miptree_map() and use the value of
'out_stride', then you must be doing pointer arithmetic on 'out_ptr'.
Using ptrdiff_t instead of int should make a little bit harder to hit
overflow bugs.
As a side-effect, some function-scope variables needed to be retyped to
avoid compilation errors.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If a pointer points to raw, untyped memory and is never dereferenced,
then declare it as 'void*' instead of casting it to 'void*'.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
trans_kill() only handles the single opcode. Drop the remnant of a time
when both KILL and KILL_IF were handled by the same fxn.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Standalone compiler doesn't have screen or context. We need to come up
with a better way to control the target arch (ie. something that we can
control from cmdline w/ standalone compiler) but for now this hack keeps
it from segfault'ing.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.
total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)
In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch is I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
Since our kernel BOs require CMA allocation, and the use of them requires
new mmaps, it's pretty expensive and we should avoid it if possible.
Copying my original design for Intel, make a userspace cache that reuses
BOs that haven't been shared to other processes but frees BOs that have
sat in the cache for over a second.
Improves glxgears framerate on RPi by around 30%.
This gets DRI3 working on modesetting with glamor. It's not enabled under
simulation, because it looks like handing our dumb-allocated buffers off
to the server doesn't actually work for the server's rendering.
This reverts db3dfcfe90.
The commit was correct but we've got some precision problems later in
llvmpipe (or possibly in draw clip) due to the vertices coming in in
different order, causing some internal test failures. So revert for now.
(Will only affect drivers which actually support constant-interpolated
attributes and not just flatshading.)
The blitter will start at a pixel's natural alignment. For PBOs, if the
provided offset if not aligned, bits will get dropped.
This change adds offset alignment check for src and dst, kicking back if
the requirements are not met.
The change is based on following verbiage from BSPEC:
Color pixel sizes supported are 8, 16, and 32 bits per pixel (bpp).
All pixels are naturally aligned.
Found in the following locations:
page 35 of intel-gfx-prm-osrc-hsw-blitter.pdf
page 29 of ivb_ihd_os_vol1_part4.pdf
page 29 of snb_ihd_os_vol1_part5.pdf
This behavior was observed with Steam Big Picture rendering incorrect
icon colors. The fix has been tested on Ubuntu and SteamOS on Haswell.
Signed-off-by: Cody Northrop <cody@lunarg.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83908
Reviewed-by: Neil Roberts <neil@linux.intel.com>
C linkage was removed from functions in program/sampler.cpp. However,
some cpp files include program/sampler.h within extern "C" blocks,
causing link errors for test_vec4_copy_propagation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Chris Wilson noted that repeated calls to CheckQuery() would call
drm_intel_bo_references(brw->batch.bo, query->bo) on each invocation,
which is expensive. Once we've flushed, we know that future batches
won't reference query->bo, so there's no point in asking more than once.
This patch adds a brw_query_object::flushed flag, which is a
conservative estimate of whether the batch has been flushed.
On the first call to CheckQuery() or WaitQuery(), we check if the
batch references query->bo. If not, it must have been flushed for
some reason (such as being full). We record that it was flushed.
If it does reference query->bo, we explicitly flush, and record that
we did so.
Any subsequent checks will simply see that query->flushed is set,
and skip the drm_intel_bo_references() call.
Inspired by a patch from Chris Wilson.
According to Eero, this does not affect the performance of Witcher 2
on Haswell, but approximately halves the userspace CPU usage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is less code and also measures the duration of the stall for us.
Our old code predates the existance of brw_bo_map().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
CheckQuery calls drm_intel_bo_references to see if the batch references
the query BO, and if so, flushes. It then checks if the query BO is
busy, and if not, calls gen6_queryobj_get_results().
Stupidly, gen6_queryobj_get_results() immediately did a second redundant
drm_intel_bo_references check, even though we know the buffer is not
referenced and in fact idle.
This patch moves the batch-flush check out of gen6_queryobj_get_results
and into WaitQuery() (the other caller). That way, both callers do a
single batch-flush check.
This should only be a minor improvement, since it would only affect
the first CheckQuery call where the result is actually available.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If query->bo == NULL, this is a redundant CheckQuery call, and we
should simply return. We didn't do anything anyway - we skipped the
batch flushing block, and although we called get_results(), it has an
early return and does nothing. Why bother?
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
q->Ready means that the results are in, and core Mesa is free to return
them to the application. gen6_queryobj_get_results() is a natural place
to set that flag; doing so means callers don't have to.
The older non-hardware-context aware code couldn't do this, because we
had to call brw_queryobj_get_results() to gather intermediate results
when we ran out of space for snapshots in the query buffer. We only
gather complete results in the Gen6+ code, however.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On the same go remove src/mapi/shared-glapi/tests/.gitignore
and src/mapi/glapi/tests/.gitignore as useless.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes 4 vertexid related piglit tests with llvmpipe due to switching
behavior of vertexid to the one gl expects.
(Won't fix non-llvm draw path since we don't get the basevertex currently.)
Plus a new PIPE_CAP_VERTEXID_NOBASE query. The idea is that drivers not
supporting vertex ids with base vertex offset applied (so, only support
d3d10-style vertex ids) will get such a d3d10-style vertex id instead -
with the caveat they'll also need to handle the basevertex system value
too (this follows what core mesa already does).
Additionally, this is also useful for other state trackers (for instance
llvmpipe / draw right now implement the d3d10 behavior on purpose, but
with different semantics it can just do both).
Doesn't do anything yet.
And fix up the docs wrt similar values.
v2: incorporate feedback from Brian and others, better names, better docs.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
r600, rv610 and rv630 all have a bug in their GPR indexing
and how the hw inserts access to PV.
If the base index for the src is the same as the dst gpr
in a previous group, then it will use PV instead of using
the indexed gpr correctly.
The workaround is to insert a NOP when you detect this.
v2: add second part of fix detecting DST rel writes followed
by same src base index reads.
v3: forget adding stuff to structs, just iterate over the
previous node group again, makes it more obvious.
v3.1: drop local_nop.
Fixes ~200 piglit regressions on rv635 since SB was introduced.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were assuming, when constructing a new brw_reg struct, that the
negate and abs register modifiers would not be present by default in
the new register.
Now, we force explicitly setting these values when constructing a new
register.
This will avoid problems like forgetting to properly set them when we
are using a previous register to generate this new register, as it was
happening in the dFdx and dFdy generation functions.
Fixes piglit test shaders/glsl-deriv-varyings
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, the hash_table API required the user to do all of the hashing
of keys as it passed them in. Since the hashing function is intrinsically
tied to the comparison function, it makes sense for the hash table to know
about it. Also, it makes for a somewhat clumsy API as the user is
constantly calling hashing functions many of which have long names. This
is especially bad when the standard call looks something like
_mesa_hash_table_insert(ht, _mesa_pointer_hash(key), key, data);
In the above case, there is no reason why the hash table shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash table will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
v2: change to call the old entrypoint "pre_hashed" rather than
"with_hash", like cworth's equivalent change upstream (change by
anholt, acked-in-general by Jason).
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
glXSwapBuffersMscOML() with target_msc=divisor=remainder=0 gets
translated into target_msc=divisor=0 but remainder=1 by the mesa
api. This is done for server DRI2 where there needs to be a way
to tell the server-side DRI2ScheduleSwap implementation if a call
to glXSwapBuffers() or glXSwapBuffersMscOML(dpy,window,0,0,0) was
done. remainder = 1 was (ab)used as a flag to tell the server to
select proper semantic. The DRI3/Present backend ignored this
signalling, treated any target_msc=0 as glXSwapBuffers() request,
and called xcb_present_pixmap with invalid divisor=0, remainder=1
combo. The present extension responded kindly to this with a
BadValue error and dropped the request, but mesa's DRI3/Present
backend doesn't check for error codes. From there on stuff went
downhill quickly for the calling OpenGL client...
This patch fixes the problem.
v2: Change comments to be more clear, with reference to
relevant spec, as suggested by Eric Anholt.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
Restores proper immediate tearing swap behaviour for
OpenGL bufferswap under DRI3/Present.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
v2: Add Frank Binns signed off by for his original earlier
patch from April 2014, which is identical to this one, and
Chris Wilsons reviewed tag from May 2014 for that patch, ergo
also for this one.
v3: Incorporate comment about triple buffering as suggested
by Axel Davy, and reference to relevant spec provided by
Eric Anholt.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prevent calls to glXGetSyncValuesOML() and glXWaitForMscOML()
from overwriting the (ust,msc) values of the last successfull
swapbuffers call (PresentPixmapCompleteNotify event), as
glXWaitForSbcOML() relies on those values corresponding to
the most recent completed swap, not to whatever was last
returned from the server.
Problematic call sequence without this patch would have been, e.g.,
glXSwapBuffers()
... wait ...
swap completes -> PresentPixmapComplete event -> (ust,msc)
updated to reflect swap completion time and count.
... wait for at least 1 video refresh cycle/vblank increment.
glXGetSyncValuesOML()
-> PresentNotifyMsc event overwrites (ust,msc) of swap
completion with (ust,msc) of most recent vblank
glXWaitForSbcOML()
-> Returns sbc of last completed swap but (ust,msc) of last
completed vblank, not of last completed swap.
-> Client is confused.
Do this by tracking a separate set of (ust, msc) for the
dri3_wait_for_msc() call than for the dri3_wait_for_sbc()
call.
This makes the glXWaitForSbcOML() call robust again and restores
consistent behaviour with the DRI2 implementation.
Fixes applications originally written and tested against
DRI2 which also rely on this not regressing under DRI3/Present,
e.g., Neuro-Science software like Psychtoolbox-3.
This patch fixes the problem.
v2: Rename vblank_msc/ust to notify_msc/ust as suggested by
Axel Davy for better clarity.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
targetSBC == 0 is a special case, which asks the function
to block until all pending OpenGL bufferswap requests have
completed.
Currently the function just falls through for targetSBC == 0,
returning bogus results.
This breaks applications originally written and tested against
DRI2 which also rely on this not regressing under DRI3/Present,
e.g., Neuro-Science software like Psychtoolbox-3.
This patch fixes the problem.
v2: Simplify as suggested by Axel Davy. Add comments proposed
by Eric Anholt.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
A bunch of open-coded 'gpu_id > 300's seems like it will eventually
cause problems with future generations. There were already a few minor
problems with caps for features that still need additional work on a4xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rather than duplicating this everywhere. Especially as on a4xx the
layout of layers and levels differs based on texture type.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This code is complete nonsense and has apparently existed since I first
implemented register spilling in the VS two years ago.
Scratch reads are SEND messages, which ignore the destination writemask.
The comment about "data that may not have been written to scratch" is
also confusing - we always spill whole 4x2 registers, so such data
simply does not exist. We can safely ignore the writemask.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This code has been turned off for the last
decade. Considering 3Dnow is obsolete it
seems the bug will never be fixed so just
remove it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
f0ba7d897d made debug_assert()/assert()
unsafe for expressions, but only now that u_atomic.h started to rely on
them for Windows that this became an issue.
This fixes non-debug builds with MSVC.
Reviewed-by: Brian Paul <brianp@vmware.com>
We support MOCS on both gen8 and gen9, so the message seems meaningless. Remove
it to avoid confusion.
Trivial.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The odds of having this patch make a difference on Gen8+ are probably very low.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-but-not-tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Because all topologies are reduced to basic primitives (i.e. no strips, fans)
and the vertices involved are all copied, there's no need for any elaborate
decisions where to insert the prim id. The logic employed was correct for
first provoking vertex, but didn't account at all for the last provoking
vertex case. And since we now will get the right constant value even if the
primitive type is later changed (for unfilled etc.) this is no longer
required to pass certain tests (which were checking for prim_id == some
const interpolated value so passing because both were wrong in the end).
This is a bit overkill (3x4 values assigned in total even though it's really
one scalar per prim...) but the code is now much easier and I don't need to
add more cases for last provoking vertex.
This fixes piglit primitive-id-no-gs-strip test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously the first provoking vertex convention would only be used if
flatshading were enabled. No matter how I look at it that cannot be possibly
correct. Maybe the code getting used was somewhat simpler that way at a time
where there weren't constant interpolated attributes, only flatshading...
(Note that all other places including the decomposition macros already do
the same.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This stage only worked for traditional old-school flatshading, it did ignore
constant interpolated values and only handled colors, the code probably
predates using of constant interpolated values in gallium. So fix this - the
clip stage apparently did this a long time ago already.
Unfortunately this also means the stage needs to be invoked when flatshading
isn't enabled but some other prim changing stages are - for instance with
fill mode line each of the 3 lines in a tri should get the same attribute
value from the leading vertex in the original tri if interpolation is constant,
which did not happen before
Due to that, the stage is now run in more cases, even unnecessary ones. Could
in theory skip it completely if there aren't any constant interpolated
attributes (and rast->flatshade isn't set), but not sure it's worth bothering,
as it looks kinda complicated getting this information in advance.
No piglit change (doesn't really cover this directly).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just like we do for tris (det shouldn't matter at this point, however
can have flags for things like line stipple reset).
No piglit change, it would fail line stippling tests if the flatshade
stage were run, which will happen with the next commit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The previous language was a bit misleading, since it sounded like
w was interpolated then the reciprocal calculated which isn't what
should be happening.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
With everything in place, we can now use the scalar backend compiler for
vertex shaders on BDW+. We make scalar vertex shaders the default on
BDW+ but add a new vec4vs debug option to force the vec4 backend.
No piglit regressions.
Performance impact is minimal, I see a ~1.5 improvement on the T-Rex
GLBenchmark case, but in general it's in the noise. Some of our
internal synthetic, vs bounded benchmarks show great improvement, 20%-40%
in some cases, but real-world cases are mostly unaffected.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that fs_visitor::run is back to being only fragment
shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT
conditions and rename it to run_fs.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch uses the previous refactoring to add a new run_vs() method
that generates vertex shader code using the scalar visitor and
optimizer.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These structs aren't vec4 specific, they are shared by shader stages
operating on Vertex URB Entries (VUEs). VUEs are the data structures in
the URB that hold vertex data between the pipeline geometry stages.
Using vue in the name instead of vec4 makes a lot more sense, especially
when we add scalar vertex shader support.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The scalar vertex shader will use the ATTR register file for vertex
attributes. This patch adds support for the ATTR file to fs_visitor.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This chunk of code is repeated in a few places, and we're going to add
a MESA_SHADER_VERTEX case to it soon.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This flag signals that we have a SIMD8 VS shader so we can set up the
corresponding state accordingly. This boils down to setting
the BDW+ SIMD8 enable bit in 3DSTATE_VS and making UBO and pull
constant buffers use dword pitch.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is all we need from the generator for SIMD8 vertex shaders. This
opcode is just the send instruction, all the hard work will happen
in the visitor using LOAD_PAYLOAD.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that the caller passes in the shader debug name, we don't need this
anymore.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
fs_generator no longer knows what stage it's generating code for, so
we have to set the debug name of the shader from the call site.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This removes all stage specific data from the generator, and lets us
create a generator for any stage.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't propagate the saturate bit and some instructions can't
saturate at all. If the source has saturate set, just skip propagation.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Back to the original commit (8313f444) adding the workaround, we were
enabling it on gens <= 7, even though gens <= 5 can't do multisampling.
I cannot find documentation that says that Sandybridge needs this
workaround but in practice disabling it causes these piglit tests to
fail:
EXT_framebuffer_multisample/interpolation {2,4} centroid-deriv{,-disabled}
On Ironlake:
total instructions in shared programs: 4358478 -> 4349671 (-0.20%)
instructions in affected programs: 117680 -> 108873 (-7.48%)
A bunch of shaders in TF2, Portal 2, and L4D2 are cut by 25~30%.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This way we get a warning if an enum value is not handled.
v2: codestyle
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On evergreen there are 4 regs, on r600/700 there is only one.
Don't initialise regs and trash someone elses state.
Not sure this fixes anything, but hey one less stupid.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.3 10.4" mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means another pass of reordering the uniform data store, but it lets
us pair up a lot more instructions.
total instructions in shared programs: 44639 -> 43176 (-3.28%)
instructions in affected programs: 36938 -> 35475 (-3.96%)
This is a standard scheduling heuristic, and clearly helps.
total instructions in shared programs: 46418 -> 44467 (-4.20%)
instructions in affected programs: 42531 -> 40580 (-4.59%)
Collapse things back into a setup_slices() which takes the desired
alignment as a param. This gets things ready for a4xx which has some
slightly different requirements.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits, but that is an assumption OpenGL drivers (or any dynamic library for that matter) can't afford to make as there are many closed- and open- source application binaries out there that only assume 4-byte stack alignment.
V4: fix comment and indentation
V3: move all sse4.1 build flag config to the same location
and add comment as to why we need to do the realign
V2: use $target_cpu rather than $host_cpu
and setup build flags in config rather than makefile
https://bugs.freedesktop.org/show_bug.cgi?id=86788
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Matt Turner <mattst88@gmail.com>
CC: "10.4" <mesa-stable@lists.freedesktop.org>
For OpenGL ES 3.0 spec, the minor number for SHADING_LANGUAGE_VERSION is always
two digits, matching the OpenGL ES Shading Language Specification release
number. For example, this query might return the string "3.00".
This patch fixes the following dEQP test:
dEQP-GLES3.functional.state_query.string.shading_language_version
No piglit regression observed.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL ES 3.00 spec, chapter 4.6.1 "The Invariant Qualifier",
Only variables output from a shader can be candidates for invariance. This
includes user-defined output variables and the built-in output variables.
As only outputs can be declared as invariant, an invariant output from one
shader stage will still match an input of a subsequent stage without the
input being declared as invariant.
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_interp_storage_precision
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_interp_storage
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_storage_precision
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_storage
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_interp_storage_precision_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_interp_storage_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_storage_precision_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_storage_invariant_input
No piglit regressions observed.
v2:
- Add spec content in the code
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The current code computes ctx->Array.LegalTypesMask just once,
however, computing this needs to consider ctx->API so we need
to make sure that the API for that context has not changed if
we intend to reuse the result.
The context API can change, at least, if we go through
_mesa_meta_begin, since that will always force
API_OPENGL_COMPAT until we call _mesa_meta_end. If any
operation in between these two calls triggers a call to
update_array_format, then we might be caching a value for
LegalTypesMask that will not be right once we have called
_mesa_meta_end and restored the context API.
Fixes the following 179 dEQP tests in i965:
dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.output_types.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.multiple_attributes.input_types.3_*fixed2*
dEQP-GLES3.functional.draw.random.{2,18,28,68,83,106,109,156,181,191}
Reviewed-by: Brian Paul <brianp@vmware.com>
From GL ES 3.0 specification, section 6.1.15 Internal Format Queries (page 236),
multisampling is not supported for signed and unsigned integer internal formats.
Fixes 19 dEQP tests under 'dEQP-GLES3.functional.state_query.internal_format.*'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_RGB and GL_RGBA are valid internal formats on a GLES3 profile. See
"Table 1. Unsized Internal Formats" at
https://www.khronos.org/opengles/sdk/docs/man3/html/glTexImage2D.xhtml.
Fixes 2 dEQP tests:
- dEQP-GLES3.functional.state_query.internal_format.rgb_samples
- dEQP-GLES3.functional.state_query.internal_format.rgba_samples
Reviewed-by: Brian Paul <brianp@vmware.com>
In OpenGL and OpenGL-ES 3+, GL_DEPTH_STENCIL_ATTACHMENT is a valid attachment point for the family of functions
that invalidate a framebuffer object (e.g, glInvalidateFramebuffer, glInvalidateSubFramebuffer, etc).
Currently, a GL_INVALID_ENUM error is emitted for this attachment point.
Fixes 21 dEQP test failures under 'dEQP-GLES3.functional.fbo.invalidate.*'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This increases the cost of a raddr b conflict spill (save r3 to rb31, move
src1 to r3, move rb31 back to r3 when done, instead of just move src1 to
r3), but on average thanks to instruction pairing it's more worthwhile to
have another accumulator.
total instructions in shared programs: 46428 -> 46171 (-0.55%)
instructions in affected programs: 38030 -> 37773 (-0.68%)
The register allocator walks from the end of the nodes array looking for
trivially-allocatable things to put on the stack, meaning (assuming
everything is trivially colorable and gets put on the stack in a single
pass) the low node numbers get allocated first. The things allocated
first happen to get the lower-numbered registers, which is to say the fast
accumulators that can be paired more easily.
When we previously made the nodes match the temporary register numbers,
we'd end up putting the shader inputs (VS or FS) in the accumulators,
which are often long-lived values. By prioritizing the shortest-lived
values for allocation, we can get a lot more instructions that involve
accumulators, and thus fewer conflicts for raddr and WS.
total instructions in shared programs: 52870 -> 46428 (-12.18%)
instructions in affected programs: 52260 -> 45818 (-12.33%)
Since d8da6decea where the
state tracker started using UCMP on cayman a number of tests
regressed.
this seems to be r600g is doing CNDGE_INT for UCMP which is >= 0,
we should be doing CNDE_INT with reverse arguments.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reduces .text size of mesa_dri_drivers.so (i965-only) by 62k, or 1.4%.
Note that we don't remove inline from lerp_2d(), which has a comment
above it saying it definitely should be inlined. Though, removing the
inline keyword from it doesn't actually change the compiled code for me.
Reviewed-by: Brian Paul <brianp@vmware.com>
The docs say that we shouldn't need this workaround for gen8+, but just
removing it, causes gpu hangs. We'll revisit this, but for now, just
extend the workaround to gen9.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
SKL moves the GS threadcount to dw8 from dw7, and no longer does the
divide by 2 thing.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
This patch fixes this build error with G++ <= 4.6.
CXX test_vf_float_conversions.o
test_vf_float_conversions.cpp: In function ‘unsigned int f2u(float)’:
test_vf_float_conversions.cpp:63:20: error: expected primary-expression before ‘.’ token
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86939
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The register allocator prefers low-index registers from vc4_regs[] in the
configuration we're using, which is good because it means we prioritize
allocating the accumulators (which are faster). On the other hand, it was
causing raddr conflicts because everything beyond r0-r2 ended up in
regfile A until you got massive register pressure. By interleaving, we
end up getting more instruction pairing from getting non-conflicting
raddrs and QPU_WSes.
total instructions in shared programs: 55957 -> 52719 (-5.79%)
instructions in affected programs: 46855 -> 43617 (-6.91%)
We can avoid it by carefully ordering the packing. This is important as a
step in giving r3 to the register allocator.
total instructions in shared programs: 56087 -> 55957 (-0.23%)
instructions in affected programs: 18368 -> 18238 (-0.71%)
All uses of this require that the value be at least one, so it's
easier to report at least one than having to wrap all uses
in MAX2(max_compute_units, 1).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Harvested GPUs have some of their render backends disabled, so
in order to prevent the hardware from trying to render things
with these disabled backends we need to correctly program
the PA_SC_RASTER_CONFIG register.
v2:
- Write RASTER_CONFIG for all SEs.
v3:
- Set GRBM_GFX_INDEX.INSTANCE_BROADCAST_WRITES bit.
- Set GRBM_GFX_INFEX.SH_BROADCAST_WRITES bit when done setting
PA_SC_RASTER_CONFIG.
- Get num_se and num_sh_per_se from kernel.
v4:
- Get correct value for num_se
- Remove loop for setting PA_SC_RASTER_CONFIG
- Only compute raster config when a backend has been disabled.
v5: Michel Dänzer
- Fix computation for chips with multiple SEs
https://bugs.freedesktop.org/show_bug.cgi?id=60879
CC: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
There is a bug in the current lowering pass implementation where we lower saturate
to clamp only for vertex shaders on drivers supporting SM 3.0. The correct behavior
is to actually lower to clamp only when we don't support saturate which happens
on drivers that don't support SM 3.0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Fixes an infinite loop in swrast where the lowering pass unpacks saturate into
clamp but the opt_algebraic pass tries to do the opposite.
v3 (Ian):
This is a revert of commit cfa8c1cb "ir_to_mesa: lower ir_unop_saturate" on
the ir_to_mesa.cpp portion. prog_execute.c can handle saturates in vertex
shaders, so classic swrast shouldn't need this lowering pass.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83463
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
There were previously regressions regarding border colors, which the
updated swizzle logic resolves.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This is a hack since it uses the texture information together with the
sampler, but I don't see a better way to do it. In OpenGL, there is a
1:1 correspondence.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This was an oversight in the original patch. When PolygonMode is
used, then front faces, back faces, or both may be rendered as
points and are affected by point sprite state.
Note that SNB/IVB can't actually be fully conformant here, for
a legacy context -- we don't have separate sets of pointsprite
enables for front and back faces. Haswell ignores pointsprite
state correctly in hardware for non-point rasterization, so can
do this correctly, but it doesn't seem worth it.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86764
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dead code elimination was eating the Y offset.
Fixes the piglit test:
spec/ARB_gpu_shader5/arb_gpu_shader5-interpolateAtOffset-nonconst
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The original idea was to optimize away the condition by integrating it directly
into the CMP instruction. However, with native integers this requires an extra
I2F instruction. It is also fishy because the negation used didn't really honor
ieee754 float comparison rules, not to mention the CMP instruction itself
(being pretty much a legacy instruction) doesn't really have defined special
float value behavior in any case.
So, use UCMP and adjust the code trying to optimize the condition away
accordingly (I have absolutely no idea if such conditions are actually hit
or would be translated away somewhere else already).
v2: cosmetic changes
No piglit regressions on llvmpipe.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Multiple scenes per context are meant to be used so a new scene can be built
while another one is processed in rasterization. However, quite surprisingly,
this does not actually work (and according to git log, possibly never did,
though maybe it did at some point further back (5 years+) but was buggy)
because we always wait immediately on the rasterizer to finish the scene when
contexts (and hence setup/scene) is flushed. This means when we try to get
an empty scene later, any old one is already empty again.
Thus using multiple scenes is just a waste of memory (not too bad, since the
additional scenes are guaranteed to be empty, which means their size ought to
be one data block (64kB) plus the size of some structs), without actually
really doing anything. (There is also quite some code for the whole concept of
multiple scenes which doesn't really do much in practice, but keep it hoping
the wait-on-scene-flush can be fixed some day.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The prim assembler may change the prim type when injecting prim ids now,
which isn't reflected by what's stored in emit.
This looks brittle and potentially dangerous (it is not obvious if such prim
type changes are really supported by pt emit, the prim type is actually also
set in prepare which would then be different).
This fixes piglit primitive-id-no-gs-first-vertex.shader_test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The decomposition done in the prim assembler will turn tri fans into tris,
but this wasn't reflected in the output prim type. Meaning with a tri fan
with 6 verts input, the output was a tri fan with 12 vertices instead of a
tri list with 12 vertices (not as bad as it sounds, since the additional tris
created would all be degenerate since they'd all have two times vertex zero
but still bogus).
This is because the prim assembler is used if either the input topology is
something with adjacency, or if prim id needs to be injected, and for the
latter case topologies without adjacency can be converted to basic ones.
Unfortunately decomposition here for inserting prim ids is necessary, at
least for the indexed case where we can't just insert the prim id at the
right place depending on provoking vertex.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The default macros when the adjacency macros aren't defined will already
exactly do that (that is, drop the adjacent vertices and call the non-adjacent
macro).
Reviewed-by: Jose Fonseca <jfonseca@vmwarec.com>
Safe from causing optimization loops, since we don't constant propagate
VF arguments.
(for this and the previous patch):
total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
instructions in affected programs: 1616779 -> 1599636 (-1.06%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The LINE instruction performs a multiply-add instruction (a * b + c)
where b and c are scalar arguments. It reads b and c from offsets in
src0 such that you can load them (it they're representable) as a
vector-float immediate with a single instruction.
Hurts some programs, but that'll all get better once we CSE the
vector-float MOVs in the next patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77544
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The PRMs say that
<src0> region must be a replicated scalar
(with HorzStride = VertStride = 0).
but apparently that doesn't actually apply to all generations. I did
notice when implementing the optimization later in this series that G45
and ILK needed this regioning.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GS has an interesting use for mul. Because the GS can emit multiple
vertices per input vertex, and it also has a unique count at the top of the URB
payload, the GS unit needs to be able to dynamically specify URB write offsets
(relative to the global offset). The documentation in the function has a very
good explanation from Paul on the mechanics.
This fixes around 2000 piglit tests on BSW.
v2:
Reworded commit message (Ben) no mention of CHV (Matt)
Change SHRT_MAX to USHRT_MAX (Ken, and Matt)
Update comment in code to reflect the use of UW (Ben)
Add Gen7+ assertion for the relevant GS code, since it won't work on Gen6- (Ken)
Drop the bogus hunk in emit_control_data_bits() (Ken)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84777 (with many dupes)
Cc: "10.4 10.3 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If an operation is the last one to read a register, the instruction
containing it can also include the op that has the next write to that
register.
total instructions in shared programs: 57486 -> 56995 (-0.85%)
instructions in affected programs: 43004 -> 42513 (-1.14%)
We were scheduling TLB operations as early as possible, and texture setup
as late as possible. When I introduced prioritization, I visually
inspected that an independent operation got moved above texture results
collection, which tricked me into thinking it was working (but it was just
because texture setup was being pushed late).
total instructions in shared programs: 57651 -> 57486 (-0.29%)
instructions in affected programs: 18532 -> 18367 (-0.89%)
Jason realized that we could fix the result of the CMP instruction on
Gen <= 5 by doing -(result & 1). Also do the resolves in the vec4
backend before use, rather than when the bool was created. The FS does
this and it saves some unnecessary resolves.
On Ironlake:
total instructions in shared programs: 4289762 -> 4287277 (-0.06%)
instructions in affected programs: 619430 -> 616945 (-0.40%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is a revert of commit 4656c14e ("i965/fs: Change the type of
booleans to UD and emit correct immediates") plus some small additional
fixes, like casting ctx->Const.UniformBooleanTrue to int and changing UD
to D in the ir_unop_b2f cases. Note that it's safe to leave 0x3f800000
as UD and as a literal it's more recognizable than 1065353216.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Three source instructions cannot directly source a packed vec4 (<0,4,1>
regioning) like vec4 uniforms, so we emit a MOV that expands the vec4 to
both halves of a register.
If these uniform values are used by multiple three-source instructions,
we'll emit multiple expansion moves, which we cannot combine in CSE
(because CSE emits moves itself).
So emit a virtual instruction that we can CSE.
Sometimes we demote a uniform to to a pull constant after emitting an
expansion move for it. In that case, recognize in opt_algebraic that if
the .file of the new instruction is GRF then it's just a real move that
we can copy propagate and such.
total instructions in shared programs: 5822418 -> 5812335 (-0.17%)
instructions in affected programs: 351841 -> 341758 (-2.87%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts an instruction from two shaders in Tesseract, by allowing the
(x+y) cmp 0 -> x cmp -y optimization to take place.
instructions in affected programs: 1198 -> 1194 (-0.33%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits,
but that is an assumption OpenGL drivers (or any dynamic library for
that matter) can't afford to make as there are many closed- and open-
source application binaries out there that only assume 4-byte stack
alignment.
This fix uses force_align_arg_pointer GCC attribute, and is only a
stop-gap measure.
The right fix would be to pass -mstackrealign or
-mincoming-stack-boundary=2 to all source fails that use any -msse*
option, as there is no way to guarantee if/when GCC will decide to spill
SSE registers to the stack.
https://bugs.freedesktop.org/show_bug.cgi?id=86788
Reviewed-by: Brian Paul <brianp@vmware.com>
BRW_NEW_VERTICES is flagged every time we draw a primitive. Having
the brw_vs_prog atom depend on BRW_NEW_VERTICES meant that we had to
compute the VS program key and do a program cache lookup for every
single primitive. This is painfully expensive.
The workaround bit computation is almost entirely based on the vertex
attribute arrays (brw->vb.inputs[i]), which are set by brw_merge_inputs.
The only thing it uses the VS program for is to see which VS inputs are
actually read. brw_merge_inputs() happens once per primitive, and can
safely look at the currently bound vertex program, as it doesn't change
in the middle of a draw.
This patch moves the workaround bit computation to brw_merge_inputs(),
right after assigning brw->vb.inputs[i], and stores the previous WA bit
values in the context. If they've actually changed from the last draw
(which is uncommon), we signal that we need a new vertex program,
causing brw_vs_prog to compute a new key.
Improves performance in Gl32Batch7 by 13.6123% +/- 0.739652% (n=166)
on Haswell GT3e. I'm told Baytrail shows similar gains.
v2: Introduce a new BRW_NEW_VS_ATTRIB_WORKAROUNDS dirty bit, rather
than reusing BRW_NEW_VERTEX_PROGRAM (suggested by Chris Forbes).
This prevents unnecessary re-emission of surface/sampler related
atoms (and an SOL atom on Sandybridge).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If you hit this, you didn't compile with --with-egl-platforms=...
Recompile with something like --with-egl-platforms=x11,drm and make
clean and make again.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These stopped being necessary in commit ab973403e4.
v2: Update commit message with a better explanation (thanks to Eric
Anholt for doing the git archaeology).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't access brw->vertex_program or ctx->_Shader since the previous
commit, so we don't need this dirty bit.
I think it's still necessary on Gen6 because it still conflates
constant uploading with unit state uploading. We can fix that later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We use IEEE mode for GLSL programs, but need to use ALT mode for ARB
programs so that 0^0 == 1. The choice is based entirely on the shader
source language.
Previously, our code to determine which mode we wanted was duplicated
in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8).
The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing
as well - we use CurrentProgram (non-derived state), but _Shader
(derived state). It also relies on knowing that ARB programs don't
use gl_shader_program structures today. The compiler already makes
this assumption in a few places, but I'd rather keep that assumption
out of the state upload code.
With this patch, we select the mode at compile time, and store that
choice in prog_data. The state upload code simply uses that decision.
This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit c0347705 changed the Gen6-7 code to use ctx->_Shader rather than
ctx->Shader, but neglected to change the Gen4-5 or Gen8+ code.
This might fix SSO related bugs, but ALT mode is only used for ARB
programs, so if there's an actual problem, it's likely no one would
run into it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The "Pixel Shader Computed Depth Mode" value is entirely based on the
shader program, so we can easily do it at compile time. This avoids the
if+switch on every 3DSTATE_WM (Gen7)/3DSTATE_PS_EXTRA (Gen8+) upload,
and shares a bit more code.
This also simplifies the PMA stall code, making it match the formula
more closely, and drops a BRW_NEW_FRAGMENT_PROGRAM dependency. (Note
that the previous comment was wrong - the code and the documentation
have != PSCDEPTH_OFF, not ==.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We shouldn't receive variables with invalid locations set - adding these
assertions should help catch problems before they cause crashes later.
Inspired by similar code in st_glsl_to_tgsi.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Half gives you the second half of a SIMD16 register, but if the register
is a uniform it would incorrectly give you the next register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Nine code to match vertex declaration to vs inputs was limiting
the number of possible combinations.
Some sm3 games have issues with that, because arbitrary (usage/index)
can be used.
This patch does the following changes to fix the problem:
. Change the numbers given to (usage/index) combinations to uint16
. Do not put limits on the indices when it doesn't make sense
. change the conversion rule (usage/index) -> number to fit all combinations
. Instead of having a table usage_map mapping a (usage/index) number to
an input index, usage_map maps input indices to their (usage/index)
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
With sm3, you can declare an input/output with an usage and an usage index.
Nine code hardcodes the translation usage/index to a corresponding TGSI code.
The translation was limited to a few usage/index combinations that were corresponding
to most of the needs of games, but some games did not work.
This patch rewrites that Nine code to map all possible usage/index combination
to TGSI code. The index associated to TGSI_SEMANTIC_GENERIC doesn't need to be low
for good performance, as the old code was supposing, and is not particularly bounded
(it's UINT16). Given the index is BYTE, we can map all combinations.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Nine was allowing that behaviour, but was not filling the result.
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Issuing D3DISSUE_END should:
. reset previous queries if possible
. end the query
Previous behaviour wasn't calling end_query for
queries not needing D3DISSUE_BEGIN, nor resetting
previous queries.
This fixes several applications not launching properly.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Some queries need the driver to advertise a cap to be supported.
For example r300 doesn't support them.
v2 (David): check also for PIPE_CAP_QUERY_PIPELINE_STATISTICS, fix wine
tests on r300g
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
get_query_result flushes automatically, we don't need to flush.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Applications are supposed to call CreateQuery with a NULL
ppQuery to know if the query is supported. We supported that.
However when ppQuery was not NULL, we were accepting to create the
query and were creating a dummy query even when the query is not
supported.
Wine has different behaviour. This patch drops the dummy queries
support and matches wine behaviour.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Note that some of the GLSL specifications explicitly state this as
compile error, some simply state that 'it is an error'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The state upload code was incorrectly shifting the attribute swizzles. The
effect of this is we're likely to get the default swizzle values, which disables
the component.
This doesn't technically fix any bugs since Skylake support is still disabled by
default (no PCI IDs).
While here, since VARYING_SLOT_MAX can be greater than the number of attributes
we have available, add a warning to the code to make sure we never do the wrong
thing (and hopefully prevent further static analysis from finding this).
Admittedly I am a bit confused. It seems to me like the moment a user has
greater than 8 varyings we will hit this condition. CC Ken to clarify.
v2: Forgot to git add the warning message in v1
v3: Change the > 31 varyings to an assertion (Ken)
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu> (via Coverity)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Vertex color clamping is only relevant if the shader writes to
the built-in gl_[Secondary]{Front,Back}Color varyings. Otherwise,
brw_vs_prog_key::clamp_vertex_color is never used, so we can simply
leave it set to 0.
This enables us to correctly predict the clamp_vertex_color key value
in the precompile for shaders which don't use those varyings.
Eliminates virtually all VS recompiles in Serious Sam 3's intro.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Vertex color clamping only applies to gl_[Secondary]{Front,Back}Color,
which are compatibility-only built-in varyings. We only support GS in
core profile, so they can't exist in geometry shaders.
We can drop several dirty bits from the GS program key - they're
unnecessary for a core profile implementation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Vertex color clamping only applies to a few specific built-ins: COL0/1
and BFC0/1 (aka gl_[Secondary]{Front,Back}Color). It seems weird to
handle special cases in a function called emit_generic_urb_slot().
emit_urb_slot() is all about handling special cases, so it makes more
sense to handle this there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With fs_visitor/fs_generator being reused for SIMD8 VS/GS programs,
we're running into weird #include patterns, where scalar code #includes
brw_vec4.h and such.
Program keys aren't really related to SIMD4X2/SIMD8 execution - they
mostly capture NOS for a particular shader stage. Consolidating them
all in one place that's vec4/scalar neutral should help avoid problems.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
It's been merged into brw_state_flags::brw for simplicity and
efficiency.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I put the BRW_NEW_*_PROG_DATA flags at the beginning so that
brw_state_cache.c can still continue using 1 << brw_cache_id.
I also added a comment explaining the difference between
BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time
to remember it.
Non-mechanical changes:
- brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache.
- brw_state_upload.c - INTEL_DEBUG=state changes.
- brw_context.h - bit definition merging.
v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention
state-based recompiles, and nix the "proper subset" claim,
as it's false. (Caught by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only
ones that are left are legitimately related to the program cache. Yet,
it seems a bit wasteful to have an entire bitfield for only 7 bits.
State upload is one of the hottest paths in the driver. For each atom
in the list, we call check_state() to see if it needs to be emitted.
Currently, this involves comparing three separate bitfields (mesa, brw,
and cache). Consolidating the brw and cache bitfields would save a
small amount of CPU overhead per atom. Broadwell, for example, has
57 state atoms, so this small savings can add up.
CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the
offset into the program cache BO (prog_offset). Since most uses refer
to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name.
Removing "cache" completely is a bit painful, so I decided to do it in
several patches for easier review, and to separate mechanical changes
from manual ones. This one simply renames things, and was made via:
$ for file in *.[ch]; do
sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \
-e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file
done
Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw!
The next patch will remedy this flaw. It will also fix the
alphabetization issues.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>
This was added in September 2013 when we first implemented the fast
(but lower quality) derivatives. A quick Google search didn't turn
up anyone using or recommending the option, so I suspect no one does.
Applications that want to control the quality of their derivatives can
use the new GL_ARB_derivative_control extension, or use the glHint
mechanism. The driconf option seems superfluous.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Note from Ken:
"We used to use the return value to indicate whether software
fallbacks were necessary, but we haven't in years."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the code in _mesa_validate_DrawElements,
_mesa_validate_DrawRangeElements, and
_mesa_validate_DrawElementsInstanced was the same. Refactor this out to
common code.
As a side-effect, a bug in _mesa_validate_DrawElementsInstanced was
fixed. Previously this function would not generate an error when
check_valid_to_render failed if numInstances was 0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL 3-ish versions of the spec are less clear that an error should be
generated here, so Ken (and I during review) just missed it in 1afe335.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
height=0 is legal for 1D array textures (as depth=0 is legal for
2D arrays). Fixes new piglit ext_texture_array-errors test.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We've got two mostly-independent operations in each QPU instruction, so
try to pack two operations together. This is fairly naive (doesn't track
read and write separately in instructions, doesn't convert ADD-based MOVs
into MUL-based movs, doesn't reorder across uniform loads), but does show
a decent improvement on shader-db-2.
total instructions in shared programs: 59583 -> 57651 (-3.24%)
instructions in affected programs: 47361 -> 45429 (-4.08%)
Since 73dd50acf6
glsl: implement switch flow control using a loop
The SB backend was falling over in an assert or crashing.
Tracked this down to the loops having no repeats, but requiring
a working break, initial code just called the loop handler for
all non-if statements, but this caused a regression in
tests/shaders/dead-code-break-interaction.shader_test.
So I had to add further code to detect if all the departure
nodes are empty and avoid generating an empty loop for that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86089
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Improves 359 shaders by >=10%
114 shaders by >=20%
91 shaders by >=30%
82 shaders by >=40%
22 shaders by >=50%
4 shaders by >=60%
2 shaders by >=80%
total instructions in shared programs: 5845346 -> 5822422 (-0.39%)
instructions in affected programs: 364979 -> 342055 (-6.28%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most prominently helps Natural Selection 2, which has a surprising
number shaders that do very complicated things before drawing black.
instructions in affected programs: 21052 -> 16978 (-19.35%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pre-Haswell hardware couldn't actually predicate it, but it's easier to
pretend as if it's predicated in the visitor since it will generate a
MOV from f0.1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anonymous structures are only supported with newer versions of
GCC. They will not work with GCC 4.2.1 used by OpenBSD or
GCC 4.4.7 shipped with RHEL6 going by a commit to fix a similiar
problem in radeonsi earlier in the year
(74388dd24b).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
The i965 backends pass something out of 'screen', which is allocated
per-process, making using this as a ralloc context not thread-safe.
All callers ra_alloc_interference_graph() already ralloc_free() its
return value.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It was totally broken:
- p_atomic_dec_zero() was returning the negation of the expected value
- p_atomic_inc_return()/p_atomic_dec_return() was
post-incrementing/decrementing, hence returning the old value instead
of the new
- p_atomic_cmpxchg() was returning the new value on success, instead of
the old
It is clear this never used in the past. I wonder if it wouldn't be better to
yank it altogether.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It was much easier for me to verify things build and run as expected
with this simple test, than building and testing whole Mesa.
With scons the test can be build and run merely by doing:
scons u_atomic_test
Building the test with autotools is left as a future exercise.
Reviewed-by: Matt Turner <mattst88@gmail.com>
like how C11's stdatomic.h provides generic functions. GCC's __sync_*
builtins already take a variety of types, so that's simple.
MSVC and Sun Studio don't, but we can implement it with something that
looks a little crazy but is actually quite readable.
Thanks to Jose for some MSVC fixes!
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This doesn't reschedule much currently, just tries to fit things into the
regfile A/B write-versus-read slots (the cause of the improvements in
shader-db), and hide texture fetch latency by scheduling setup early and
results collection late (haven't performance tested it). This
infrastructure will be important for doing instruction pairing, though.
shader-db2 results:
total instructions in shared programs: 61874 -> 59583 (-3.70%)
instructions in affected programs: 50677 -> 48386 (-4.52%)
We're supposed to be checking that nothing else writes r4, which is done
by the TMU result collection signal, not the coordinate setup.
Avoids a regression when QPU instruction scheduling is introduced.
Otherwise vertex shader can see stale cache data. This in particular
happens when the same vbo is updated and reused. Not sure yet if vbo's
at differing addresses but bound to same vertex buffer slot could have
issues, but seems safest to flush whenever new vertex buffers are bound.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For drivers building up to GL(ES)3, only expose the actual extension if
the API will let it be used (e.g. via overrides/debug flags that enable
higher versions).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The mesa state tracker doesn't fall back on similar integer formats, so
they must all be provided. Remove the restriction against integer color
rendering.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We need to produce a u32 destination type on integer sampling
instructions, so keep that in a shader key set based on the
currently-bound textures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Integer outputs end up getting mangled due to cov.f32f16, and float32
loses precision. Use full precision shaders in both of those cases.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Just pass the data through unmolested. This probably has no effect since
blending isn't actually enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The table contains all the relevant information about each format. The
helper functions now just do lookups in the table.
Note that this adds support for a lot of formats that were previously
unsupported. Additionally it adds disabled support for integer render
buffers, which will require more work to actually enable.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Switch both of them from independently inconsistent conventions to having
UINT/SINT/UNORM/SNORM/FLOAT/FIXED suffixes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
BRW_CACHE_VS_PROG is more easily associated with program caches than
plain BRW_VS_PROG.
While we're at it, rename BRW_WM_PROG to BRW_CACHE_FS_PROG, to move away
from the outdated Windowizer/Masker name.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This flag signifies that we've emitted a new SAMPLER_STATE table.
Given that we haven't cached those in years, CACHE_NEW_SAMPLER isn't
a great name. Putting it in the BRW_NEW_* hierarchy would make more
sense; BRW_NEW_SAMPLER_STATE_TABLE better reflects its actual purpose.
When this flag is raised, the pointer to the SAMPLER_STATE table has
changed, so we need to re-issue any packets which point to it (unit
state on Gen4-5, 3DSTATE_SAMPLER_STATE_POINTERS on Gen6, and the
per-stage variants on Gen7+).
Saves 2 * sizeof(void *) bytes per context, as we remove useless
aux_compare/aux_free function pointers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Marking brw_stage_state::sampler_count as CACHE_NEW_SAMPLER is wrong.
The number of samplers used by each program is actually computed at
draw time (brw_try_draw_prims), based purely on the currently bound
shader programs (gl_program::SamplersUsed).
CACHE_NEW_SAMPLER means that we've emitted a new SAMPLER_STATE table.
Although this could indicate that the number of samplers has changed,
it could also simply mean that the contents of the table has changed
(i.e. we've bound different textures).
The real reason these atoms depend on CACHE_NEW_SAMPLER is because they
include a pointer to the SAMPLER_STATE table. This was not commented.
So, move the comments to the appropriate place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've been streaming these out for ages, so they basically have nothing
to do with brw_state_cache.c.
Saves 6 * sizeof(void *) bytes per context, as we won't have useless
aux_compare/aux_free functions for them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These always happen together; the extra atom just means another item to
iterate through, flags to check, and a call through a function pointer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen4-5, unit state is specified as indirect state, rather than
commands. If any unit state changes, we upload it via brw_state_batch
and arrange for 3DSTATE_PIPELINED_POINTERS to be re-emitted, which
updates pointers to all unit state at once.
Since there's only one command and state atom (brw_psp_urb_cs) that
needs to know about this, there's no benefit to having six separate
flags. We can combine CACHE_NEW_*_UNIT into a single flag.
We also haven't cached these in a long time, so it doesn't make sense
to use the "CACHE_NEW_" prefix. Instead, use the "BRW_NEW_" prefix.
This also saves 12 * sizeof(void *) bytes of memory per context, as
we remove useless aux_compare/aux_free functions for each CACHE bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parenthesis. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When using MRT on Gen4-5, we have to simulate GL's alpha test feature
by emitting discards in the fragment shader. In this case, it makes
sense to set prog_data->uses_kill, which means the fragment shader may
kill pixels via the discard mechanism.
This saves us from having to look an extra key value in a couple of
places, including in the generator.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This means the generator doesn't have to look at the key, which is a
little nicer - we're pretty close to no key dependencies at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kristian noted that there's very little use of brw_wm_prog_key in the
generator, and that it basically just generates what it's told, without
caring about what stage it's handling.
One exception to this is derivative handling. When handling dFdxCoarse
and dFdxFine, we packed an enum value in a second source register,
explicitly telling the generator what to do. For dFdx, we specified an
enum value of "please use the hint", then checked the program key in the
generator level code.
A natural method is to define separate FS_OPCODE_DD[XY]_{COARSE,FINE}
opcodes, and have the front-end (which already decides what IR to
generate based on the program key) decide which dPdx/dPdy should
correspond to. This consolidates the decision making in one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
prog_data->foo is a bit more readable than brw->wm.prog_data->foo.
The local variable definition is also a great location to put the
obligatory /* CACHE_NEW_WM_PROG */ comment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
brw->wm.prog_data is covered by CACHE_NEW_WM_PROG, not
BRW_NEW_FRAGMENT_PROGRAM. So, we should listen to it.
However, I believe that BRW_NEW_FRAGMENT_PROGRAM is sufficient to cover
all the necessary cases - CACHE_NEW_WM_PROG happens in a subset of
cases. So, the code being wrong shouldn't have triggered bugs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Right now, in mesamatrix.net, the footnote is set so that it seems to be
for all the features, while actually it only applies to MSAA.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Flex and lex have a special action ‘|’ which means to use the same action as
the next rule. We can use this to reduce a bit of code duplication in the
rules for the various float literal formats.
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to the GLSL spec float literals like ‘1f’ shouldn't be allowed
without adding a decimal point or an exponent. Apparently the AMD driver also
disallows this so it seems unlikely that anything would be relying on it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We are using 1 more buffer than we have, although in the future the
driver should just end up using one buffer in total probably, this
is a good first step, it merges the txq cube array and buffer info
constants on r600 and evergreen.
This should in theory fix geom shader tests on r600.
v1.1: fix comments from Glenn.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just use the same entrypoints we use for st/wgl's opengl32.dll.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's not exported by the official opengl32.dll neither. Applications are
supposed to get it via wglGetProcAddress(), not GetProcAddress().
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes several MSVC warnings like:
warning C4273: 'glClearColorx' : inconsistent dll linkage
In fact, we should avoid using `declspec(dllexport)` altogether, and use
exclusively the .DEF instead, which gives more precise control of which
symbols must be exported, but all the public GL/GLES headers practically
force us to pick between `declspec(dllexport)` or
`declspec(dllimport)`.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Addresses MSVC warnings "result of 32-bit shift implicitly converted to
64 bits (was 64-bit shift intended?)", which can often be symptom of
bugs, but in these cases were all benign.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
- Remove no-op if-clause.
- -mstackrealign has been enabled again on MinGW for quite some time and
appears to work alright nowadays.
- Drop -mmmx option as it is implied my -msse, and we don't use MMX
intrinsics anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
PIPE_CAP_VIDEO_MEMORY returns the amount of video memory in megabytes,
so need to converted it to bytes.
Fixed Warframe memory detection.
v2: also prepare for cards with more than 4GB memory
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: David Heidelberg <david@ixit.cz>
D3DPOOL_SCRATCH is disallowed according to spec.
D3DPOOL_SYSTEMMEM should be allowed but we don't handle it right for now.
v2: Fixes segfault in SetTexture when unsetting the texture
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Fixes "Error : CONST[20]: Undeclared source register" when running
dx9_alpha_blending_material. Also artifacts on ilo.
v2: also remove unused MISC_CONST
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This patch moves the data field from Resource9 to Surface9 and cleans
D3DPOOL_SYSTEMMEM handling in Texture9. This fixes HL2 lost coast.
It also removes in Texture9 some code written to support importing
and exporting non D3DPOOL_SYSTEMMEM shared buffers. This code hadn't
the design required to support the feature and wasn't used.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Rather than shoving all the VL code for non-VL targets, increasing
their size, just split it out and use it when needed. This gives us
the side effect of building vl_winsys_dri.c once, dropping a few
automake warnings, and reducing the size of the dri modules as below
text data bss dec hex filename
5850573 187549 1977928 8016050 7a50b2 before/nouveau_dri.so
5508486 187100 391240 6086826 5ce0aa after/nouveau_dri.so
The above data is for a nouveau + swrast + kms_swrast 'megadriver'.
v2: Do not include the vl sources in the auxiliary library.
v3: Rebase. Add nine.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With follow up commit we'll split vl static lib from the auxiliary one,
and choose the appropriate vl (galliumvl or galliumvl_stub) for the
respective targets to link against.
v2: Rebase.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will be used by the non-VL targets, to stub out the functions called
by the drivers. The entry point to those are within the VL
state-trackers, yet the compiler cannot determine that at link time.
Thus we'll need to stub them out to prevent unresolved symbols in the
dri, egl, gbm and pipe-loader targets.
v2: Rebase.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Set a single VL_{CFLAG,LIBS} for xcb and friends, and let each target
check for it's relevant library alone. Required as with follow up
commits we'll build aux/vl into a separate module, which needs VL_CFLAGS
Cleanup add a couple of explicit LIBDRM_LIBS linking, as aux/vl itself
requires libdrm, despite that LIBDRM_{RADEON,NOUVEAU...} may provide it
as well.
v2: Rebase. Make sure st/xvmc programs work.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Or we might end up where automatically enable the build, only to error
out a couple of lines after that.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This will remove the need for unnecessary runtime checks for CPU features if
already supported by target CPU, resulting in smaller and less branchy code.
V2:
- Removed the SSSE3 related part for the not yet merged patch.
- Avoiding redefinition of macros.
Tested-by: David Heidelberg <david@ixit.cz>
Since pack_bytes expands to two mov(4) align1 instructions, we can't use
swizzles directly. For an instruction like
pack_bytes m4.y:UD, vgrf13.xyzw:UD
we can write into the .y component by settings the offset based on the
swizzle.
Also while we're doing this, we can set the dependency control hints
properly, so that a series of pack_bytes writing into separate
components of a register can issue without blocking.
Fixes broken rendering in Windows-based QtQuick2 apps run through Wine.
This library sets all texture units' GL_COORD_REPLACE, leaves point
sprite mode enabled, and then draws a triangle fan.
Will need a slightly different fix for Gen4-5, but I don't have my old
machines in a usable state currently.
V2: - Simplify patch -- the real changes are no longer duplicated across
the Gen6 and Gen7 atoms.
- Also don't clobber attr overrides -- which matters on Haswell too,
and fixes the other half of the problem
- Fix newly-introduced warnings
V3: - Use BRW_NEW_GEOMETRY_PROGRAM and brw->geometry_program rather than
core flag and state; keep the state flags in order.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84651
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ilia noticed that my lowering pass was converting the constant array
used by textureGatherOffsets' offsets parameter to a uniform. This
broke textureGather for Nouveau, and is generally a horrible plan,
since it violates the GLSL constraint that offsets must be an
immediate constant.
When I wrote this pass, I neglected to consider whole array assignment.
I figured opt_array_splitting would handle constant indexing, so this
pass was really about fixing variable indexing.
textureGatherOffsets is an example of whole array access that we really
don't want to touch. Whole array copies don't appear to benefit from
this either - they're most likely initializers for temporary arrays
which are going to be mutated anyway. Since you're copying, you may
as well copy from immediates, not uniforms.
This patch makes the pass look for ir_dereference_arrays of
ir_constants, rather than looking for any ir_constant directly.
This way, it ignores whole array assignment.
No shader-db changes or Piglit regressions on Haswell. Some Piglit
tests generate different code (fixing textureGatherOffsets on Nouveau).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
We already precompile GLSL programs; it seems logical to precompile ARB
programs as well. We just never hooked it up.
This also makes the programs compile even if no drawing occurs, which is
useful for shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, the prototypes for brw_vs/gs/fs_precompile were scattered
between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only). Also,
brw_fs_precompile had C++ linkage, while the others were C.
This patch moves all the prototypes to a central location (brw_shader.h)
and makes brw_fs_precompile have C linkage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We'd like to do precompiling for ARB vertex and fragment programs,
which only have gl_program structures - gl_shader_program is NULL.
This patch makes the various precompile functions take a gl_program
parameter directly, rather than accessing it via gl_shader_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_shader_precompile should just do a precompile; it makes more sense
for the caller to decide whether we should do one. Simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
llvmpipe disables denorms on purpose (on x86/sse only), because denorms are
generally neither required nor desired for graphic apis (and in case of d3d10,
they are forbidden).
However, this caused some arithmetic tests using denorms to fail on some
systems, because the reference did not generate the same results anymore.
(It did not fail on all systems - behavior of these math functions is sort
of undefined when called with non-standard floating point mode, hence the
result differing depending on implementation and in particular the sse
capabilities.)
So, for the reference, simply flush all (input/output) denorms manually
to zero in this case.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=67672.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The extension itself was deleted 2 years ago. There are still some
prog_instruction opcodes from NV_fp that exist because they're used by
ir_to_mesa.cpp, though.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Roamnick <ian.d.romanick@intel.com>
They're part of NV_vertex_program2, which I'm pretty sure we're never
going to support.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Roamnick <ian.d.romanick@intel.com>
The nice thing about the good way of initializing arrays like this is that
you don't need to initialize everything in order, or even everything at
all. Taking advantage of that only needs a tiny fixup to deal with the
default NULL value of the pointers.
I haven't dropped the initialization of opcodes that exist and are unsupported.
This was the only state tracker emitting it, and hardware was just having
to lower it anyway (or failing to lower it at all).
v2: Extracted from a larger patch by Jose (which also dropped DP2A), fixed
to actually not reference TGSI_OPCODE_CND. Change by anholt.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
The translation is lowering it to not using TGSI_OPCODE_NRM, anyway.
v2: Extracted from a larger patch by Jose that also dropped DP2A usage.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
Caught by clang.
warning: comparison of constant -1 with expression of type
'ir_texture_opcode' is always false
[-Wtautological-constant-out-of-range-compare]
if (op == -1)
~~ ^ ~~
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts a little more than 1k of .text size from i915g.
This was previously done in commit 5f66b340 and subsequently reverted in
commit 3661f757 after bug 30514 was filed. I believe the cause of bug
30514 wasn't anything related to cross compiling, but rather that the
toolchain used defaulted to -march=i386, and i386 doesn't have the
CMPXCHG or XADD instructions used to implement the intrinsics.
So we reverted a patch that improved things so that we didn't break
compilation for a platform that never could have worked anyway.
Ben was asking about the undocumented restriction that the math
instruction cannot use the dependency control hints. I went to reconfirm
and disabled the is_math() check in opt_set_dependency_control() and saw
that the disassembled math instructions with dependency hints had a
bogus math function. We were mistakenly overwriting it by setting an
empty conditional mod.
Unfortunately, this wasn't the cause of the aforementioned problem (I
reproduced it). This bug is benign, since we don't set dependeny hints
on math instructions -- but maybe some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These were added in commits a760c738 and 43757135 to be used in
implementing C-style aggregate initializers (commit 1b0d6aef). Paul
rewrote that code in commit 0da1a2cc to use GLSL types, rather than
AST types, leaving these copy constructors unused.
Tested by making them private and providing no definition.
Uniform names (even for hidden uniforms) are required to be unique; some
parts of the compiler assume they can be looked up by name.
Fixes the piglit test: tests/spec/glsl-1.20/linker/array-initializers-1
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When converting a uniform array reference to a pull constant load, the
`reladdr` expression itself may have its own `reladdr`, arbitrarily
deeply. This arises from expressions like:
a[b[x]] where a, b are uniform arrays (or lowered const arrays),
and x is not a constant.
Just iterate the lowering to pull constants until we stop seeing these
nested. For most shaders, there will be only one pass through this loop.
Fixes the piglit test:
tests/spec/glsl-1.20/linker/double-indirect-1.shader_test
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves all the CUBE section above the gradients section,
so that the gradient emission happens on one block which
is what sb/hardware expect.
v2: avoid changes to bytecode by using spare temps
v2.1: shame gcc, oh the shame. (uninit var warnings)
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The piglit tests were failing, and it appeared to be SB
optimising out things, but Glenn pointed out the gradients
are meant to be clause local, so we should emit the texture
instructions in the same clause. This moves things around
to always copy to a temp and then emit the texture clauses
for H/V.
v2: Glenn pointed out we could get another ALU fetch in
the wrong place, so load the src gpr earlier as well.
Fixes at least:
./bin/tex-miplevel-selection textureGrad 2D
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
res->bind is not an indicator of how the resource is currently bound.
buffers can be rebound across different binding points without changing
underlying storage.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
This creates a usable layout for all NPOT textures. Of course these
still have lots of limitations, but at least we can render to a
level.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
For NPOT texture layouts, we want to be able to access texture levels
other than 0 directly. Since the hw doesn't support that, We do it by
adding the offset directly.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
This happens with glsl-convolution-1, where we have 64 constants. This
doesn't make the test pass (we don't have 64 constants anyway, only
32) but this prevents it from crashing.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
This patch remove workaround related to LLVM < 3.2 bug.
Original bug has been closed as fixed in 2011.
At this moment gallium requires LLVM 3.3 (2013).
LLVM has been tested without SSE2 support in commit
ca70de9bd2 and removed after requiring
LLVM 3.3 in commit 013ff2fae1
Original LLVM bug: http://llvm.org/bugs/show_bug.cgi?id=6960
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In commit 5e37a2a4a8, I made the pull constant code stop calling
_mesa_load_state_parameters() when there were no pull parameters.
This worked fine on Gen6+ because the push constant code also called
it if there were any push constants. However, the Gen4-5 push constant
code wasn't doing this. This patch makes it do so, like the Gen6+ code.
A better long term solution would be to make core Mesa just handle this
for us when necessary.
Fixes around 8766 Piglit tests on Ironlake, and probably Gen4 as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Fix one of the few cases where we can't reliable touch the destination hazard
bits. I am explicitly doing this patch individually so it is easy to backport. I
was tempted to do this patch before the previous patch which reorganized the
code, but I believe even doing that first, this is still easy to backport.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move this to a separate function so that we can begin to add other little
caveats without making too big a mess.
NOTE: There is some desire to improve this function eventually, but we need to
fix a bug first.
v2:
Use const for the inst for the hazard check (Matt)
Invert safe logic to get rid of the double negative (Matt)
Add PRM reference for predicates (Matt)
Add note about empirical evidence for math (Matt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The visitor emits MOVs to temporary registers for immediates, so these
never trigger. For further proof, check case ir_triop_fma.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
texture_offset was only used by some texturing operations, and offset
was only used by spill/unspill and some URB operations. These fields are
never used at the same time.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Check that the target is GL_TEXTURE_CUBE_MAP before emitting
TEXCOORDTYPE_VECTOR texture coordinates.
I'm not sure if the hardware would like CARTESIAN coordinates
with cube maps, and as I'm too lazy to find out just emit the
VECTOR coordinates for cube maps always. For other targets use
CARTESIAN or HOMOGENOUS depending on the number of texture
coordinates provided.
Fixes rendering of the "electric" background texture in chromium-bsu
main menu. We appear to be provided with three texture coordinates
there (I'm guessing due to the funky texture matrix rotation it does).
So the code would decide to use TEXCOORDTYPE_VECTOR instead of
TEXCOORDTYPE_CARTESIAN even though we're dealing with a 2D texure.
The results weren't what one might expect.
demos/cubemap still works, which hopefully indicates that this doesn't
break things.
Also tested with:
bin/glean -o -v -v -v -t +texCube --quick
bin/cubemap -auto
from piglit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Add support for decoding the new branch control bit. I saw two things wrong with
the existing code.
1. It didn't bother trying to decode the bit.
- While we do not *intentionally* emit this bit today, I think it's interesting
to see if we somehow ended up with the bit set. It may also be useful in the
future.
2. It seemed to be the wrong bit.
- The docs are pretty poor wrt which bit this actually occupies. To me, it
/looks/ like it should be bit 28. I am not sure where Ken got 30 from. I
verified it should be 28 by looking at the simulator code.
I also added the most basic support for GOTO simply so we don't need to remember
to change the function in the future.
v2:
Move the branch_ctrl check out of the if gen >= 6 check to make it more
readable. (Matt)
ENDIF doesn't have branch_ctrl (Matt + Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts f4dd099171.
The src/gallium/tests/unit/translate_test.c gives the same results on
MinGW 64-bits as on Linux 64-bits. And since MinGW is often used for
development/testing due to its convenience, it's better not to have this
sort of differences relative to MSVC.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
No changes required in the driver itself, all handled by draw.
piglit results in a quick run:
skip->pass 7
skip->fail 2
(The new failures in the ARB_fragment_layer_viewport group are expected,
we fail the same if gs doesn't write these outputs regardless of the vs.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Mostly add a couple cases so we don't just check gs for this.
There's only one gotcha, the built-in vp transform in the llvm vs can't
handle it (this would be fixable though non-trivial due to vp index being
non-constant for the SoA outputs, but we don't use it if there's a gs
neither - the whole clip/vp transform integration there is suboptimal).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When calling vaCreateImage() an internal copy of VAImage is maintained
since the allocation of "image" may not be guaranteed to live long enough.
Signed-off-by: Michael Varga <Michael.Varga@amd.com>
For 1D and 2D arrays we don't want the other coordinates being
offset and affecting where we sample. I wrote this patch 6 months
ago but lost it.
Fixes:
./bin/tex-miplevel-selection textureLodOffset 1DArray
./bin/tex-miplevel-selection textureLodOffset 2DArray
./bin/tex-miplevel-selection textureOffset 1DArray
./bin/tex-miplevel-selection textureOffset 1DArrayShadow
./bin/tex-miplevel-selection textureOffset 2DArray
./bin/tex-miplevel-selection textureOffset(bias) 1DArray
./bin/tex-miplevel-selection textureOffset(bias) 2DArray
v2: rewrite to handle more cases and be consistent with code
above.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, the kernel would dispatch thread 0, wait, then dispatch thread
1. By insisting that the thread contents use semaphores in the right
place, the kernel can sleep for longer by dispatching both threads at
once.
When using the stand alone compiler, if we try to link a shader with vertex
attributes it will segfault on linking as the binding hash tables are not
included in the shader program. Obviously, we cannot make the linking stage
succeed without the bound attributes but we can prevent the crash and just
let the linker spit its own error.
Reviewed-by: Brian Paul <brianp@vmware.com>
We cannot guarantee that vertex buffers have the necessary alignment for
fetching all AoS members at once (for instance 4x32bit XYZW data). We can
however guarantee that for textures. This did not cause errors for older
llvm versions but it now matters and will cause segfaults if the data
happens to not be aligned. Thus we need to set alignment manually.
(Note that we can't actually really guarantee data to be even element aligned
due to offsets in vertex buffers being bytes and OpenGL allowing this, but
it does not matter for x86 as alignment is only required for sse vectors -
not sure what happens on other archs, however.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=85467.
Not all drivers can set gl_Layer from VS. Add a fallback that passes the
instance id from VS to GS, and then uses the GS to set the layer.
Tested by adding
quad_buffers |= clear_buffers;
clear_buffers = 0;
to the st_Clear logic, and forcing set_vertex_shader_layered in all
cases. No piglit regressions (on piglits with 'clear' in the name).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
DRI_PRIME setups have different issues due the lack of dma-buf fences
support in the drivers. For DRI3 DRI_PRIME, a race can appear, making
tearings visible, or worse showing older content than expected. Until
dma-buf fences are well supported (and by all drivers), an alternative
is to send the buffers to the server only when rendering has finished.
Since waiting the rendering has finished in the main thread has a
performance impact, this patch uses an additional thread to offload the
wait and the sending of the buffers to the server.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Implements vblank_mode and throttling, which allows us change default ratio
between framerate and input lag.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Work of Joakim Sindholt (zhasha) and Christoph Bumiller (chrisbmr).
DRI3 port done by Axel Davy (mannerov).
v2: - nine_debug.c: klass extended from 32 chars to 96 (for sure) by glennk
- Nine improvements by Axel Davy (which also fixed some wine tests)
- by Emil Velikov:
- convert to static/shared drivers
- Sort and cleanup the includes
- Use AM_CPPFLAGS for the defines
- Add the linker garbage collector
- Restrict the exported symbols (think llvm)
v3: - small nine fixes
- build system improvements by Emil Velikov
v4: [Emil Velikov]
- Do no link against libudev. No longer needed.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: David Heidelberg <david@ixit.cz>
v3: thanks to Brian, improved coding style, also glennk helped spot few
things (unsigned -> int, two constify)
v4: thanks Ilia improved function, dropped u_box_clip_3d
v5: incorporated rest of Gregor proposed changes,clean ups
v6: u_box_clip_2d simplify proposed by Ilia Mirkin
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
At this moment we use only zero or positive values.
v2: Implement it for also for Solaris, MSVC assembly
and enable for other combinations.
v3: Replace MSVC assembly by assert + warning during compilation
v4: remove inc and dec with return for MSVC assembly
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: David Heidelberg <david@ixit.cz>
Implement pipe_loader_sw_probe_wrapped which allows to use the wrapped
software renderer backend when using the pipe loader.
v2: - remove unneeded ifdef
- use GALLIUM_PIPE_LOADER_WINSYS_LIBS
- check for CALLOC_STRUCT
thanks to Emil Velikov
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Some of the geom shader tests produce an empty vertex shader,
on cayman we'd crash in the finaliser because last_cf was NULL.
cayman doesn't need the NOP workaround, so if the code arrives
here with no last_cf, just emit an END.
fixes crashes in a bunch of piglit geom shader tests.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The sampler_array_size field was added by "mesa/st: add support for
dynamic sampler offsets". But the field wasn't getting copied in
the get_pixel_transfer_visitor() or get_bitmap_visitor() functions.
The count_resources() function then didn't properly compute the
glsl_to_tgsi_visitor::samplers_used bitmask. Then, we didn't declare
all the sampler registers in st_translate_program(). Finally, we
asserted when we tried to emit a tgsi ureg src register with File =
TGSI_FILE_UNDEFINED.
Add the missing assignments and some new assertions to catch the
invalid register sooner.
Cc: "10.3, 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Using the asynchronous DMA engine for multi-dimensional operations seems
to cause random GPU lockups for various people. While the root cause for
this might need to be fixed in the kernel, let's disable it for now.
Before re-enabling this, please make sure you can hit all newly enabled
paths in your testing, preferably with both piglit and real world apps,
and get in touch with people on the bug reports below for stability
testing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85647
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83500
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Unfortunately no LLVM type was generated for pipe_viewport_state -- it
was being treated as a single floating point array --, so llvmpipe (and
any driver that relies on draw/llvm) got totally busted.
We were missing a few files
- The version scripts
- Android & scons build scripts
- A few headers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Add all headers into Makefile.sources
- Don't forget the target-helpers
- Add the python scripts & the formats table/list (csv)
- Temporary add vl/vl_winsys_dri.c to EXTRA_DIST until we rework the
way VL is build.
- Add the following to EXTRA_DIST - they are included via the
generated u_indices_gen.c thus we should not add them to *SOURCES.
indices/u_indices.c
indices/u_unfilled_indices.c
XXX: Should we nuke gallivm/f.cpp ? It seems that no-one is using it.
v2: Rebase
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The DRM_IOCTL_MODE_CREATE_DUMB (and others) IOCTL isn't very rigorously
specified, which has the effect that some kernel drivers do not consider
the .pitch and .size fields of struct drm_mode_create_dumb outputs only.
Instead they will use these as lower bounds and overwrite them only if
the values that they compute are larger than what userspace provided.
This works if and only if userspace initializes the fields explicitly to
either 0 or some meaningful value. However, if userspace just leaves the
values uninitialized and the struct drm_mode_create_dumb is allocated on
the stack for example, the driver may try to overallocate buffers.
Fortunately most userspace does zero out the structure before passing it
to the IOCTL, but there are rare exceptions. Mesa is one of them. In an
attempt to rectify this situation, kernel drivers are being updated to
not use the .pitch and .size fields as inputs. However in order to fix
the issue with older kernels, make sure that Mesa always zeros out the
structure as well.
Future IOCTLs should be more rigorously defined so that structures can
be validated and IOCTLs rejected if output fields aren't set to zero.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Addition of color fmt bitfield to this register (compared to a3xx) means
we need to re-emit if either prog or framebuffer state is dirty.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This reverts commit 8d3f739383.
In the last commit we've updated our check to determine if the actual
code is buildable, rather than if the compiler acknowledges the option.
I.e. did anyone provide -mno-sse4.1 vs is my compiler too old.
Now this code will never be attemped to be build, in both cases.
Confirmed by building mesa with
export CFLAGS='-march=native -mno-sse4.1'
./configure && make
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
So when checking/building sse code we have three possibilities:
1 Old compiler, throws an error when using -msse*
2 New compiler, user disables sse* (-mno-sse*)
3 New compiler, user doesn't disable sse
The original code, added code for #1 but not #2. Later on we patched
around the lack of handling #2 by wrapping the code in __SSE4_1__.
Yet it lead to a missing/undefined symbol in case of #1 or #2, which
might cause an issue for #2 when using the i965 driver.
A bit later we "fixed" the undefined symbol by using #1, rather than
updating it to handle #2. With this commit we set things straight :)
To top it all up, conventions state that in case of conflicting
(-enable-foo -disable-foo) options, the latter one takes precedence.
Thus we need to make sure to prepend -msse4.1 to CFLAGS in our test.
v2: Clean the #includes. Suggested by Ilia, Matt & Siavash.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Tested-by: Siavash Eliasi <siavashserver@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Very initial support. Basic stuff working (es2gears, es2tri, and maybe
about half of glmark2). Expect broken stuff. Still missing: mem->gmem
(restore), queries, mipmaps (blob segfaults!), hw binning, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will be reused for the scalar VS pass.
v2 (Ken): Rebase on master.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll reuse this toplevel optimization driver for the scalar VS.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These last few operations all only apply when we've actually generated
code, optimized and allocated registers. The dummy and the repclear
shaders don't need the gen4 send workaround, and don't spill. This
means we can move these lines into the else-branch, which will make
the following refactoring easier.
v2 (Ken): Rebase on master, which removed the uncompressed stack.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We split out SIMD8 and SIMD16 generation into seperate calls to
new method generate_code(), which returns the start offset for the
generated code. A new get_assembly() method returns the generated code.
This avoids asserting MESA_SHADER_FRAGMENT and accessing wm_prog_data
in the generator.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Derived from st/glx's GLX_EXT_create_context_es/es2_profile implementation.
Tested with an OpenGL ES 2.0 ApiTrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
The latest version of the specs explicitly allow it, and given that Mesa
universally supports KHR_debug we should definitely support it.
Totally untested. (Just happened to noticed this while implementing
GLX_EXT_create_context_es2_profile for st/xlib.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
17 insertions(+), 102 deletions(-). Works just as well.
v2: Make emit_math take const references (suggested by Matt),
drop redundant WRITEMASK_XYZW setting (Matt and Curro).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Every other unit in the geometry pipeline automatically enables
statistics gathering. This part of the pipe has been controlled by the
DEBUG_STATS variable, but this is asymmetric. This dates back to the
original implementation, and I am not sure if there is a reason for it.
I need access to these stats to implement ARB_pipeline_statistics_query.
Eric wrote it, and Ken touched it last. Do you have any opposition?
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86145
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
According to gen2 BSpec the pipeline must be flushed at least up to the
windower before changing the scissor rect enable field. Emitting the
3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE is sufficient
to do that.
gen3 BSpec no longer has that piece of text, but let's make the same
change there too for symmetry. The spec does still say that the scissor
rectangle must be defined before enabling it, so the new order does seem
more in line with the spec.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Gen2 doesn't have fragment shaders so we shouldn't be calling
_mesa_meta_glsl_Clear() on gen2. Restore the appropriate
ARB_fragment_shader check to the clear path which was lost in:
commit 94f22fbe78
Author: Tapani Pälli <tapani.palli@intel.com>
Date: Wed Aug 8 20:46:45 2012 +0300
intel: use _mesa_meta_Clear with OpenGL ES 1.1 v2
v2: Fix spelling in commit message
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
TEXTURE_SET() is the only register macro that forgets to wrap the
argument evaluation in parens. Only simple integers are passed to this
macro so there's no bug but sitll it seems prudent to add the
parens.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Gen2 doesn't support depth/stencil textures, and since
commit c1d4d49993
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Thu Apr 24 14:11:43 2014 +0300
i915: Don't advertise Z formats in TextureFormatSupported on gen2
depth/stencil formats are no longer accepted as texture formats.
However we still want depth/stencil renderbuffers, so add explicit
format checks to intel_alloc_renderbuffer_storage() to allow such
things.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
gen2 doesn't supporte linear mip filter with anisotropic min/mag
filtering. The hardware would automagically downgrade the min/mag
filters to linear in such cases, which IMO looks worse than forcing
the mip filter to nearest.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
The spec says using DOT4 for alpha is undefined unless DOT4 is also used
for color. It seems to do the right thing anyway, but better safe than sorry.
Also override numAlphaArgs to 2 for DOT4 since that's what it wants.
This migth fix something in case the specified alpha mode has only one
argument. Also avoids emitting a needless 3DSTATE_MAP_BLEND_ARG if
the specified alpha mode has three arguments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
All the shaders we've received so far had this be the case, but with
nir-to-tgsi that changed.
I might decide to make nir-to-tgsi keep the outputs in the same order, for
debugging sanity, but I'm not sure.
apitrace now supports it, and it makes it much easier to test
tracing/replaying on OpenGL ES contexts since
GLX_EXT_create_context_{es2,es}_profile are widely available.
Reviewed-by: Brian Paul <brianp@vmware.com>
I used these in the SEL peephole, but they require extra tracking and
fix ups. The SEL peephole can pretty easily find the blocks it needs
without these.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Make the helpers fill out valid Gen7 3DSTATE_SF and 3STATE_SBE. This
prevents the helpers from having to do
dw[0] = GEN7_SBE_DW1_x; // setting DW1 value to dw[0]!?
and simplifies gen7_3DSTATE_{SF,SBE}().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Render stream and render enable are independent from so enable. Having a
single return point makes it easier to see that.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Started to make pipe_stream_output_info mandatory, but ended up adding support
for stream id and making a workaround Gen7-specific.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It replaces gen6_fill_3dstate_constant(). gen6_3DSTATE_CONSTANT_{VS,GS,PS}
are made wrappers of the new function.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add gen6_so_3DSTATE_GS(), gen6_disable_3DSTATE_GS(), and
gen7_disable_3DSTATE_GS() to do SO on GEN6 or to disable GS.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
On x86-64 this saves 8 bytes of padding in the structure, and this
reduces the size of the structure to 32 bytes.
v2: Fix constructor so that GCC won't warn about the order of
initialization.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Due to the total number of bits used in the bitfield, this does not
increase the size of the structure.
It does, however, reduce the number of instructions required each time
one of these fields is accessed. To access ::matrix_columns with the
bitfield, three instructions were required:
movzbl 0x9(%rdx),%eax
shr %al
and $0x7,%eax
As a uint8_t, only one instruction is required.
movzbl 0xa(%rdx),%eax
These fields are accessed *a lot*.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 48,103,497 16,556,096 676,447
After (64-bit): 45,722,616 15,737,964 670,607
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 61,472,611 21,051,222 821,361
After (32-bit): 57,987,421 19,872,226 811,609
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
There are no uniforms in OpenGL ES 1.x, so we can't even get to this
code in that API.
Also, reorder the checks. First check that transpose is true, then
check whether or not that is legal in the current API. transpose should
never be true in an ES2 context, so this gets one check (the more
expensive one) out of the main path.
Valgrind callgrind results for a trace of Tesseract:
_mesa_UniformMatrix4fv _mesa_UniformMatrix3fv
Before (64-bit): 96,119,025 24,240,510
After (64-bit): 90,726,569 22,926,662
_mesa_UniformMatrix4fv _mesa_UniformMatrix3fv
Before (32-bit): 132,434,452 29,051,808
After (32-bit): 126,658,112 27,989,316
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Before ARB_explicit_uniform_location, Mesa's location encoding allowed
locations for non-array types that had non-zero array indices.
Basically, part of the location was the uniform and part was the array
index. This meant that some checks had to occur for arrays and
non-arrays. This is no longer possible, we the checks can be split up.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 50,499,557 17,487,316 686,227
After (64-bit): 50,023,791 17,274,432 684,293
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 62,968,039 21,732,380 828,147
After (32-bit): 62,373,967 21,490,756 826,223
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I really wanted to remove 'shProg != NULL' as well, but that would have
required adding a dummy program as the default program. That seemed
like more churn than removing one test was worth.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Only one caller wanted to generate an error when location == -1, so move
the error generation to that caller. There will be more callers in the
future that do not want to generate errors.
Move the location == -1 check later in validate_uniform_parameters. As
currently implemented, glUniform1iv(-1, -1, data) would not generate an
error, but it should due to count being < 0.
The location that I have moved it to will make more sense with the next
commit.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 51,241,217 17,740,162 689,181
After (64-bit): 50,499,557 17,487,316 686,227
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 63,940,605 21,987,918 831,065
After (32-bit): 62,968,039 21,732,380 828,147
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The GL_ enums were previously used because glsl_types.h couldn't be used
in C code. That was fixed some time ago (and uniforms.c already
includes glsl_types.h), so this is no longer necessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Derive whether a RT supports blending, logicop, and the like when
set_framebuffer_state() is called. This enables us to simplify
gen6_BLEND_STATE().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
According to the documentation, line widths higher than 40.0 may have
quality problems. That's already 20 times larger than we've been
exposing, so it seems totally sufficient.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We've artificially been limiting this to 5 for no particular reason.
On Gen4-5, the limit is [0, 7.5] with a granularity of 0.5 (U3.1).
On Gen6+, the limit is [0, 7.9921875]. Since it's a U3.7, the
granularity should be 0.125 (1/8).
This patch conservatively advertises one granularity smaller than the
hardware's maximum value, just in case there's a problem using the
largest possible value. On Gen4-5, this is 7.5 - 0.5 = 7.0. On Gen6+,
this is 8.0 - 0.125 = 7.875.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Rather than hardcoding platform values in every code path, just use the
maximum value we set.
Currently, ctx->Const.LineWidth == 5, which is smaller than the hardware
limit. But applications shouldn't be using a value larger than we
support anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Line Width moved to DW1 bits 29:12. It's actually now a U11.7.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Use clamping constants that guarantee no integer overflows.
As spotted by Chris Forbes.
This causes the code to change as:
- value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f);
+ value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f);
- value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f));
+ value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483520.0f));
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
On Windows, DllMain calls and thread creation/destruction are
serialized, so when llvmpipe is destroyed from DllMain waiting for the
rasterizer threads to finish will deadlock.
So, instead of waiting for rasterizer threads to have finished, simply wait for the
rasterizer threads to notify they are just about to finish.
Verified with this very simple program:
#include <windows.h>
int main() {
HMODULE hModule = LoadLibraryA("opengl32.dll");
FreeLibrary(hModule);
}
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
No longer needed as the file was removed with
commit 8c229d306b
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Mon Aug 11 10:07:07 2014 -0700
i965: Delete the Gen8 code generators.
We now use the brw_eu_emit.c code instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
During teardown we free the driver_configs list pointer, but we forget
to deallocate each config in that list.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
The function is not called by platform_drm. As such one needs to
pay special attention at teardown.
v2: Fix the comment block. Spotted by Ken.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Earlier commit failed to attribure that for drm platforms one does not
call dri2_create_screen, thus it does not create the screen and
driver_configs but inherits them from the "display" - gbm.
As such wrap cleanup in Platform != _EGL_PLATFORM_DRM to prevent
the issue and still cleanup correctly for non-drm platforms.
v2:
- Drop the ifdef HAVE_DRM_PLATFORM, reindent the code and fix the
comment block. Suggested by Ken.
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Move opcode to string mappings to functions of their own. Have for consistent
outputs for similar opcodes.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
When the earlier block ended with control flow, we'd mistakenly remove
some of its links to its children. The same happened with the later
block.
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
A pattern in certain shaders is:
uniform vec4 colors[NUM_LIGHTS];
for (int i = 0; i < NUM_LIGHTS; i++) {
...use colors[i]...
}
In this case, the application author expects the shader compiler to
unroll the loop. By doing so, it replaces variable indexing of the
array with constant indexing, which is more efficient.
This patch extends the heuristic to see if arrays accessed within the
loop are indexed by an induction variable, and if the array size exactly
matches the number of loop iterations. If so, the application author
probably intended us to unroll it. If not, we rely on the existing
loop-too-large heuristic.
Improves performance in a phong shading microbenchmark by 2.88x, and a
shadow mapping microbenchmark by 1.63x. Without variable indexing, we
can upload the small uniform arrays as push constants instead of pull
constants, avoiding shader memory access. Affects several games, but
doesn't appear to impact their performance.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Consider GLSL code such as:
const ivec2 offsets[] =
ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1),
ivec2(0, -1), ivec2(0, 0), ivec2(0, 1),
ivec2(1, -1), ivec2(1, 0), ivec2(1, 1));
ivec2 offset = offsets[<non-constant expression>];
Both i965 and nv50 currently handle this very poorly. On i965, this
becomes a pile of MOVs to load the immediate constants into registers,
a pile of scratch writes to move the whole array to memory, and one
scratch read to actually access the value - effectively the same as if
it were a non-constant array.
We'd much rather upload large blocks of constant data as uniform data,
so drivers can simply upload the data via constbufs, and not have to
populate it via shader instructions.
This is currently non-optional because both i965 and nouveau benefit
from it, and according to Marek radeonsi would benefit today as well.
(According to Tom, radeonsi may want to handle this itself in the long
term, but we can always add a flag when it becomes useful.)
Improves performance in a terrain rendering microbenchmark by about 2x,
and cuts the number of instructions in about half. Helps a lot of
"Natural Selection 2" shaders, as well as one "HOARD" shader.
total instructions in shared programs: 5473459 -> 5471765 (-0.03%)
instructions in affected programs: 5880 -> 4186 (-28.81%)
v2: Use ir_var_hidden to avoid exposing the new uniform via the GL
uniform introspection API.
v3: Alphabetize Makefile.sources properly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In the compiler, we'd like to generate implicit uniforms for internal
use. These should not be visible via the GL uniform introspection API.
To support that, we add a new ir_variable::how_declared value of
ir_var_hidden, and plumb that through to gl_uniform_storage.
v2 (idr): Fix some memory management issues in
move_hidden_uniforms_to_end. The comment block on the function has more
details.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Makes use of SSE 4.1 to speed up compute of min and max elements.
Callgrind cpu usage results from pts benchmarks:
Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%
V5:
- actually make use of the optimisation in android (Emil Velikov)
- set a better array size limit for using SSE and added TODO
V4:
- fixed bugs with incrementing pointer and updating counters
V3:
- Removed sse_minmax.c from Makefile.sources
- handle the first few values without SSE until the pointer is aligned
and use _mm_load_si128 rather than _mm_loadu_si128
- guard the call to the SSE code better at build time
V2:
- removed GL* types
- use _mm_store_si128() rather than _mm_store_ps()
- add runtime check for SSE
- use aligned attribute for local mix/max
- bunch of tidyups
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Walk through the list and free each config, and finally free the list
itself. Freeing approx 20KiB of memory, according to valgrind.
Inspired by a similar patch by enpeng xu.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
driUnbindContext() checks for valid drawables before calling the driver
unbind function. In case of Surfaceless contexts, the drawables are always
Null and we end up not releasing the underlying DRI context. Moving the
call to the driver function before the drawable validity checks fixes things.
Steps to trigger this bug are following:
- create surfaceless context and make it current
- make some other context current
- {another thread} destroy surfaceless context
- make another context current
Signed-off-by: Alexandros Frantzis <Alexandros.Frantzis@canonical.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74563
The dummy shader sends an EOT message to end itself. There are many more
works need to be done on the compiler side before we can advertise
PIPE_CAP_COMPUTE.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Use util_dynarray in ilo_set_global_binding() to allow for unlimited number of
global bindings. Add a comment for global bindings.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We need to know the local/input/private sizes and others. This is not
complete. We need many others for CURBE setup.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
They will be used to report compute params or program compute states.
thread_count can also be used for 3DSTATE_VS.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This started hitting an assertion recently. Only affects Haswell
(Ivybridge doesn't support this meddling with the sampler state pointer,
and ARB_gpu_shader5 is not enabled yet on Broadwell)
14 Piglits crash->pass.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves performance in GLBenchmark 2.7 TRex by 3.88889% +/- 0.336383%
(n=80) at 1280x720 on Broadwell GT3. Together with the previous patch,
it improves performance by 5.42738% +/- 0.541971% (n=10) at 1920x1080.
Note that without the PMA stall fix, this would instead decrease
performance by 22%.
v2: Update comment (noticed by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Certain non-promoted depth cases typically incur stalls. In very
specific cases, we can enable a workaround which improves performance.
Improves performance in GLBenchmark 2.7 TRex by 1.17762% +/- 0.448765%
(n=75) at 1280x720 on Broadwell GT3.
Haswell has this feature as well, but we can't currently write registers
from userspace batches (and we'd incur additional software batch
scanning overhead as well), so we haven't enabled it. Broadwell allows
us to write CACHE_MODE_1. Backporters beware: the formula and flushing
incantation differs between Haswell and Broadwell.
v2: Move pma_stall_bits from brw->state to brw itself (requested by
Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Matt requested this in review feedback on the original patch, which I
completely missed when pushing this series. Kristian also made this
change, but I grabbed the wrong version of the patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise, calling glPopAttrib on drivers that don't support
ARB_clip_control gives you a GL error, which is surprising at best.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We're not programming the clear values yet, so this won't work.
This patch should be (effectively) reverted eventually.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
On Skylake, the MOCS bits are an index into a table of 63 different,
configurable cache configurations. As for previous GENs, we only care about
WB and WT, which are available in the documented default set. Define
SKL_MOCS_WB and SKL_MOCS_WT to the indices for those configucations and use
those for the Skylake MOCS values.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake has separate controls for enabling the Z Clip Test for the near
and far planes. For now, maintain the legacy behavior by setting both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake's 3DSTATE_DS packet has a few more fields; we don't support
domain shaders yet though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
They are the same as for BDW, so just add a case for SKL to the init switch.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On SKL, 3DSTATE_CONSTANT_* command is not committed until we give
the corresponding 3DSTATE_BINDING_TABLE_POINTERS_* command. If we
fail to do so, the constant buffers wont be read and push constants
will be wrong.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Otherwise they overlap and horrible things happen. All the new DWords
are for fast color clear values, which we don't do yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We will need to allocate more DWords on Skylake.
v2: Don't mark brw_context parameter const. It's modified.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake introduces a new base address for a feature we don't yet expose.
Setting these to 0 should be safe.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake uploads the stencil reference values in DW3 of the
3DSTATE_WM_DEPTH_STENCIL packet, rather than in COLOR_CALC_STATE.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake has some extra bits in PIPELINE_SELECT, none of which are
interesting for a 3D driver. In order to selectively change them, it
also introduces new "mask bits" in 15:8. We care about the "Pipeline
Selection" bits (1:0), so set the mask to 0x3.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This commands has seen the addition of 2 dwords that allow to specify
which channels of which attributes need to be forwarded to the fragment
shader.
v2: Rebase forward a year (done by Ken).
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The CSE pass now prints out why it thinks a value is not a candidate for
adding to the AE set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The optimization in commit d056863b covers these cases, which were the
first optimizations I added to the GLSL compiler.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should trigger CL_INVALID_VALUE if device_list is NULL and num_devices
is greater than zero.
Introduced by e5468dfa52
Reported by: EdB
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Between release 3.2 and 3.3 LLVM stopped aligning properly when certain
conditions (no allocas, but large number of vectors causing spills to
the stack, and frame pointer omission enabled).
We were already disabling frame-pointer-omission on several build types,
but we now disable it on all build types.
It's not clear whether this affects 32-bits x86 processes only, or if it
can also affect 64-bits x86_64 processes when AVX registers are
available and used. So disable frame-pointer-omission on both
x86/x86_64 to be on the safe side.
See also:
- http://llvm.org/PR21435
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
AFAICT the number of threads is 80, not 70. I am not sure if Ken knows
something I do not.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Need to do a sqrt().
FWIW, the html that Sphinx 1.1.3 generates for the math expressions
looks completely broken.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Pass and return tgsi_token buffers instead of pipe_shader_state.
And update softpipe driver (the only user of this function).
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes polygon stipple if both DO_PSTIPPLE_IN_DRAW_MODULE and
DO_PSTIPPLE_IN_HELPER_MODULE are zero/off.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This was a regression introduced by
611d66fe45
Passing a binary program to clBuildProgram() is legal, but passing one
to clCompileProgram() is not.
v2:
- Code cleanups.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This adds a query which allows drivers to access the config
information of a specific function within the LLVM generated ELF
binary. This makes it possible for the driver to handle ELF
binaries with multiple kernels / global functions.
We are about to change mesa to spawn threads for deferred glCompileShader and
glLinkProgram, and we need to make sure those threads can send compiler
warnings/errors to the debug output safely.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There may be two contexts compiling shaders at the same time, and we want the
anonymous struct id to be globally unique.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
_mesa_strtod and _mesa_strtof may be called from multiple threads. They need
to be thread-safe.
v2: platform checks are now done in configure.ac
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the assumptions that xlocale.h implies newlocale and strtof_l. SCons is
updated to define HAVE_XLOCALE_H on linux and darwin.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both core mesa and glsl have their own wrappers for strtof_l. Merge
and move them to util/. They are compiled with a C++ compiler so that
we can make them thread-safe in a following commit.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whiteacpe.org>
This reverts commit cabc93c5ad.
Mark thinks the failures on the SNB GT2 in the lab are actually because
of faulty hardware, not instruction compaction. The GT1 didn't see any
problems after changes to the compaction code.
Helps a small number of vertex shaders in the games Dungeon Defenders
and Shank, as well as an internal benchmark.
instructions in affected programs: 2801 -> 2719 (-2.93%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use the UST value provided in the PRESENT_COMPLETE_NOTIFY event
rather than gettimeofday(), which gives us the presentation time
instead of the time when SwapBuffers was called. Suggested by
Keith Packard. This relies on the fact that the X DRI3/Present
implementations use microseconds for UST.
v3: Properly ignore PresentCompleteKindMSCNotify; multiply in 64 bits
(caught by Keith Packard).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Keith Packard <keithp@keithp.com> [v3]
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
These source files support actual geometry shaders, so using "gs" for
the name makes a lot of sense. We're going to be adding SIMD8 geometry
shader support as well, at which point "vec4_gs" will be a misnomer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
The brw_gs.[ch] and brw_gs_emit.c source files contain code for
emulating fixed-function unit functionality (VF primitive decomposition
or SOL) using the GS unit. They do not contain code to support proper
geometry shaders.
We've taken to calling that code "ff_gs" (see brw_ff_gs_prog_key,
brw_ff_gs_prog_data, brw_context::ff_gs, brw_ff_gs_compile,
brw_ff_gs_prog). So it makes sense to make the filenames match.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
The GL functions and driver hooks use corresponding names---for example,
glMapBufferRange and Driver.MapBufferRange. But our implementation was
called "intel_bufferobj_map_range," which has the words "map" and
"buffer" swapped, as well as randomly adding "obj."
FlushMappedBufferRange was even trickier: it ordered the words
3, "obj", 1, 2, 4: intel_bufferobj_flush_mapped_range.
Even though the old names were consistent, I always had trouble
rearranging the jumble of words when searching for a function,
and it took a few tries to eventually land there.
The new names match the word order of GL and the driver hooks;
FlushMappedBufferRange is simply brw_flush_mapped_buffer_range.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The OpenGL 4.0 core profile specification, section 2.17.3
Transform Feedback Draw Operations says:
"The error INVALID_VALUE is generated if <stream> is greater
than or equal to the value of MAX_VERTEX_STREAMS.
...
The error INVALID_OPERATION
is generated if EndTransformFeedback has never been called
while the object named by id was bound."
Fixes the piglit test:
ARB_transform_feedback3/arb_transform_feedback3-draw_using_invalid_stream_index
(with the test itself fixed to eliminate an unrelated failure)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we're checking if the framebuffer is sRGB capable, call
is_format_supported() with the PIPE_BIND_DISPLAY_TARGET flag.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit 20836c8185.
255 is a huge number. If you have a loop with 255 iterations, unrolling it
will exceed the SM3 instruction limit. Let's use the default again.
The comment about a SM3 limit doesn't make sense. For SM3, we generally
want 32 (default) or a lower number due to the SM3 instruction limit, which
is 512 instructions. For SM4, we can try higher numbers if needed, but
some shaders can end up being pretty huge and shader compilation can take
more time.
This fixes a shader compile failure on R500/SM3. Reported on IRC.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Caveat: Shaders using UBO/sampler indexing will
not be optimized by SB, due to SB not currently
supporting the necessary CF_INDEX_[01] index
registers.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
The GL side of this extension just provides an accessor via glGetIntegerv for
the value of GL_CONTEXT_RELEASE_BEHAVIOR so it is trivial to implement. There
is a constant on the context for the value of the enum which is initialised to
GL_CONTEXT_RELEASE_BEHAVIOR_FLUSH. The extension is always enabled because it
doesn't need any driver interaction to retrieve the value.
If the value of the enum is anything but FLUSH then _mesa_make_current will
now refrain from calling _mesa_flush. This should only affect drivers that
explicitly change the enum to a non-default value.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The main incentive to do this is to get the defines for the
GL_KHR_context_flush_control extension.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Different platforms require the offset to be in different units. However,
the generator fixes all of this up for us and only requires an offset in
bytes. Previously, we were getting this wrong all over the place. Some
computed/used it correctly as bytes while others treated the offset as
whole registers or computed it as bytes or bytes*2 in SIMD16 mode. This
commit cleans all this up and makes us properly treat it as bytes
everywhere.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
I thought this would be a clever way to make spilling less expensive.
However, it appears that the oword read/write messages we are using for
spilling ignore the execution size and assume SIMD16 whenever working with
more than one register.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Fanin (merge) nodes require it's srcs to be "adjacent" in consecutive
scalar registers. Keep track of instruction neighbors in copy-
propagation step and avoid eliminating mov's which would cause an
instruction to need multiple distinct left and/or right neighbors.
This lets us not fall on our face when we encounter things like:
1: MOV TEMP[2], IN[0].xyzw
2: TEX OUT[0].xy, TEMP[2], SAMP[0], SHADOW2D
3: MOV TEMP[2].xy, IN[0].yxzz
4: TEX OUT[0].zw, TEMP[2], SAMP[0], SHADOW2D
5: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Always insert extra mov's for the tex coord into the fanin. This
simplifies things a bit, and avoids a scenario where multiple sam
instructions can have mutually exclusive input's to it's fanin, for
example:
1: TEX OUT[0].xy, IN[0].xyxx, SAMP[0], 2D
2: TEX OUT[0].zw, IN[0].yxxx, SAMP[0], 2D
The CP pass can always remove the mov's that are not actually needed,
so better to start out with too many mov's in the front end, than not
enough.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
dscis -> noscis
dbypass -> nobypass
a bit more consistant w/ nobin, etc. And IMO a bit more sensible names.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Kills get added to the outputs list, to ensure they get scheduled. But
they aren't *really* outputs so skip them in the header comment block.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adding this makes 'make check' catch failures introduced from
within ARB_clip_control.xml earlier.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
In order to test compiler changes more easily, spit out the assembled
shader with some header information so that we can know about
inputs/outputs more easily.
See: git://people.freedesktop.org/~robclark/ir3test
In ir3test we have a big collection of tgsi shaders and reference
ir3_compiler outputs. When making compiler changes, regenerate the
compiler outputs and feed to ir3test to compare the new vs reference
shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The last few dwords were skipped if the total number of dwords was not a
multiple of 4. Change the formatting for better readability.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We only get here if the VS/GS compiled programs change, but we can even
skip it if the VS/GS size didn't change.
Affects cairo runtime on glamor by -1.26471% +/- 0.674335% (n=234)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
brw_program.c: In function 'brw_dump_ir':
brw_program.c:566:33: warning: unused parameter 'brw' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Originally I just fixed some unused parameter warnings in this
function. However, Ken pointed out:
"You could instead remove this driver hook. If the dd pointer is
NULL, arbprogram.c will return true. I think I'd prefer that."
Way, way back in time, I think _mesa_GetProgramivARB had the opposite
behavior. Given that it works the way it now works, I also prefer
removing the driver hook.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
../../src/mesa/main/uniform_query.cpp:1062:1: warning: unused parameter 'ctx' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Silences:
../../src/mesa/main/shaderobj.c: In function '_mesa_init_shader_program':
../../src/mesa/main/shaderobj.c:239:46: warning: unused parameter 'ctx' [-Wunused-parameter]
For now, this adds a couple other unused parameter warnings, but future
patches will clean those up.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_link_shader_program already calls _mesa_clear_shader_program_data
before calling link_shaders, so this is already done.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
_mesa_UseShaderProgramEXT, _mesa_ActiveProgramEXT, and
_mesa_CreateShaderProgramEXT were all removed when support for
GL_EXT_separate_shader_objects was removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
../../src/mesa/main/ff_fragment_shader.cpp:668:1: warning: unused parameter 'p' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were allowing the register allocation code to do the
computation for us in ra_set_finalize. However, the runtime for this
computation is O(c^4 * g) where c is the number of classes and g is the
number of GRF registers. However, these q-values are directly computable
based on the way we lay out our register classes so there is no need for
the aweful runtime algorithm.
We were doing ok until commit 7210583eb where we bumped the number of
register classes from 11 to 16. While startup times don't normally matter,
this caused piglit to take 4 times as long to run on Bay Trail. This patch
should make generating the ra_set much faster and melt the piglit run
times.
v2: Fixed a couple of bugs. I have now verified that the same q-values are
generated both ways.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On older GENs in SIMD16 mode, we were accidentally building too much
interference into our register classes. Since everything is divided by 2,
the reigster allocator thinks we have 64 base registers instead of 128.
The actual GRF mapping still needs to be doubled, but as far as the ra_set
is concerned, we only have 64. We were accidentally adding way too much
interference.
Signed-off-by: Jason Ekstrand <jason.ekstrand@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For GEN6 SIMD16 mode, we have to 2-align all the registers, so we only have
the even-numbered ones. This means that we have to divide the register
number by 2 when we precolor. This wasn't a problem before because we were
setting up the interference between ra_node registers wrong. This will be
fixed in the next commit.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This shouldn't be a functional change since reg_belongs_to_class is just a
wrapper around BITSET_TEST. It just makes the code a little easier to
read.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gallium should be prepared fine for ARB_clip_control.
So enable this and mention it in the release notes.
v2:
Only enable for drivers announcing the freshly introduced
PIPE_CAP_CLIP_HALFZ capability.
v3:
Use extension enable infrastructure to connect PIPE_CAP_CLIP_HALFZ
with ARB_clip_control.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
In preparation of ARB_clip_control. Let the driver decide if
it supports pipe_rasterizer_state::clip_halfz being set to true.
v3:
Initially enable on ilo.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de
Restore clip control to the default state if MESA_META_VIEWPORT
or MESA_META_DEPTH_TEST is requested.
v3:
Handle clip control state with MESA_META_TRANSFORM.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Implement the mesa parts of ARB_clip_control.
So far no driver enables this.
v3:
Restrict getting clip control state to the availability
of ARB_clip_control.
Move to transformation state.
Handle clip control state with the GL_TRANSFORM_BIT.
Move _FrontBit update into state.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This is for preparation of ARB_clip_control.
v3:
Add comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This allows vc4_opt_cse.c to CSE-away operations involving the same
uniform values.
total instructions in shared programs: 37341 -> 36906 (-1.16%)
instructions in affected programs: 10233 -> 9798 (-4.25%)
total uniforms in shared programs: 10523 -> 10320 (-1.93%)
uniforms in affected programs: 2467 -> 2264 (-8.23%)
This saves a bunch of extra flushes when texsubimaging a whole texture
that's been used for rendering, or subdataing a whole BO. In particular,
this massively reduces the runtime of piglit texture-packed-formats (when
the probes have been moved out of the inner loop).
I'm going to be using VC4_DEBUG=shaderdb,norast to do shaderdb stats, but
when debugging regressions, I want to match shaderdb output to shader
disassembly.
fd_bo_cpu_prep() doesn't realize the bo is already referenced in
unflushed cmdstream. It could be made to do so (but would have to be
implemented twice, ie. both for msm and kgsl). But we still can't do
the expected thing if the caller isn't using _NOSYNC. Because of the
way the tiling works, we need to build quite a bit of cmdstream at flush
time, which is not possible to do at the libdrm level.
So rather than trying to make fd_bo_cpu_prep() smarter than it can
possibly be, just *always* discard and reallocate if the
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag is set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
- Use macro for munmap under Android - the STATIC_ASSERT uses
a off_t which is not used under Android for mmap. As loff_t size
does not vary as does off_t just ignore the assert.
- Wrap the long lines to improve readability.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The setMCJITMemoryManager method doesn't exist in LLVM 3.3.
I thought I had tested the latest version of my earlier change with LLVM
3.3, but it looks I missed it.
Trivial.
JITMemoryManager was removed in LLVM 3.6, and replaced by its base class
RTDyldMemoryManager.
This change fixes our JIT memory managers specializations to derive from
RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.
This enables llvmpipe to run with LLVM 3.6.
However, lp_free_generated_code is basically a no-op because there are
not enough hook points in RTDyldMemoryManager to track and free the code
of a module. In other words, with MCJIT, code once created, stays
forever allocated until process destruction. This is not speicfic to
LLVM 3.6 -- it will happen whenever MCJIT is used regardless of version.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We'll need to update gallivm for the interface changes in LLVM 3.6, and
the fewer the number of older LLVM versions we support the less hairy that
will be.
As consequence HAVE_AVX define can disappear. (Note HAVE_AVX meant
whether LLVM version supports AVX or not. Runtime support for AVX is
always checked and enforced independently.)
Verified llvmpipe builds and runs with with LLVM 3.3, 3.4, and 3.5.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The functionality exposed by those extensions does not appear in ES3
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the user calls util_blitter_cache_all_shaders() set a flag and assert
that we never try to create any new fragment shaders after that point.
If the assertions fails, it means we missed generating some shader in
util_blitter_cache_all_shaders().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, we generated an extra CMP instruction:
cmp.ge.f0(8) g6<1>D g1<0,4,1>F 0F
cmp.nz.f0(8) null g6<4,4,1>D 0D
(+f0) sel(8) g5<1>F g1.4<0,4,1>F g2<0,4,1>F
The first operand is always a boolean, and we want to predicate the SEL
on that. Rather than producing a boolean value and comparing it against
zero, we can just produce a condition code in the flag register.
Now we generate:
cmp.ge.f0(8) null g1<0,4,1>F 0F
(+f0) sel(8) g5<1>F g1.4<0,4,1>F g2<0,4,1>F
No difference in shader-db.
v2: Remember to delete the old code (thanks Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using dst_reg(this, ir->type) automatically sets the writemask to the
proper size for the type; src_reg(dst_reg) preserves that. This should
be equivalent, but less code.
Note that src_reg(dst_reg) either uses SWIZZLE_XXXX or SWIZZLE_XYZW, so
the old code did need the manual writemask adjustment, since it
constructed the registers the other way around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Nothing uses the vector_elements temporary variable.
Setting this->result.file is dead because we overwrite this->result a
few lines later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we generated an extra CMP instruction:
cmp.ge.f0(8) g4<1>D g2<0,1,0>F 0F
cmp.nz.f0(8) null g4<8,8,1>D 0D
(+f0) sel(8) g120<1>F g2.4<0,1,0>F g3<0,1,0>F
The first operand is always a boolean, and we want to predicate the SEL
on that. Rather than producing a boolean value and comparing it against
zero, we can just produce a condition code in the flag register.
Now we generate:
cmp.ge.f0(8) null g2<0,1,0>F 0F
(+f0) sel(8) g124<1>F g2.4<0,1,0>F g3<0,1,0>F
total instructions in shared programs: 5473459 -> 5473253 (-0.00%)
instructions in affected programs: 6219 -> 6013 (-3.31%)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A while back, Matt made the uniform upload functions simply upload
ctx->Const.UniformBooleanTrue for boolean values instead of 0/1, which
removed the need to convert it later. We also set UniformBooleanTrue to
1.0f for drivers which want to treat booleans as 0.0/1.0f.
Nothing ever sets these, so they are dead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't have a scissor enable bit in hw, so when a raster state change
results in scissor enable bit changing, we need to also mark scissor
state as dirty.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The optimization of avoiding restore (mem2gmem) if there was a clear
falls down a bit if you don't have a fullscreen scissor. We need to
make the decision logic a bit more clever to keep track of *what* was
cleared, so that we can (a) completely skip mem2gmem if entire buffer
was cleared, or (b) skip mem2gmem on a per-tile basis for tiles that
were completely cleared.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
DataLayoutPass was added in LLVM 3.5 r202168, commit
57edc9d4ff1648568a5dd7e9958649065b260dca "Make DataLayout a plain
object, not a pass.".
This patch fixes this build error with LLVM 3.4.
CXX llvm/libclllvm_la-invocation.lo
llvm/invocation.cpp: In function 'void {anonymous}::optimize(llvm::Module*, unsigned int, const std::vector<llvm::Function*>&)':
llvm/invocation.cpp:324:18: error: expected type-specifier
PM.add(new llvm::DataLayoutPass(mod));
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85189
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The values are hardcoded in the LLVM backend, but the TGSI definitions are
going to be changed with tessellation, e.g. TGSI_PROCESSOR_COMPUTE will be
increased by 2.
We'll use VS for LS and HS, because there's nothing special about them
from the LLVM backend point of view, even though the hardware side is
different. We do the same for ES.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I'll need indexed loads without the meta data flag for tessellation later.
Also rename load_const to buffer_load_const to distinguish it from indexed
const loads.
v2: add comments
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
We must convert it to boolean from the DX9 float encoding that Gallium
specifies.
Later, we should probably define that FACE should be 0 or ~0 if native
integers are supported.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
With 5 shader stages and various combinations of enabled and disabled shaders,
the maximum number of outputs in one shader doesn't have to be equal to
the maximum number of inputs in the following shader.
v2: return 32 for softpipe and llvmpipe
If the writemask doesn't compress, then we want to put in the uncompressed
writemask, not the compressed writemask failure value (all-on).
Fixes glean's stencil2 and fbo-clear-formats on stencil.
It seems like the hardware is unhappy if we execute a kill instruction
prior to last input (ei). Probably the shader thread stops executing
and the end-input flag is never set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If app only updates (for example) vertex uniforms, it would be nice to
only re-emit those and not also frag uniforms. Means we need to mark
the first frag shader const buffer dirty after a clear.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The get_variable_being_redeclared() function can free the 'var' argument.
Thereafter, we cannot assume that 'var' is a valid pointer. This patch
replaces 'var->name' with 'earlier->name' in two places and calls
is_gl_identifier(var->name) before 'var' might get freed.
This fixes several piglit GLSL crashes, including:
spec/glsl-1.50/execution/geometry/clip-distance-in-param
spec/glsl-1.50/execution/geometry/clip-distance-bulk-copy
spec/glsl-1.50/compiler/gs-redeclares-pervertex-out-before-global-redeclaration.geom
I'm not sure why these were not spotted sooner.
A similar bug was previously fixed by f9cecca7a.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Patch fixes 'glsl-2types-of-textures-on-same-unit' in WebGL conformance
test suite. No Piglit regressions, fixes gl-2.0-active-sampler-conflict.
To avoid adding potentially heavy check during draw (valid_to_render),
check is done during uniform updates by inspecting TexturesUsed mask.
A new boolean variable is introduced to cache validation state.
v2: take into account case where 2 uniforms use same unit (curro)
also do the check only when SSO is not in use, SSO has own
path for sampler validation.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Patch removes old variable based logic for handling a break inside
switch. Switch is put inside a loop so that existing infrastructure
for loop flow control can be used for the switch, now also dead code
elimination works properly.
Possible 'continue' call inside a switch needs now special handling
which is taken care of by detecting continue, breaking out and calling
continue for the outside loop.
v2: remove one unnecessary ir_expression (Curro)
Fixes following Piglit tests:
fs-exec-after-break.shader_test
fs-conditional-break.shader_test
No Piglit or es3conform regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
GLES2 doesn't have GL_TEXTURE_BASE_LEVEL, so the hardware doesn't. Fixes
piglit levelclamp, tex-miplevel-selection, and texture-storage/2D mipmap
rendering.
Suppresses a bunch of warning noise about sample_map possibly being used
uninitialized.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Before, we used the a signed d-word for booleans and the immedates we
emitted varried between signed and unsigned. This commit changes the type
to unsigned (I think that makes more sense) and makes immediates more
consistent. This allows copy propagation to work better cleans up some
instructions.
total instructions in shared programs: 5473519 -> 5465864 (-0.14%)
instructions in affected programs: 432849 -> 425194 (-1.77%)
GAINED: 27
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
gl_SampleID is a built-in variable that always is of type "int".
Suggested by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Notably this included the EOF flag (the other bits are the full buffer
dump selection, but we don't do full dumps), which caused the kernel
checking for frame completion to trigger.
The other driver does this manually before calling into each tile, but we
can just let it get binned into the tiles (saving repeated kernel
validation on the packet).
Fixes simulator assertion failures on polygon-mode and non-auto texwrap.
We don't need to emit all of our current state at the end of each bin
list. We're going to be smashing it all at the start of the next tile's
bin list, anyway.
The trick is to generate a unique buffer usage value for each possible
combination of domains and flags, with only one bit set each for the
domains and flags. This ensures pb_check_usage() only returns TRUE when
the domains and flags the cached buffer was created for exactly match
the requested ones.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are two debug variables:
CLOVER_DEBUG which you can set to any combination of llvm,clc,asm
(separated by commas) to dump llvm IR, OpenCL C, and native assembly.
CLOVER_DEBUG_FILE which you can set to a file name for dumping output
instead of stderr. If you set this variable, the output will be split
into three separate files with different suffixes: .cl for OpenCL C,
.ll for LLVM IR, and .asm for native assembly. Note that when data
is written, it is always appended to the files.
v2:
- Code cleanups
- Add CLOVER_DEBUG_FILE environment variable for dumping to a file.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2:
- Split build_module_native() into three separate functions.
- Code cleanups.
v3:
- More cleanups.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Drivers can return this value for PIPE_COMPUTE_CAP_IR_TARGET
if they want clover to give them native object code.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
NewBufferObject took a "target" parameter, which it blindly passed to
_mesa_initialize_buffer_object(), which ignored it.
Not much point in passing it around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is used to implement GLSL's atomicCounter() intrinsic. Previously
it *worked*, but the disassembly was bogus.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This would have *almost never* actually been an issue, since other state
tends to get flagged at the same time as new ABOs -- but still bogus.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This didn't make any sense, but papered over the missing TexBO flagging
we've just fixed, in a bunch of cases.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When a buffer object is bound to one of the indexed uniform buffer
binding points, assume that from that point on it may be used as
a uniform buffer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In the drivers, we occasionally want to reallocate the backing
store for a buffer object; often to avoid waiting for the GPU
to be finished with the previous contents.
At the point that happens, we don't have a good way of determining
where else the buffer object may be bound, and so no good way of
determining which dirty flags need to be raised -- it's fairly
expensive to go looking at all the possible binding points.
Until now, we've considered any BO to be possibly bound as a UBO or
TexBO, and flagged all that state to be reemitted.
Instead, remember what kinds of binding point this buffer has ever
been used with, so that the drivers can flag only what they need.
I don't expect these bits to ever be reset, but that doesn't matter
for reasonable apps.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The kernel files are built into a separate static library and
all the functions that require it are already wrapped in ifdef
USE_VC4_SIMULATOR. Don't forget the header file :)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we've made all the texture emit code mostly independent of GLSL
IR, this isn't necessary any more.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Before, we had 3 different emit functions for various different gen's,
as well as some ancilliary work that was the same across all gen's which
was either contained in functions or duplicated across the GLSL IR and
Mesa IR backends. Now, we have a single method, emit_texture(), that
takes all the information needed to make a texture instruction and
handles all the setup, and all we have to do to emit a texture
instruction while converting from GLSL IR, Mesa IR, or any new backend
is to extract the information emit_texture() needs and then call it.
v2: Significant rebasing (by Ken).
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This happened to work before, but it would convert the output to a float
and then back to an integer which seems bad.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This drops a dependency on ir_texture objects.
v2 (Ken): Rename lod_components to grad_components, as it only has a
meaningful value for ir_txd. We could set it to 1 for TXL,
but there's no real need.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Our new IR won't have ir_texture objects, but using glsl_type is fine.
v2 (Ken): Drop redundant ir->coordinate NULL check; rebase.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is slightly clearer. Based on a patch by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This moves the handling of non-constant texel offset subexpression trees
to the place where we visit other such subtrees. It also removes some
uses of ir->offset in emit_texture_gen7, which will be useful when we
write the backend for our new upcoming IR.
Based on a patch by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
brw_lower_unnormalized_offset sets ir->offset to NULL if it applies the
texelFetchOffset workarounds, so there's no need to special case it
here---there won't be an offset for ir_txf.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Eric's original code to work around TXF offset bugs contained a comment
explaining the problem, which was lost when Chris generalized it to an
IR transformation (in commit 598ca510b8).
This commit adds the original comment to the newer code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Because we reuse various bits of emit code (for state/vertex/prog/etc)
for both regular draws and internal draws (gmem<->mem, clear, etc), the
number of parameters getting passed around has been growing. Refactor
to group these into fd3_emit. This simplifies fxn signatures, avoids
passing around shader key on the stack, etc. It also gives us a nice
place to cache shader-variant lookup to avoid looking up shader variants
multiple times per draw (without having to *also* pass them around as
fxn args everywhere).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Get rid of fd3_vertex_buf and use fd_vertex_state directly for all
draws. Removes a tiny bit of CPU overhead for munging around the vertex
state every time it is emitted, but more importantly it cleans things up
for later optimizations, so the emit paths don't have to special case
internal draws (gmem<->mem, clears, etc) with regular draws.
Instead of constructing fd3_vertex_buf array each time for internal
draws, and context init time pre-create solid_vbuf_state and
blit_vbuf_state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+
math instruction: it's a single instruction (SEND) with a GRF source.
The difference is that it also implicitly clobbers a message register.
The only visible effect is that CSE will remove the MRF-clobbering from
later math operations. This should be fine; compute_to_mrf and
remove_redundant_mrf_writes don't look at the values populated by
implied writes, so they can't rely on those values being present.
Less interference may actually help those passes make more progress.
Binary math is still problematic, since it involves a separate MOV
instruction to load the second operand. We continue disallowing CSE for
binary math operations.
total instructions in shared programs: 3340303 -> 3340100 (-0.01%)
instructions in affected programs: 26927 -> 26724 (-0.75%)
Nothing hurt, gained, or lost. ~6% reduction on a few shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise the caching buffer manager may return a buffer which was created
with a different set of flags, which can cause trouble.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, when no pkg-config was available for
libexpat we would just add the needed linking
flags without any extra check.
Now, we check that the library and the headers are
also installed in the building environment.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Now that the freedreno_lowering code is moved to tgsi_lowering, remove
our private copy and switch over to using the common version.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Originally the variables were set only once via the ?= operator but
that causes issues when doing incremental builds. They appear to be
undefined and missing from the dependency list despite their addition
to LIBADD.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84807
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The texturing hardware takes the POT level 0 width/height and minifies
those. This is different from what we were doing, for example, for
273-wide's level 5: POT(273>>5) == 8, while POT(273)>>5 == 16.
Fixes piglit-depthstencil-render-miplevels 273.
This patch fixes this build error on DragonFly BSD.
CC os/os_misc.lo
os/os_misc.c: In function 'os_get_total_physical_memory':
os/os_misc.c:132:2: error: #error Unsupported *BSD
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The current implementation of glxUseXFont requires creating
a temporary pixmap and graphics context, which requires a real
old-school X11 Window, not a glxDrawable. This patch changes
things so that glxUseXFont will also accept a glxWindow or
glxPixmap, and lookup the underlying X11 Drawable. Without
this patch glxUseXFont generates a giant stream of Xerrors
about bad drawables and bad graphics contexts.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54372
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
It does not look like an issue now but it is good to be future proof. Spotted
by Courtney Goeltzenleuchter.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This gets them to pass glsl-sin/cos. There was an obvious problem that I
was using the FRC code on the scaled input value, which means that we had
a range in [0, 1], while our taylor is most accurate across [-0.5, 0.5].
We can just slide things over, but that means flipping the sign of the
coefficients. After that, it was just a matter of stuffing more
coefficients in.
There's no reason to stall on pwrite - the CPU always appends to the
buffer and never modifies existing contents, and the GPU never writes
it. Further, the CPU always appends new data before submitting a batch
that requires it.
This code predates the unsynchronized mapping feature, so we simply
didn't have the option when it was written.
Ideally, we would do this for non-LLC platforms too, but unsynchronized
mapping support only exists for LLC systems.
Saves a bunch of stall avoidance copies when uploading shaders.
v2: Rebase on changes to previous patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
We don't really want unnecessary buffer copying, so it'd be nice to know
when it's happening.
v2: Drop stall warnings when doing a read-only CPU mapping of the cache
BO. The GPU also uses it in a read-only fashion, so there won't be
any stalls, even though the buffer is busy. (Thanks to Chris Wilson
for catching this mistake.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
This is easy: we just need to use brw_map_bo instead of mapping it
directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If the VS doesn't output a value that the FS needs, we still need to read
the right contents for the remaining FS inputs, by emitting padding. And
if the VS outputs something the FS doesn't need, we shouldn't put it in
the VPM at all (so the code producing it can get DCEed).
Fixes 77 piglit tests.
Also fixes two sided lighting which was broken at least
on pre-evergreen by commit b1eb00.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is the last use tgsi_parse_token in radeonsi.
It looks ugly because the code was re-indented, but there is really no change
in behavior.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
They were reinventing tgsi_shader_info. They are unused now.
radeon_llvm_context::load_input can be NULL if input fetching is implemented
in some other way.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
No code in Mesa sets the usage mask to any other value.
The final mask is AND'ed with enable bits from the rasterizer state anyway.
If somebody implements setting usage masks in st/mesa, we can use
tgsi_shader_info to get it more easily.
This is a prerequisite for the following commit.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
[ Francisco Jerez: Split off from a larger patch, and take a slightly
different approach for passing the implicit arguments around. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Our current atan()-approximation is pretty inaccurate at 1.0, so
let's try to improve the situation by doing a direct approximation
without going through atan.
This new implementation uses an 11th degree polynomial to approximate
atan in the [-1..1] range, and the following identitiy to reduce the
entire range to [-1..1]:
atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
This range-reduction idea is taken from the paper "Fast computation
of Arctangent Functions for Embedded Applications: A Comparative
Analysis" (Ukil et al. 2011).
The polynomial that approximates atan(x) is:
x * 0.9999793128310355 - x^3 * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
This polynomial was found with the following GNU Octave script:
x = linspace(0, 1);
y = atan(x);
n = [1, 3, 5, 7, 9, 11];
format long;
polyfitc(x, y, n)
The polyfitc function is not built-in, but too long to include here.
It can be downloaded from the following URL:
http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
This fixes the following piglit test:
shaders/glsl-const-folding-01
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When mapping the buffer a second time, we need to use the new pointer,
not the one from the previous mapping. Otherwise, we will most likely
crash.
Apparently, we've just been getting lucky and getting the same
bo->virtual pointer in both cases. libdrm probably has a hand in that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Copy propagating these might result in reading the r4 after some other
instruction has written r4. Just prevent all copy propagation of this for
now.
Fixes bad rendering with upcoming indirect register access support, where
the copy propagation was consistently happening across another read.
Merging VS and CS into the same struct wasn't winning us anything except
for not allocating a separate BO (but if we want to pack programs into
BOs, we should pack not just those 2 programs together). What it was
getting us was a bunch of code duplication about hash table lookups and
propagating vc4_compile contents into a vc4_compiled_shader.
I was about to make the situation worse with indirect uniform buffer
access.
I wanted to make another set of texture uploads for handling reladdr
constants, and duplicating all the bitshifting looked like a terrible
idea. In the process, this fixes a swap of the s/t texture wrap modes.
Under the simulator, reading registers before writing them triggers an
assertion failure. c->undef gets treated as r0, which will usually be
written, but not if it's used in the first instruction. We should
definitely not be aborting in this case, and return some sort of undefined
value instead.
Fixes glsl-user-varying-ff.
The border color is only needed when using the GL_CLAMP_TO_BORDER or
(deprecated) GL_CLAMP wrap modes; all others ignore it, including the
common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes.
In those cases, we can skip uploading it entirely, saving a bit of space
in the batchbuffer. Instead, we just point it at the start of the
batch (offset 0); we have to program something, and that address is safe
to read.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Write-back caching cannot be used for buffers being scanned out by the
display engine; surfaces used for scan-out must be write-through or
uncached. I originally chose WT for render targets because it works in
all cases. However, we really want to use write-back caching where
possible, as it is more efficient.
Most renderbuffers are not used for scanout - off-screen FBOs certainly
are fine, and non-pageflipped backbuffers should be fine as well. So
in most cases WB will work. However, we don't know what will be used
for scan-out, so we instead simply use the PTE value specified by the
kernel, as it knows these things.
This matches our MOCS choice on Haswell.
Fixes performance regressions since commit ee4484be3d
in a microbenchmark (spotted by Eero Tamminen). Improves performance
in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a
Broadwell GT2. Improves performance in a bunch of other microbenchmarks
by ~15% or so.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all
three caches (L3, LLC, and eLLC where available), but leaves the LLC
caching mode up to the kernel's page table entry.
This allows the kernel to pick WB/WT/UC based on whether it's using a
buffer for scanout.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
These days, most driver debug output happens via stderr, not stdout.
Some applications (such as Xephyr) also appear to close stdout which
makes these messages go nowhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The non-base NPOT levels are stored as POT-aligned images. We get that
POT alignment by minifying the POT-aligned base level.
This means that level strides are also POT aligned, so we have to tell the
rendering mode config that our resource is larger than the actual
requested area.
Fixes the fbo-generatemipmap-formats NPOT cases. Regresses
depthstencil-render-miplevels 273 * -- the texture presentation now works
(where it was completely broken before), it looks like there's some
overflow of image bounds happening at the lower miplevels.
It's fairly easy, thanks to Rob Clark's lowering code. Fixes
two-sided-lighting and 4 vertex-program-two-side testcases, while
regressing 8 testcases that involve enabling two-sided color while only
initializing one of the two colors in the VS. If you're enabling two
sided color, it's of course expected that you really do set up both
colors, so this is still an improvement (and when we set up a linker for
TGSI, we'll hopefully fix those 8 fails).
Lots of drivers need to transform the weird instructions in TGSI into
reasonable scalar ops, and this code can make those translations
canonical.
Acked-by: Rob Clark <robclark@freedesktop.org>
The simulator assertion fails if you have a write to a reg and then a read
(for example, in the NOP side of an instruction), even if the read isn't
used for anything. By setting unused raddrs to NOP, we avoid the problem
(since only the phsyical registers are tracked).
Original patch by Petri Latvala <petri.latvala@intel.com>:
Add an optimization pass that drops min/max expression operands that
can be proven to not contribute to the final result. The algorithm is
similar to alpha-beta pruning on a minmax search, from the field of
AI.
This optimization pass can optimize min/max expressions where operands
are min/max expressions. Such code can appear in shaders by itself, or
as the result of clamp() or AMD_shader_trinary_minmax functions.
This optimization pass improves the generated code for piglit's
AMD_shader_trinary_minmax tests as follows:
total instructions in shared programs: 75 -> 67 (-10.67%)
instructions in affected programs: 60 -> 52 (-13.33%)
GAINED: 0
LOST: 0
All tests (max3, min3, mid3) improved.
A full shader-db run:
total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
instructions in affected programs: 1188 -> 1160 (-2.36%)
GAINED: 0
LOST: 0
Improvements happen in Guacamelee and Serious Sam 3. One shader from
Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
dropping of a (constant float (0.00000)) operand, which was
compiled to a saturate modifier.
Version 2 by Iago Toral Quiroga <itoral@igalia.com>:
Changes from review feedback:
- Squashed various cosmetic changes sent by Matt Turner.
- Make less_all_components return an enum rather than setting a class member.
(Suggested by Mat Turner). Also, renamed it to compare_components.
- Make less_all_components, smaller_constant and larger_constant static.
(Suggested by Mat Turner)
- Change mixmax_range to call its limits "low" and "high" instead of
"range[0]" and "range[1]". (Suggested by Connor Abbot).
- Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by
Connor Abbot).
- Make the logic more clearer by rearrenging the code and commenting.
(Suggested by Connor Abbot).
- Added comment to explain why we need to recurse twice. (Suggested by
Connor Abbot).
- If we cannot prune an expression, do not return early. Instead, attempt
to prune its children. (Suggested by Connor Abbot).
Other changes:
- Instead of having a global "valid" visitor member, let the various functions
that can determine this status return a boolean and check for its value
to decide what to do in each case. This is more flexible and allows to
recurse into children of parents that could not be prunned due to invalid
ranges (so related to the last bullet in the review feedback).
- Make sure we always check if a range is valid before working with it. Since
any use of get_range, combine_range or range_intersection can invalidate
a range we should check for this situation every time we use any of these
functions.
Version 3 by Iago Toral Quiroga <itoral@igalia.com>:
Changes from review feedback:
- Now we can make get_range, combine_range and range_intersection static too
(suggested by Connor Abbot).
- Do not return NULL when looking for the larger or greater constant into
mixed vector constants. Instead, produce a new constant by doing a
component-wise minmax. With this we can also remove of the validations when
we call into these functions (suggested by Connor Abbot).
- Add a comment explaining the meaning of the baserange argument in
prune_expression (suggested by Connor Abbot).
Other changes:
- Eliminate minmax expressions operating on constant vectors with mixed values
by resolving them.
No piglit regressions observed with Version 3.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Patch fixes following test case from 'shaders-with-varyings' WebGL
conformance suite: "vertex shader with unused varying and fragment
shader with used varying must succeed"
v2: emit still a warning if the condition happens (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When a shader needs N surfaces, we should upload N surfaces and not depend on
how many are bound. This commit is larger than it should be because we did
not export how many surfaces a surface uses before.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It only needs the constant buffer with clip planes and read-write resources
for the GS->VS ring and streamout. That's 2 pointers.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is a wrong place to flush caches to say the least.
I don't think we need to flush the instruction caches if we don't patch
shaders with DMA.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
st/mesa has the same flag in its shader key, we don't need to do it
in the driver anymore.
Instead, use TGSI_INTERPOLATE_LOC_SAMPLE, which is what st/mesa sets.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The first compiled shader is sometimes useless, because the key doesn't match
the key for the draw call where it's used.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes a few issues, including a potential empty-IB (which triggers gpu
hangs in piglit occlusion_query_meta_no_fragments)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Possibly we should map the front color to black (zeroes). But not sure
there is a way to do that without generating a shader variant.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL TEMP[0], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
1: MOV OUT[0], IMM[0].xyxx
2: END
cause unhappyness. They have an IN[], but once this is compiled the
useless TEX instruction goes away. Leaving a varying that is never
fetched, which makes the hw unhappy.
In the process fix a signed vs unsigned compare. If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
On Windows, the Piglit primitive-restart test was failing a
glGetError()==0 assertion when it was run w/out any command line
arguments. Piglit's all.py script only runs primitive-restart
with arguments so this case isn't normally hit during a full
piglit run.
The basic problem is Microsoft's opengl32.dll calls glFlush
from wglGetProcAddress() and Piglit uses wglGetProcAddress() to
resolve glPrimitiveRestartNV() which is called inside glBegin/End.
See comments in the code for more info.
Plus, improve the comments for _mesa_alloc_dispatch_table().
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Sinclair Yeh <syeh@vmware.com>
This isn't perfect -- the filtering is happening on the srgb values, and
we're decoding afterwards, which is not what you want. I think that's the
cause of some additional texwrap(GL_CLAMP, LINEAR) failures, though many
other texwrap tests on srgb start to pass since unfiltered values come out
correct.
Since the RA has to be done s.t. each one gets its own (adjacent)
register, it would complicate matters if instructions were allowed to be
repeated. This enables copy-propagation use in situations where
previously that might have happened.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
While running piglit in virgl, I hit an assert in intel driver.
"qemu-system-x86_64: intel_tex.c:219: intel_map_texture_image: Assertion `tex_image->TexObject->Target != 0x8C18 || h == 1' failed."
Thanks to Eric and Ken for pointing me in the right direction,
Fix the get_tex_depth to do the same fixup as get_tex_rgba does
for 1D array textures.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
With current makefiles the build fails because source and build paths
are generated incorrectly. With Android build system the top_srcdir and
top_builddir variables are undefined and all paths are relative to where
Android.mk is located. This ends up with path likes
external/mesa/src/mesa/src/mesa/ for both source and build paths, which
are obviously wrong.
This patch fixes this by overriding resulting SRCDIR and BUILDDIR
variables with empty string, so that paths end up being relative to
Android.mk file again. Appending correct build path to generated files
is already done in Android.gen.mk.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This patch fixes Android build failures by including src/util directory
in compilation. Files inside of this directory are compiled into
libmesa_util static library and linked with resulting libGLES_mesa.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Before, we were hard-coding the base_mrf based on dispatch width not number
of registers spilled at a time. This caused us to emit instructions with a
base_mrf or 14 and a mlen of 3 so we used the magical non-existant m16
register. This fixes the problem.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead.
However, some FB write messages can validly be longer than this so we need
something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on
its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for
FB writes.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes a bug where 1-wide operations don't properly translate down to
1-wide instructions.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Above a certain limit use CACHE mode instead of BUFFER mode. This
should solve gpu hangs with large shader programs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The existing code was not setting several fields, most importantly the
target, which is required on nv50/nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Patch fixes failing test in WebGL conformance test
'point-no-attributes' when running Chrome on OpenGL ES.
(Shader program may draw points using constant data in shader.)
No Piglit regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They're actually as documented in the HW specs and the GL mipmapping enums
order. Fixes fbo-generatemipmap-filtering , and some other tests where we
were off by a few bits due to unexpected linear filtering.
Extension enables doing a multisample buffer resolve and buffer
scaling using a single glBlitFrameBuffer() call. Currently, we
have this extension implemented in BLORP which is only used by
SNB and IVB. This patch implements the extension in meta path
which makes it available to Broadwell.
Implementation features:
- Supports scaled resolves of 2X, 4X and 8X multisample buffers.
- Avoids unnecessary shader compilations by storing the pre compiled
shaders for each supported sample count.
- Uses bilinear filtering for both GL_SCALED_RESOLVE_FASTEST_EXT and
GL_SCALED_RESOLVE_NICEST_EXT filter options. This is an allowed
behavior in the extension's spec.
- I tried doing bicubic filtering for GL_SCALED_RESOLVE_NICEST_EXT
filter. It made the edges in the image look little smoother but
the image gets blurred causing no overall quality improvement.
For now I have dropped the idea of doing different filtering for
nicest filter.
V2:
- Minor changes to simplify the fragment shader.
- Refactor the code to move i965 specific sample_map computation out
of Meta. We now use ctx->Const.SampleMap{2,4,8}x variables initialized
by the driver.
- Use a simple msaa resolve shader for scaled resolves with scaling
factor = 1.0.
V3:
- Make changes to create a string out of ctx->Const.SampleMap{2,4,8}x
variables and use it in fragment shader.
V4:
- Make changes to use uint8_t type ctx->Const.SampleMap{2,4,8}x
variables.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
with values specific to Intel hardware.
V2: Define and use gen6_get_sample_map() function to initialize
the variables.
V3: Change the function name to gen6_set_sample_maps() and use
memcpy() to fill in the data.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
SampleMap{2,4,8}x variables are used in later patches to implement
EXT_framebuffer_multisample_blit_scaled extension.
V2: Use integer array instead of a string.
Bump up the comment.
V3: Use uint8_t type array.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch implements functions for images support,
which basically supports copy data between video
surface and user buffers, in this case supports
SW decode, and other video output
v2: fix buffer size for odd-sized image case
expose I420 format as well
v3: fix YUV 4:2:2 format data buffer size
cleanup I420 format exposure
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch implements codec for mpeg2 h264 and vc1,
populates codec parameters and pass them to HW driver.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch implements context managements, relate it HW driver,
functions for video surface managements, and functions for
application data memory buffer managements.
implemented functions:
vlVa(Create|Destroy)Context
vlVa(Create|Destroy|Put)Surfaces
vlVa(Create|Destroy)Buffer
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch is for application to query configuration,
such as profiles, entrypoints, and attributes
v2: fix missing profile with query
Signed-off-by: Michael Varga <michael.varga@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch adds a skeleton VA-API state tracker,
which is filled with live in the subsequent patches.
v2: fixes in configure.ac and va state_tracker Makefile.am
v3: do not link against libva.
detect libva version, and correctly set driver entrypoint name.
rebase(cleanup) targets/va/Makefile.am
v4: cleanup va version auto detection
add back targets/va/va.sym
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Break out these functions so that they can be shared with a other
state trackers. They will be used in subsequent patches for the new
VA-API state tracker.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cuts the number of i965 color calculator viewport uploads by 100x
(11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not
needed here anyway - only SBE needs it. Just a copy and paste mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function flagged BRW_NEW_*_PROGRAM
When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa
calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM.
However, brw_upload_state also checks for that changing, sets the same
flags, and also updates brw->fragment_program and so on. So, this looks
to be entirely redundant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Now that the bitfield is a uint64_t, we should use 1ull. Currently, we
only have 32 entries, so 1 works fine, but it's not future-proof.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31. We missed doing this when widening the driver flags
from uint32_t to uint64_t.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Fix build error introduced with commit
eedbce9c63.
lp_test_format.c: In function ‘test_format_unorm8’:
lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’
gallivm = gallivm_create("test_module_unorm8");
^
In file included from ../../../../src/gallium/auxiliary/gallivm/lp_bld_format.h:38:0,
from lp_test_format.c:42:
../../../../src/gallium/auxiliary/gallivm/lp_bld_init.h:58:1: note: declared here
gallivm_create(const char *name, LLVMContextRef context);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84538
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Instead of just segfaulting in the driver when a buffer allocation fails,
report error messages indicating what went wrong so that we can debug things.
As a simple example, chromium wraps Mesa in a sandbox which doesn't allow
access to most syscalls, including the ability to create shared memory
segments for fences. Before, you'd get a simple segfault in mesa and your 3D
acceleration would fail. Now you get:
$ chromium --disable-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted
[10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
This made it pretty easy to diagnose the problem in the referenced bug report.
Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681
Signed-off-by: Keith Packard <keithp@keithp.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
A driver which doesn't have async flip support will queue up flips without any
way to replace them afterwards. This means we've got a scanout buffer pinned
as soon as we schedule a flip and so we need another buffer to keep from
stalling.
When vblank_mode=0, if there are only three buffers we do:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting down in the kernel waiting for vblank to
become the next scanout buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
This cannot be displayed at MSC 1 because the
kernel doesn't have any way to replace buffer 1 as the pending
scanout buffer. So, best case this will get displayed at MSC 2.
Now we block after this, waiting for one of the three buffers to become idle.
We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1
because it's sitting in the kernel waiting to become the next scanout buffer
and we can't use buffer 2 because that's the most recent frame which will
become the next scanout buffer if the application doesn't manage to generate
another complete frame by MSC 2.
With four buffers, we get:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting down in the kernel waiting for vblank to
become the next scanout buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
This cannot be displayed at MSC 1 because the
kernel doesn't have any way to replace buffer 1 as the pending
scanout buffer. So, best case this will get displayed at MSC
2. The X server will queue this swap until buffer 1 becomes
the scanout buffer.
Render frame 3 to buffer 3
PresentPixmap for buffer 3 at MSC 1
As soon as the X server sees this, it will replace the pending
buffer 2 swap with this swap and release buffer 2 back to the
application
Render frame 4 to buffer 2
PresentPixmap for buffer 2 at MSC 1
Now we're in a steady state, flipping between buffer 2 and 3
waiting for one of them to be queued to the kernel.
...
current scanout buffer = 1 at MSC 1
Now buffer 0 is free and (e.g.) buffer 2 is queued in
the kernel to be the scanout buffer at MSC 2
Render frames, flipping between buffer 0 and 3
When the system can replace a queued buffer, and we update Present to take
advantage of that, we can use three buffers and get:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting waiting for vblank to become the next scanout
buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
Queue this for display at MSC 1
1. There are three possible results:
1) We're still before MSC 1. Buffer 1 is released,
buffer 2 is queued waiting for MSC 1.
2) We're now after MSC 1. Buffer 0 was released at MSC 1.
Buffer 1 is the current scanout buffer.
a) If the user asked for a tearing update, we swap
scanout from buffer 1 to buffer 2 and release buffer
1.
b) If the user asked for non-tearing update, we
queue buffer 2 for the MSC 2.
In all three cases, we have a buffer released (call it 'n'),
ready to receive the next frame.
Render frame 3 to buffer n
PresentPixmap for buffer n
If we're still before MSC 1, then we'll ask to present at MSC
1. Otherwise, we'll ask to present at MSC 2.
Present already does this if the driver offers async flips, however it does
this by waiting for the right vblank event and sending an async flip right at
that point.
I've hacked the intel driver to offer this, but I get tearing at the top of
the screen. I think this is because flips are always done from within the
ring, and so the latency between the vblank event and the async flip happening
can cause tearing at the top of the screen.
That's why I'm keying the need for the extra buffer on the lack of 2D
driver support for async flips.
Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Need to unwrap the indirect resource otherwise bad things will happen.
Fixes random crashes and timeouts with piglit's arb_indirect_draw tests.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
IVB had a restriction that prevented us from emitting compressed
three-source instructions, and although that was lifted on Haswell,
Haswell had a new restriction that said BFI instructions specifically
couldn't be compressed.
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
The improvement here is that we've broken a dependency between these
instructions. Leads to 330 fewer INV instructions and 330 more RSQ.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
In most cases the sqrt's result is still used, so the improvement here
is that we've broken a dependency between these instructions. Leads to
80 fewer INV instructions and 80 more RSQ.
Occasionally the sqrt's result is no longer used, leading to:
instructions in affected programs: 5005 -> 4949 (-1.12%)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The next patch adds an algebraic optimization for the pattern
sqrt a, b
rcp c, a
and turns it into
sqrt a, b
rsq c, b
but many vertex shaders do
a = sqrt(b);
var1 /= a;
var2 /= a;
which generates
sqrt a, b
rcp c, a
rcp d, a
If we apply the algebraic optimization before CSE, we'll end up with
sqrt a, b
rsq c, b
rcp d, a
Applying CSE combines the RCP instructions, preventing this from
happening.
No shader-db changes.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Helps a handful of programs in Serious Sam 3 that use do-while loops.
instructions in affected programs: 16114 -> 16075 (-0.24%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Do not rely on LLVMMCJITMemoryManagerRef being available.
The c binding to the memory manager objects only appeared
on llvm-3.4.
The change is based on an initial patch of Brian Paul.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Blitter can still have transfers hanging around which it frees in
util_blitter_destroy(). So let it clean up before we yank the
transfer_pool from under it.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Indirect registers consume an additional token. Try to clean up the
token calculation math a bit, and fix it at the same time.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 74 40,578,719,715 67,762,208 62,263,404 5,498,804 0
After (32-bit): 52 40,565,579,466 66,359,800 61,187,818 5,171,982 0
Before (64-bit): 74 37,129,541,061 95,195,160 87,369,671 7,825,489 0
After (64-bit): 76 37,134,691,404 93,271,352 85,900,223 7,371,129 0
A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These few places were using ir_var_auto for seemingly no reason. The
names were not added to the symbol table.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No change Valgrind massif results for a trimmed apitrace of dota2.
v2: Minor rebase on _mesa_init_constants changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Later patches will give every ir_var_temporary the same name in release
builds. Adding a bunch of variables named "compiler_temp" to the symbol
table can only cause problems.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Specifically, ir_var_temporary variables constructed with a NULL name
will all have the name "compiler_temp" in static storage.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0
After (32-bit): 71 40,583,408,411 67,761,528 62,263,519 5,498,009 0
Before (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0
After (64-bit): 67 37,123,303,706 95,150,544 87,333,600 7,816,944 0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
At least one of these pointers must be NULL, and we can determine which
will be NULL by looking at other fields. Use this information to store
both pointers in the same location.
If anyone can think of a better name for the union than "u", I'm all
ears.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 63 40,574,239,515 68,117,280 62,618,607 5,498,673 0
After (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0
Before (64-bit): 53 37,126,451,468 95,150,256 87,711,304 7,438,952 0
After (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Also move num_state_slots inside ir_variable_data for better packing.
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
warn_extension_index was moved to improve packing.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0
After (32-bit): 73 40,575,751,558 68,116,528 62,618,607 5,497,921 0
Before (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0
After (64-bit): 62 37,123,578,526 95,150,784 87,711,304 7,439,480 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
v2: Use the enum name with the bit-field and remove the extra casts.
Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
Also move the new warn_extension_index into ir_variable::data. This
enables slightly better packing.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 82 40,580,040,531 68,488,992 62,973,695 5,515,297 0
After (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0
Before (64-bit): 65 37,124,013,542 95,892,768 88,466,712 7,426,056 0
After (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The payoff for this will come in the next patch.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
After compilation (and before linking) we can eliminate quite a few
built-in variables. Basically, any uniform or constant (e.g.,
gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can
be eliminated. System values, vertex shader inputs (with one
exception), and fragment shader outputs that are not used and not
re-declared in the shader text can also be removed.
gl_ModelViewProjectMatrix and gl_Vertex are used by the built-in
function ftransform. There are some complications with eliminating
these variables (see the comment in the patch), so they are not
eliminated.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0
After (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0
Before (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0
After (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0
A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit.
v2: Don't remove any built-in with Transpose in the name.
v3: Fix comment typo noticed by Anuj.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
All built-in uniforms are supposed to be backed by some GL state. The
state_slots field describes this backing state.
This helped me track down a bug in a later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reuse the LLVMContext already allocated in llvmpipe_context
for draw_llvm if ppossible. This should decrease the memory
footprint of an llvmpipe context.
v2: Fix compile with llvm disabled.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This fixes the remaining problem with the recently introduced
global jit memory manager. This change again uses a memory manager
that is local to gallivm_state. This implementation still frees
the majority of the memory immediately after compilation.
Only the generated code is deferred until this code is no longer used.
This change and the previous one using private LLVMContext instances
I can now safely run several independent OpenGL contexts driven
by llvmpipe from different threads.
v3: Rebase on llvm-3.6 compile fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This is one step to make llvmpipe thread safe as mandated by the OpenGL
standard. Using the global LLVMContext is obviously a problem for
that kind of use pattern. The patch introduces two LLVMContext
instances that are private to an OpenGL context and used for all
compiles. One is put into struct draw_llvm and the other
one into struct llvmpipe_context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
The big pile of patches I just pushed regresses about 25 piglit tests on
SNB. This fixes the regressions.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
The screen argument isn't actually used by lp_jit_screen_init() at this
time, but let's move the call so that we pass a valid pointer.
v2: don't leak screen if lp_jit_screen_init() fails.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For cube resources, the array_size value should be 6. So handle
that case as we do for array texture resources. But assert that
array_size==6 just to be safe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
layers (aka array_size) should be 6 for cube textures so we don't need
to special-case it. But add an assertion just to be safe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Earlier in the function we assert layers==6 for PIPE_TEXTURE_CUBE so
there's no reason to special-case the pt.array_size = layers assignment.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The core sw primitive restart code is still around, because i965 uses it
in some cases, but there are no drivers that want it on all the time.
Reviewed-by: Rob Clark <robdclark@gmail.com>
The drivers not flagging primitive restart support are r300 swtcl, svga,
nv30, and vc4.
The point of primitive restart is to slightly reduce draw call overhead
for apps by batching multiple draws. If we do an extra pass to read the
index buffer and split back into multiple draws, we've entirely missed the
point. This is particularly bad for drivers that otherwise have hardware
IB reads, where the readback is probably uncached.
Reviewed-by: Rob Clark <robdclark@gmail.com>
On gen 7, the MRF was removed and we gained the ability to do send
instructions directly from the GRF. This commit enables that
functinoality for FB writes.
v2: Make handling of components more sane.
i965/fs: Force a high register for the final FB write
v2: Renamed the array for the range mappings and added a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then
we will need this.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we were use the base_mrf parameter of fs_inst to store the MRF
location. In preparation for doing FB writes from the GRF, we now also
allow you to set inst->base_mrf to -1 and provide a source register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we have execution sizes, we can use that instead of the
dispatch width. This way it also works for 8-wide instructions in
SIMD16.
i965/fs: Make effective_width a variable instead of a function
i965/fs: Preserve effective width in constant propagation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will, eventually, allow us to manage execution sizes of
instructions in a much more natural way from the fs_visitor level.
i965/fs: Explicitly set instruction execute size a couple of places
i965/blorp: Explicitly set instruction execute sizes
Since blorp is all 16-wide and nothing isn't, in general, very careful
about register width, we'll just set it all explicitly.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we track both halves of a 16-wide vgrf, we no longer need to worry
about force_sechalf or force_uncompressed. The only real issue is if the
destination is too small.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit fixes a bug in register coalesce that happens when one register
is moved to another the proper number of times but the channels are
re-arranged. When this happens, the previous code would happily coalesce
the registers regardless of the fact that the channel mappins were wrong.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that offset() can properly handle MRF registers, we can use an MRF
fs_reg and let offset() handle incrementing it correctly for different
dispatch widths. While this doesn't have any noticeable effect currently,
it does ensure that the destination register is 16-wide which will be
necessary later when we start detecting execution sizes based on source and
destination registers.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is actually the squash of a bunch of different changes. Individual
commit titles follow:
i965/fs: Always 2-align registers SIMD16 for gen <= 5
i965/fs: Use the register width when applying offsets
This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.
i965/fs: Change regs_read to be in hardware registers
i965/fs: Change regs_written to be actual hardware registers
i965/fs: Properly handle register widths in LOAD_PAYLOAD
The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.
i965/fs: Properly set writemasks in LOAD_PAYLOAD
i965/fs: Handle register widths in demote_pull_constants
i965/fs: Get rid of implicit register doubling in the allocator
i965/fs: Reserve enough registers for PLN instructions
i965/fs: Make sources and destinations interfere in 16-wide
i965/fs: Properly handle register widths in CSE
i965/fs: Properly handle register widths in register_coalesce
i965/fs: Properly handle widths in copy propagation
i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD
i965/fs: Properly handle register widths and odd register sizes in spilling
i965/fs: Don't waste a register on texture lookups for gen >= 7
Previously, we were waisting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special-case.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every register in i965 assembly implicitly has a concept of a "width".
Usually, this is derived from the execution size of the instruction.
However, when writing a compiler it turns out that it is frequently a
useful to have the width explicitly in the register and derive the
execution size of the instruction from the widths of the registers used in
it.
This commit adds a width field to fs_reg along with an effective_width()
helper function. The effective_width() function tells you how wide the
register effectively is when used in an instruction. For example, uniform
values have width 1 since the data is not actually repeated, but when used
in an instruction they take on the width of the instruction. However, for
some instructions (LOAD_PAYLOAD being the notable exception), the width is
not the same.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just pass the visitor into is_copy_payload() and is_coalesce_candidate()
instead of a register size and the virtual_grf_sizes array. Among other
things, this makes the code more obvious because you don't have to figure
out where src_size came from.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Right now, this function is a no-op but it indicates that we intend to only
use the first half of the 16-wide register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit reworks copy propagation a bit to support propagating the
copying of partial registers. This comes up every time we have pull
constants because we do a pull constant read immediately followed by a move
to splat the one component of the out to 8 or 16-wide. This allows us to
eliminate the copy and simply use the one component of the register.
Shader DB results:
total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs: 66112 -> 65603 (-0.77%)
GAINED: 0
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A switch statement is much easier to read/edit than a big giant or
statement.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This splits emit_fb_writes into two functions: emit_fb_writes and
emit_single_fb_write. This reduces the amount of duplicated code in
emit_fb_writes and makes the register number fiddling less arcane.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be
able to see them.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we disabled compact_virtual_grfs when dumping optimizations.
The idea here was to make it easier to diff the dumped shader because you
didn't have a sudden renaming. However, sometimes a bug is affected by
compact_virtual_grfs and, when this happens, you want to keep dumping
instructions with compact_virtual_grfs enabled. By turning it into an
optimization pass and dumping it along with the others, we retain the
ability to diff because you can just diff against the compact_virtual_grf
output.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We also set the register width equal to the dispatch width. Right now,
this is effectively a no-op since we don't do anything with it. However,
it will be important once we add an actual width field to fs_reg.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, if an instruction wrote to more than one register, we
implicitly assumed that it filled the entire register. We never hit this
before because the only time we did multi-register writes was things like
texturing which always wrote to all of the registers. However, with the
upcoming ability to do 16-wide instructions in SIMD8 and things of that
nature, we can have multi-register writes at offsets and we'll hit this.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using a floating-point type doesn't usually cause hangs on my HSW, but the
simulator complains about it quite a bit.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have this wonderful offset() function for advancing registers, but we're
not using it. Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset. In a few commits, we will make
offset do even more nifty things for us.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The original vgrf splitting code was written with the assumption that vgrfs
came in two types: those that can be split into single registers and those
that can't be split at all It was very conservative and bailed as soon as
more than one element of a register was read or written. This won't work
once we start allowing a regular MOV or ADD operation to operate on
multiple registers. This rewrite allows for the case where a vgrf of size
5 may appropriately be split in to one register of size 1 and two registers
of size 2.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Previously, we were generating the fast-clear shader from GLSL. The
problem is that fast clears require that we use a replicated write rather
than a regular write instruction. In order to get this we had a
complicated and somewhat fragile optimization pass that looked for places
where we can use a replicated write and used it. Since replicated writes
have a lot of restrictions, we only ever use them for fast-clear
operations.
This commit replaces the optimization pass with a function that just
generates the shader we want. This is a) less code, b) less fragile than
the optimization pass, and c) generates a more efficient shader.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes GPUVM faults when running the piglit test "getteximage-formats
init-by-rendering" with R600_DEBUG=forcedma on SI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We are currently only dealing with depth-only or stencil-only resources
here, not with resources having both depth and stencil[0]. In both cases,
the tiling mode index is in the tile_mode field, not in the
stencil_tile_mode field.
[0] Add an assertion for that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add finalize_vertex_elements() to finalize ilo_ve_state. This fixes a
potential issue with URB entry allocation for VS and move the complexity of
gen6_3DSTATE_VERTEX_ELEMENTS() to the new function.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
The code was kind of mixed up what buffers were getting stored in the case
that a resolve bit was unset (which are set based on the GL state at draw
time) and the buffer wasn't actually bound. In particular, depth-only
rendering would store the color buffer contents, which happen to be
pointing at the depth buffer.
Thanks to clearing out the resolve bits for things we really can't
resolve, now I can drop the safety checks for buffer presence around the
actual stores.
Fixes 42 piglit tests.
My attempts to clarify the code with _compacted/_uncompacted prefixed
variables apparently failed. Hopefully this is clearer.
In any case, the previous code wasn't clear enough to gcc to let it
optimize division by a power of two into a shift. No problems now.
Also, the previous code (in the ADD case) didn't work on 32-bit x86, due
to complicated set of interactions best summed up as unsigned division
and compiler optimizations.
Tested-by: Mark Janes <mark.a.janes@intel.com>
We need to keep track if a state change other than frag/vert shader
state will trigger us to need a different shader variant, and if
necessary mark the appropriate shader state as dirty. Otherwise we will
forget to re-emit the shader state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is for hw that needs to emulate some texture wrap modes (like
CLAMP) with some help from the shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Keep the existing function as a common helper. But this lets us move an
a2xx specific hack out of common code. And the PIPE_TEX_WRAP_CLAMP
emulation will require an a3xx specific hack. So rather than piling on
hacks, split this out.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We just clamp the incoming texture coordinates. This breaks the lambda
calculation, but it gets the piglit tests to pass. This is the same
behavior as in i965.
One spot in the docs says that it's stored at a miplevel just beyond the
last miplevel, which was scary. But really, you just load it as the R
coordinate (which conflicts with cubemaps, but you don't do border
clamping on cubes).
We have to expose them for GL 2.0, but we just always return a value of 0.
We should be advertising 0 query bits instead of 64, but gallium doesn't
have plumbing for that yet. At least this stops the segfaults.
This may reduce register pressure and uniform counts. Drops a bunch of 0
uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is
failing to register allocate.
It's not passing some of the piglit tests, because it looks like at small
miplevels some contents from surrounding faces are getting filtered in at
the corners. It does get 7 new tests passing.
In the other related fields, "0" refers to the size of the first miplevel,
while this is a field in a slice. The other implicit slices we have
(cubemap layers) don't vary in size compared to the first one.
Almost always, the MOV will get copy propagated out. Even if it doesn't,
it's probably better to just reload the uniform at next use (to reduce
register pressure) rather than try to save instruction count.
I was looking at this because in the presence of texturing (which calls
add_uniform() directly to get the uniform load forced into the
instruction) the c->uniform_contents indices don't match 1:1 with the
temporary qregs.
commit 4ed23fd broke creation of pbuffer surfaces, patch fixes
the failure, noticed when running chrome with '--use-gl=egl'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
When an instruction's result was consumed by multiple mov.sat
instructions, we would decide that we couldn't move the saturate
modifier because something else was using the result, even though it was
just another mov.sat!
total instructions in shared programs: 4275598 -> 4274842 (-0.02%)
instructions in affected programs: 75634 -> 74878 (-1.00%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When we find a mov.sat, we search backwards. We might as well search
everything else backwards as well and potentially look at fewer
instructions.
This change enables the next patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Get rid of the 'default' case (as suggestied by imirkin) so compiler
warns us about missing caps. Also add some caps that were missing until
now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
4155d1c7 'st/mesa: drop dependence on API profile in st_init_extensions'
broke freedreno because somehow 'PIPE_CAP_MAX_VIEWPORTS' fell through
the cracks. Resulting that we reported zero viewports. So the state
tracker never bothered to give us any valid viewport!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is the only guaranteed way get the patch level for llvm,
since the define cannot always be found in config.h depending
on the version of llvm or the build system used.
CC: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jonathan Gray <jsg@jsg.id.au>
Added back in 2009, with osmesa/GLU in mind. Unlikely to be working
any more since the removal of the static makefiles.
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rather than having double negatives -> disable-opencl, default=no
simply use enabled/disabled. It makes things a bit easier for the
reader and consistent throughout the file.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The location of the egl driver(s) is matter that we should have
never exposed to the user. Currently the dri2 driver is built
into the libEGL loader, with the gallium based one soon to follow.
v2: Fold EGL_DRIVER_INSTALL_DIR within the makefiles. Suggested by Matt.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80615
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It is not used outside the render code. There are also too many details in it
that we do not want other components to access directly.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Remove emit_draw() and ILO_RENDER_DRAW indirections. With all emit functions
being direct now, ilo_render_estimate_size() and more can also be removed.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add these new high-level functions
ilo_render_get_draw_dynamic_states_len()
ilo_render_emit_draw_dynamic_states()
ilo_render_get_rectlist_dynamic_states_len()
ilo_render_emit_rectlist_dynamic_states()
ilo_render_get_draw_surface_states_len()
ilo_render_emit_draw_surface_states()
for draw and rectlist state emission. They are implemented in the new
ilo_render_dynamic.c and ilo_render_surface.c.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
For all supported query types, we always emit a PIPE_CONTROL. Call
ilo_render_get_flush_len() for simplicity and clarity.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
ilo_render is based on ilo_builder. We should only care if the builder
buffers are invalidated, or if the hardware context is invalidated. Replace
ilo_render_invalidate() with flags by ilo_render_invalidate_builder() and
ilo_render_invalidate_hw().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Both dynamic buffer and surface buffer are state buffers. We should not use
state buffer to refer to the former.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Move members of ilo_3d that still make sense to ilo_context. With ilo_3d
gone, rename functions whose names begin with ilo_3d to something more
appropriate.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
According to GLSL(4.2) and GLSL-ES (1.0, 3.0) spec, Structures must
have the same name to be considered same type. We currently ignore
the name check while checking if two records are same. This patch
fixes this.
Patch fixes failing tests in WebGL conformance test
'shaders-with-uniform-structs' when running Chrome on OpenGL ES.
v2: Do not force name comparison with unnamed types (Tapani)
v3: Cleanups (Matt)
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83934
The draw module would still try to use gallivm, causing many piglit tests
to fail with an assertion failure. llvmpipe might have been similarly
affected.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
mesa_texstore expects pixel data, not compressed data. For compressed
textures, we want to just copy the bits in without any conversion.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Francisco Jerez <currojerez@riseup.net>
What happens is that a SPLIT operation is part of the spill node, and as
a pseudo op, the instruction gets erased after processing its first def.
However the later defs still need to refer to it, so instead delay
deleting until after that whole RA node is done processing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79462
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
These are pretty catastrophic, "should never happen" failure paths (though
4 tests in piglit hit them currently, due to a single bug). An abort()
that you can gdb on easily is probably more useful than a clean exit,
particularly since a bug in piglit framework right now is causing early
exit(1)s to simply not be recorded in the results at all.
I only made IS_NEGATIVE(x) use signbit in commit 0f3ba405 in an attempt
to fix 54805, but it didn't help. We didn't use signbit on some
platforms and instead defined it to x < 0.0f.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Was only tracked to be printed at the end of configure, but configure
quits if it can't build something we requested, rather than silently
dropping it, so printing these directories has little use.
Note that I had to add support for testing the packed attribute to
m4/ax_gcc_func_attribute.m4.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [C bits]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Presumbly this will let clang and other compilers use the built-ins as
well.
Notice two changes specifically:
- in _mesa_next_pow_two_64(), always use __builtin_clzll and add a
static assertion that this is safe.
- in macros.h, remove the clang-specific definition since it should
be able to detect __builtin_unreachable in configure.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [C bits]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The spec says the type must be W (JIP is 16-bits after all), but we've
been emitting it with a UD type all along and have experienced no
adverse effects. Changing the type to D allows ELSE and ENDIF
instructions to be compacted.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We're currently emitting compactable control flow instruction the wrong
types, preventing their compaction. The next patch will fix this and
actually enable compaction.
On chips that cannot compact control flow instructions, attempts to find
a match in the datatype table will fail.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The array was previously indexed in units of brw_compact_inst (8-bytes),
but before compaction all instructions are uncompacted, so every odd
element was unused.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Used to pass over previously compacted instructions in this loop, but no
longer. No point in checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It may be possible to create a contrived example in which a 3-src
instruction would have been compacted on Gen < 8. I'd rather not
discover it in the wild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently a no-op, since instruction compaction isn't implemented for the
generations that have a programmable strips-and-fans unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Despite what the Sandybridge PRM says, ENDIF has Jump Count in <dst>,
not JIP in <src1>. (The same mistake appears about WHILE as well).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both sizes are VERT_ATTRIB_MAX, so this has no effect. But it drops a
few trivial uses of the derived state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm not familiar with this code, but this sure appears to be a typo.
It looks like the intent is to set each array element, not arrays[0]
each time. Notably, the loop just below uses "array", not "arrays".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
The code in get.c that handles this uses ctx->Array.VAO->VertexAttrib,
which is a gl_vertex_attrib_array structure, not a gl_client_array.
The offsets of all fields happened to be the same in both structures, at
least on x86_64. "Size," "Type," and "Stride" are obviously the same:
both structures start with the same fields, in the same order.
"Enabled" is dicier: there are different fields before it in both
structures, including pointer sized values which might need special
alignment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
Dead since the _MaxElement removal, but these functions seemed generally
applicable, so I decided to remove them in a separate patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
max_index was coming from either the user telling us as part of
glDrawRangeElements, or from an incidental calculation as part of some
sort of primitive conversion fallback. Sometimes, it was just set to the
default "I don't know" ~0 value.
If it wasn't set to the actual max index, then the kernel would reject the
draw call for allowing out-of-bounds VBO reads. So, compute the max index
from the sizes of the VBOs, which isn't too expensive (unlike mapping and
reading the index buffer) and is reliable.
Fixes piglit vao-element-array-buffer.
Still some open questions.. and at any rate, no additional piglit passes
due to various wrap modes that we need to emulate in at least some
cases :-(
But it does fix some mystery page-faults.. So add some comments in the
code where there are things that we need to emulate or do more r/e, and
push as-is.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead of separate color and Z/S writemasks, just have one writemask
parameter that takes a mask of the PIPE_MASK_[RGBAZS] flags.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
PIPE_TEX_MIPFILTER_x is not legal for the pipe_sampler_state::
min/mag_img_filter fields. But PIPE_TEX_MIPFILTER_x == PIPE_TEX_FILTER_x
so we were getting lucky.
This also makes the code consistent with u_blitter.c.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The register coalescing portion of this patch hurts three shaders in
Guacamelee by one instruction each, but examining the diff makes me
believe that what we were generating was (perhaps harmlessly) incorrect.
When the instructions aren't in a flat list, this wouldn't have worked.
Also, this should be faster.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The only trick is changing a break into a return true in register
coalescing, since the macro is actually a double loop, and break will do
something different than you expect. (Wish I'd realized that earlier!)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that nothing invalidates the CFG, we can calculate_cfg() immediately
after emit_fb_writes()/emit_thread_end() and never again.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This cleans up the debug flags to be consistently indented, use bit
shifting instead of hex-values and fixes a bug where the new DEBUG_NO8 flag
used the same value as the DEBUG_VUE flag. This was hidden by the numbers not
being aligned. Also removes gaps in the range where DEBUG_IOCTL (0x4) and
DEBUG_REGION (0x400) used to be.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
LLVM commit r218316 removes the JITMemoryManager class, which is
the parent for a seemingly important class in gallivm. In order to
fix the build, I've wrapped most of lp_bld_misc.cpp in
if HAVE_LLVM < 0x0306 and modifyed the
lp_build_create_jit_compiler_for_module() function to return false
for 3.6 and newer which effectively disables the gallivm functionality.
I realize this is overkill, but I could not come up with a simple
solution to fix the build. Also, since 3.6 will be the first release
without the old JIT, it would be really great if we could
move gallivm to use the C API only for accessing MCJIT. There
is still time before the 3.6 release to extend the C API in
case it is missing some functionality that is required by gallivm.
This fixes heap corruption. The sampler view can be bound in the context,
so we cannot call destroy directly.
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead, pass the layout of GS inputs in memory to the ES using the shader
key. Only 64 bits are needed to represent the layout in the key.
Mixing and matching different VS and GS shaders should now always work.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I will need this for fixing sample shading with 1 sample.
The good news is that all shader pm4 states no longer use the current context
state, so we can generate the pm4 states outside of draw_vbo if needed.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Generic varyings in TGSI were based on the value of VARYING_SLOT_TEX0, so VAR0
was always GENERIC[22] (with tessellation patches). Some drivers might not
be able to cope with that.
This commit defines a proper mapping, so that PNTC is GENERIC[8] and VAR0 is
GENERIC[9].
Reviewed-by: Brian Paul <brianp@vmware.com>
The extensions and limits being set in the conditional block are core-only
anyway and don't have any effect on other profiles.
Reviewed-by: Brian Paul <brianp@vmware.com>
E.g. the 4.0 compatibility profile can be forced with:
MESA_GL_VERSION_OVERRIDE=4.0COMPAT
Some tests that I have require 4.0 compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The respective HAVE_{SOFT,LLVM}PIPE are already descriptive
enough. Additionally the svga modules does not really use either
one, but the auxiliary draw & gallivm modules.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The dri, vdpau, omx, xvmc and gbm targets don't need any authentication
even the VL ones never used it. Either the respective loader or the
library itself (vl) is doing its auth prior to calling create_screen()
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Make sure that MEGADRIVERS is set in order to create the hardlinks.
The variable name is not the most appropriate and will be sorted
out in upcoming commits.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
There are only 32 bits in the flatshade flags (which are 1 bit per
component), the simulator crashes when you use more than about this many
varyings, and the original Broadcom code drop only exposed 8 as well.
Fixes 26 piglit tests in the varying-packing group, and makes many others
go from crash to fail (due to not checking their varying counts and
treating link failures as failures). Regresses ARB_fp/minmax (due to 8
varyings instead of 10).
This is just the GL 1.1 flat shading of colors -- we don't need to support
TGSI constant interpolation bits, because we don't do GLSL 1.30.
Fixes 7 piglit tests.
They still provide register pressure since I haven't made a special class
for them, but since they're only live for one instruction it probably
doesn't matter.
This improves the readability of QPU assembly.
The A-file unpack is just like R4 unpack, except that if you don't do a
floating-point operation it won't do float conversion (so int16 gets
scaled up to int32).
This will let me more reliably allocate a-file registers, which are going
to be even more in demand when I start using a-file unpacks.
Also fixes a bug where the reservation of payload registers (FRAG_Z/W) was
off by one but just caused failure to register allocate at all if the
off-by-one was fixed.
The r300 gallium driver is using it outside of the Mesa tree, and I wanted
to do so for vc4 as well. Rather than make the multiple-definitions
problem even more complicated, just move it to more-shared code.
v2: Don't forget to delete the symlink in r300 (review by Matt).
Delete more r300-helper references (review by Emil)
Don't prefix util/ header inclusion with "util/" (review by Emil)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
ffeb77c7b0 had a typo which turned all signed
integer divisions into unsigned ones. Oops.
This gets us back the 51 little piglits
(all from glsl built-in-functions, fs/vs/gs-op-div-int-ivec2 and similar).
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If _mesa_get_tex_image() return NULL there is already error
set in context. Other error pats free allocated texture.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Check if _mesa_meta_bind_rb_as_tex_image() did give the texture.
If no texture was given there is already either
GL_INVALID_VALUE or GL_OUT_OF_MEMORY error set in context.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Running fast clear glClear with SNB caused Valgrind to
complain about this.
v2: line 237 fixed glClear from leaking memory, other
strdups are also now changed to ralloc_strdups but I
don't know what effect those have. At least no changes in
my Piglit quick run.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add current_pipe_control_dw1 and deferred_pipe_control_dw1 to track what have
been done since lsat 3DPRIMITIVE and what need to be done before next
3DPRIMITIVE. Based on them, we can emit WAs more smartly.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It was used to set has_gen6_wa_pipe_control to false when the batch buffer
changed. When called from emit_flush() and others, it also unset
ILO_3D_PIPELINE_INVALIDATE_BATCH_BO so that the following emit_draw() will not
set has_gen6_wa_pipe_control to false again. It sounded error-prone and was
just ugly.
We should be able to achieve the same goal by reset has_gen6_wa_pipe_control
in ilo_3d_pipeline_invalidate(). With handle_invalid_batch_bo() gone, the
emit functions can also be inlined.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We do not need to call it from GEN7 pipeline anymore since software
PIPE_QUERY_PRIMITIVES_EMITTED is gone.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Previously we would get a potentially computed post-swizzle coord based
on the texture target info, which would not include the bias/lod in the
last argument.
The second argument does not have to be adjacent, so adjusting the order
array did not make sense.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will make life a lot easier as we add support for additional
instructions.
v2: shadow reference value is always .z or .w
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The only place the enum pipe_type was used is for the TGSI sampler
view return type. So make it a TGSI type. Note: it appears this
part of TGSI isn't used by anyone so it may be removed in the future.
v2: the new name is tgsi_return_type, not tgsi_type. This means we
can drop the previously posted tgsi_type -> tgsi_opcode_type patch.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Called when the user can insert new decls, instructions.
This could be used in a few places in the 'draw' module.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Instead we store a void pointer to the key, and cast it to
brw_wm_prog_key for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This helps:
1. Reduce the need to have fs_visitor::key's type be brw_wm_prog_key*
2. Align the code to allow brw_sampler_prog_key_data to be pulled out of other
prog_key types for different stages.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Instead we store a brw_stage_prog_data pointer, and cast it to
brw_wm_prog_data for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Conditional rendering is not limited to draw_vbo(). Move the support to
ilo_context, and replace ilo_3d_pass_render_condition() by
ilo_skip_rendering().
SOL_RESET happens before bo execution. It should not be observed by the
commands that are already in the bo.
Move the code out of the pipeline now that it submits.
And config query and DRM_CONF_SHARE_FD to both mega-driver and
traditional build configs, so that EGL_EXT_image_dma_buf_import
works.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We used different lists for different types of queries because we wanted to
update software queries quickly. Now that there is no software queries, we
are fine with a single list.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Read PIPE_QUERY_PRIMITIVES_GENERATED and PIPE_QUERY_PRIMITIVES_EMITTED from
hardware registers. Because all queries now have a bo, remove unnecessary
checks for q->bo.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add support for PIPE_QUERY_PRIMITIVES_GENERATED and
PIPE_QUERY_PRIMITIVES_EMITTED in ilo_3d_pipeline_emit_query().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It replaces
ilo_3d_pipeline_emit_write_timestamp(),
ilo_3d_pipeline_emit_write_depth_count(), and
ilo_3d_pipeline_emit_write_statistics().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Make it own()'s responsibility to make room for release() and itself. To be
able to do that, allow ilo_cp_submit() in own().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Move pipe states in ilo_context to the new ilo_state_vector. The motivation
is that ilo_context consists of several loosely related things. When we need
an ilo_context somewhere, we usually need only one or two of the things in it.
This change makes ilo_state_vector one such thing.
An immediate result is that we no longer need ilo_context in 3D pipelines,
something we have planned for since early days.
Tested on my snb-gt2:
4 tests skip->pass in spec/EXT_texture_array
51 tests skip->pass in spec.glsl-3.30
4 tests skip->pass in spec/!OpenGL 3.3
No regressions; no skip->fail changes.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Some users don't understand that these variables can break OpenGL.
The general is rule is that if an app supports MSAA, you mustn't use
GALLIUM_MSAA.
For example, if an app has an 8xMSAA FBO and GALLIUM_MSAA=4
is set, resolving the FBO to the back buffer will be rejected which will look
like this on all gallium drivers:
http://www.phoronix.com/scan.php?page=article&item=amd_radeonsi_msaa
The environment variables also have no effect on modern apps like TF2, but
there is still a performance hit due to wasted bandwidth and VRAM.
In a nutshell, it does more harm than good.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
These instructions only have vex encodings, thus they can't be used without
avx. (Technically, one can still use avx-128 if avx isn't available because
the environment doesn't store the ymm registers, however I don't think llvm
can.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Geometry shaders was the only thing we needed to enable GLSL 1.50 and
OpenGL 3.2 in gen6.
v2: Layered clears do not work properly in gen6 with OpenGL 3.2. Kenneth
and Jordan realized that for this to work we also need
GL_AMD_vertex_shader_layered (which requires OpenGL 3.2, so it could not be
enabled before this patch), so we agreed to enable this together with
OpenGL 3.2 in this patch.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
In gen6 we will use the geometry shader implementation from gen6_gs_visitor.cpp
and keep the implementation in brw_vec4_gs_visitor.cpp for gen7+. Notice that
gen6_gs_visitor inherits from brw_vec4_gs_visitor so it is not a completely
seprate implementation of geometry shaders.
Also, gen6 does not support multiple dispatch modes, its default operation mode
is equivalent to gen7's SINGLE mode, so select that in gen6 for consistency.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Uniforms declared as uniform blocks are stored in ubo surfaces and need to
be pulled from the geometry shader program so make sure we upload them first
and do the same for pull constants.
This fixes all piglit tests that use uniform blocks:
bin/shader_runner tests/spec/glsl-1.50/uniform_buffer/gs-*
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For gen6 geometry shaders we use the first BRW_MAX_SOL_BINDINGS entries of the
binding table for transform feedback surfaces. However, vec4_visitor will
setup the binding table so that textures use the same space in the binding
table. This is done when calling assign_common_binding_table_offsets(0) as
part if its run() method.
To fix this clash we add a virtual method to the vec4_visitor hierarchy to
assign the binding table offsets, so that we can change this behavior
specifically for gen6 geometry shaders by mapping textures right after the
first BRW_MAX_SOL_BINDINGS entries.
Also, when there is no user-provided geometry shader, we only need to upload
the binding table if we have transform feedback, however, in the case of a
user-provided geometry shader, we can't only look into transform feedback
to make that decision.
This fixes multiple piglit tests for textureSize() and texelFetch() when these
functions are called from a geometry shader in gen6, like these:
bin/textureSize gs sampler2D -fbo -auto
bin/texelFetch gs usampler2D -fbo -auto
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Currently we buffer transform feedack varyings separately. This patch makes
it so that we reuse the values we have already buffered for all the output
varyings of the geometry shader instead.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since geometry shaders can alter the value of varyings packed in the first
output VUE slot (PSIZ), we need to buffer it together with all the other
vertex data so we can emit the right value for each vertex when we do the
URB writes.
This fixes the following piglit test in gen6:
tests/spec/glsl-1.50/execution/redeclare-pervertex-out-subset-gs.shader_test
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Update gen6_gs_binding_table and gen6_sol_surface to use user-provided
geometry program information when present. This is necessary to implement
transform feedback support.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This takes care of generating code required to handle transform feedback.
Notice that transform feedback isn't enabled yet, since that requires
additional setups in other parts of the code that will come in later patches.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We will use this parameter in later patches to provide information relevant
to transform feedback that needs to be set as part of the FF_SYNC message.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This opcode generates code to copy the specified destination index
into subregister 5 of the MRF message header.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This opcode will be used when sending SVB WRITE messages to save
transform feedback outputs into Streamed Vertex Buffers.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
So far in gen6 we only used geometry shaders to implement transform feedback
in vertex shaders, so we assumed that the VUE map for the geometry shader
stage was always the same as for the vertex shader stage. This is no longer
true now that we support user provided geometry shaders in gen6 too.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For this we will need to move PrimitiveID information, delivered in the thread
payload in r0.1, to a separate register (we use GS_OPCODE_SET_PRIMITIVE_ID
for this), then map the corresponding varying slot to that register in the
setup_payload() method.
Notice that we cannot use a virtual register as the destination for the
PrimitiveID because we need to map all input attributes to hardware registers
in setup_payload(), which happens before virtual registers are mapped to
hardware registers. We could work around that issue if we were able to compute
the first non-payload register in emit_prolog() and move the PrimitiveID
information to that register, but we can't because at that point we still
don't know the final number uniforms that will be included in the payload.
So, what we do is to place PrimitiveID information in r1, which is always
delivered as part of the payload but its only populated with data
relevant for transform feedback when we set GEN6_GS_SVBI_PAYLOAD_ENABLE
in the 3DSTATE_GS state packet.
When we implement transform feedback, we wil make sure to move the value of r1
to another register before we overwrite it with the PrimitiveID.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen6 the geometry shader payload includes the PrimitiveID information in
r0.1. When the shader code uses glPimitiveIdIn we will have to move this to
a separate hardware register where we can map this attribute. This opcode
takes the selected destination register and moves r0.1 there.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen6 we need to end the thread differently depending on whether we have
emitted at least one vertex or not. In case we did, the EOT message must
always include the COMPLETE flag or else the GPU hangs. If we have not
produced any output, however, we can't use the COMPLETE flag.
This would lead us to end the program with an ENDIF opcode, which we want
to avoid (and actually is not permitted since it hits an assertion), so
instead what we do is that we always request a new VUE handle every time we do
an URB WRITE, even for the last vertex we emit. With this we make sure that
whether we have emitted at least one vertex or none at all we have to finish the
thread without writing to the URB, which works for both cases by setting the
COMPLETE and UNUSED flags in the EOT message.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Just in case the GS algorithm does not call EndPrimitive() for the last
primitive produced. This is relevant only for non point outputs, since for
this we are already setting the PrimEnd flag on each vertex we emit.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Geometry shaders in gen6 are significantly different from gen7+ so it is better
to have them implemented in a different file rather than adding gen6 branching
paths all over brw_vec4_gs_visitor.cpp.
This commit adds an initial implementation that only handles point output, which
is the simplest case.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen7+ we emit vertices as they come, however in gen6 geometry shaders we
have to buffer vertex data for all vertices and then emit it all in one go
at the end. To achieve this we need to generalize emit_urb_slot() to store
vertex data in general purpose registers and not only MRF registers.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We had GS_OPCODE_SET_DWORD_2_IMMED but this required its source argument to be
an immediate. In gen6 we need to set dword 2 of the URB write message header
from values stored in separate register, so we need something more flexible.
This change replaces GS_OPCODE_SET_DWORD_2_IMMED with GS_OPCODE_SET_DWORD_2.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen6 seems to require that EOT messages include the complete flag too or else
the GPU hangs. We add will this flag to the instruction when we emit the
thread end opcode.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen6 geometry shaders need to allocate URB handles for each new vertex they
emit after the first (the URB handle for the first vertex is obtained via the
FF_SYNC message).
This opcode adds the URB allocation mechanism to regular URB writes.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This implements the FF_SYNC message required in gen6 geometry shaders to
get the initial URB handle.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is needed to support user-provided geometry shaders, since the
brw_ff_gs_prog atom in gen6 only takes care of implementing transform feedback
for vertex shaders.
If there is no user-provided geometry shader the implementation falls back to
the original code.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, gen6 only uses geometry shaders for transform feedback so the state
we emit is not suitable to accomodate general purpose, user-provided geometry
shaders. This patch paves the way to add these support and the needed
3DSTATE_GS packet modifications for it.
Previous code that emitted state to implement transform feedback in gen6 goes
to upload_gs_state_adhoc_tf().
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, when a geometry shader can't use dual object mode we fall back to
dual instance mode, however, when invocations == 1, single dispatch mode is
more performant and equally efficient in terms of register pressure.
Single dispatch mode requires that the driver can handle interleaving of
input registers, but this is already supported (dual instance mode has
the same requirement). However, to take full advantage of single dispatch mode
to reduce register pressure we would also need the ability to store two
separate vec4 output values into vec8 registers, which would approximately
double our capacity to store temporary values, but currently the vec4 visitor
and generator classes do not support this, so at the moment register pressure
in single and dual instance modes is the same.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It has been a bad name since we added the builder. Rename it to
ILO_DEBUG=batch to match i965, and call ilo_builder_decode() from
ilo_cp_submit_internal().
"Flush" is used for too many things already: pipe resource flush, pipe context
flush, pipe transfer region flush, and hardware pipeline flush. Rename it to
ilo_cp_submit(). As such, ILO_DEBUG=flush is renamed to ILO_DEBUG=submit.
Fredrik's implementation of ARB_vertex_attrib_binding introduced new
gl_vertex_attrib_array and gl_vertex_buffer_binding structures, and
converted Mesa's older gl_client_array to be derived state. Ultimately,
we'd like to drop gl_client_array and use those structures directly.
One hitch is that gl_client_array::_MaxElement doesn't correspond to
either structure (unlike every other field), so we'd have to figure out
where to store it. The _MaxElement computation uses values from both
structures, so it doesn't really belong in either place. We could put
it in the VAO, but we'd have to pass it around everywhere.
It turns out that it's only used when ctx->Const.CheckArrayBounds is
set, which is only set by the (rarely used) classic swrast driver.
It appears that drivers/x11 used to set it as well, which was intended
to avoid segmentation faults on out-of-bounds memory access in the X
server (probably for indirect GLX clients). However, ajax deleted that
code in 2010 (commit 1ccef926be).
The bounds checking apparently doesn't actually work, either. Non-VBO
attributes arbitrarily set _MaxElement to 2 * 1000 * 1000 * 1000.
vbo_save_draw and vbo_exec_draw remark /* ??? */ when setting it, and
the i965 code contains a comment noting that _MaxElement is often bogus.
Given that the code is complex, rarely used, and dubiously functional,
it doesn't seem worth maintaining going forward. This patch drops it.
This will probably mean the classic swrast driver may begin crashing on
out of bounds vertex buffer access in some cases, but I believe that is
allowed by OpenGL (and probably happened for non-VBO accesses anyway).
There do not appear to be any Piglit regressions, either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Roland Scheidegger <sroland@vmware.com>
While depth test state is passed through the fragment shader as sideband,
data, the stencil test state has to be set by the fragment shader itself.
Many tests are still failing, but this gets most of hiz/ passing.
The SWZ instruction can have swizzle terms >4 (SWIZZLE_ZERO, SWIZZLE_ONE).
These swizzle terms caused a few assertions to fail.
This started happening after the commit "mesa: Actually use the Mesa IR
optimizer for ARB programs." when replaying some apitrace files.
A new piglit test (tests/asmparsertest/shaders/ARBfp1.0/swz-08.txt)
exercises this.
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This allows for introducing dead code eliminating of uniforms, copy
propagation of uniforms, and instruction rescheduling between instructions
that both read uniforms.
This is particularly important for outputs, where we try to MOV the whole
vec4 to the VPM, even if only 1-3 components had been set up. It might
also be important for temporaries, if the shader reads components before
writing them.
While the result of signed integer division by zero is undefined by glsl
(and doesn't exist with d3d10), we must not crash, so need to make sure we
don't get sigfpe much like udiv already does.
Unlike udiv where we return 0xffffffff (as required by d3d10) there is
no requirement right now to return anything specific so we use zero.
sample opcodes don't have valid texture target information (and I don't think
this should be changed), however it would be nice if we had that information
ready elsewhere, so stuff that information into the tgsi info when analyzing
a shader.
v2: Ilja Mirkin spotted some bugs wrt not handling msaa resources. So add them
and while there also add them to the tex opcode analysis this was cloned from
as well (plus get rid of some bug not detecting indirect textures there in some
cases too).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
MESA_FORMAT_x8y8z8w8 puts the x channel in the least significant part of
the containing 32-bit integer, which is equivalent to PIPE_FORMAT_xyzw8888.
PIPE_FORMAT_x8y8z8w8 puts the x channel first in memory.
This patch fixes up the mesa<->gallium mapping accordingly.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MESA_FORMAT_LnAn puts the luminance in the least significant part of
the containing integer, which is equivalent to PIPE_FORMAT_LAnn.
PIPE_FORMAT_LnAn puts the luminance first in memory.
This patch fixes up the mesa<->gallium mapping accordingly.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means that each 8888 SRGB format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
v2: fix missing i965 additions. (Jason)
fix 127->255 max alpha for SRGB formats. (Jason)
v1: Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The associated UNORM format already existed.
This means that each LnAn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
[airlied: rebased onto current master]
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
...i.e. formats in which the first listed component is in the least
significant byte of the integer. The corresponding UNORM aliases already exist.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means that each RnGnBnxn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
The associated UNORM and SRGB formats already exist.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
...i.e. formats in which the alpha or green channel is first in memory.
This means that each LnAn and RnGn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason pointed out the bug on review adding new formats,
but the existing format also appears to have the bug, so
use 255 as the max, these are SRGB no SNORM.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Luminance is the least-significant byte of the uint16, rather than the
lowest byte in memory. Other parts of mesa already handle this correctly
for big-endian, and swrast already handles other MESA_FORMAT_x8y8 formats
correctly. This case was just an odd-one-out.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MESA_FORMAT_R8G8B8X8_SNORM used a function called unpack_X8B8G8R8_SNORM
while MESA_FORMAT_R8G8B8X8_SRGB used a function called unpack_R8G8B8X8_SRGB.
This patch renames the SNORM function to have the same order as the
MESA_FORMAT name, like the SRGB function does.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was being shared using a ../../ get out of gallium into
mesa, and I swore when I did it I'd fix things when we got a util
dir, we did, so I have.
v2: move RGTC_DEBUG define
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This gets a ton of piglit working that crashes in waffle context
management stuff otherwise. Actually supporting mismatched FB sizes is at
best going to require some more load/store generals for color buffers, but
if I can't manage to do that I'll want to just have state_tracker reject
those FBOs as unsupported, rather than deny GL 2.1.
Previously, we would get a trailing ', ' which looked strange.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The goal here is to have an argument for the depth write opcode so that I
can do computed depth. In the process, this makes the calculations that
will be emitted more obvious in the QIR.
Commit afe3d1556f (i965: Stop doing
remapping of "special" regs.) stopped remapping delta_x/delta_y, and
additionally stopped considering them always-live. We later realized
delta_x was used in register allocaiton, so we actually needed to remap
it, which was fixed in commit 23d782067a
(i965/fs: Keep track of the register that hold delta_x/delta_y.).
However, that commit didn't restore the "always consider it live" part.
If all the code using delta_x was eliminated, fs_visitor::delta_x would
be left pointing at its old register number. Later code in register
allocation would handle that register number specially...even though it
wasn't actually delta_x.
To combat this, set delta_x/y to BAD_FILE if they're eliminated, and
check for that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83127
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
The triangle_32_ rast functions never made it into the debug output,
confused me for a few seconds.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch builds on 6c8f547f66 and
previous patches by allowing u_format.csv to specify separate big-endian
and little-endian layouts. It then uses this to specify the correct layouts
for various depth/stencil formats. Later patches handle other formats.
To recap, the idea is that u_format.csv lists the channels for an N-byte
value as though it were an N-byte integer. For little-endian targets
the channels are listed starting at the least-significant bit of the
integer while for big-endian targets the channels are listed starting
at the most-significant bit. This means that for something like
PIPE_FORMAT_B8G8R8A8_UNORM (blue in first byte of memory, alpha in last
byte of memory) the orders are the same for both endiannesses. But for
something like PIPE_FORMAT_S8_UINT_Z24_UNORM, where the stencil is in
the least significant byte of a 32-bit integer, there need to be separate
channel definitions for each endianness.
The effect of this patch is to make the affected PIPE_FORMAT_*s have
the same layout as the associated MESA_FORMAT_*s for big-endian.
The MESA_FORMAT_*s are already handled correctly.
Fixes various piglit tests on z. No regressions on x86_64.
[airlied: squash subsequent patches]
util: Add big-endian layout for 5551 and 565 formats
util: Add big-endian layout for 10/10/10/2 formats
util: Add big-endian layout for 4444 formats
util: Add big-endian layout for 233 format
util: Add big-endian layout for 44 formats
Signed-off-by: Dave Airlie <airlied@redhat.com>
llvmpipe treats PIPE_FORMAT_Z32_FLOAT_S8X24_UINT as a bit of a special case,
handling it as two 32-bit pieces rather than a single 64-bit block:
/* 64bit d/s format is special already extracted 32 bits */
total_bits = format_desc->block.bits > 32 ? 32 : format_desc->block.bits;
The format_desc describes the whole 64-bit block, so the z shift
will be 32 for big-endian. But since we're accessing the z channel
as a 32-bit value rather than a 64-bit value, we need to mask the shift
with 31.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just a very limited version, in particular sampler and sampler view
index must be the same. It cannot handle any modifiers neither.
Works much the same as soa version otherwise, to figure out the target we
need to store the sampler view dcls.
While here, also handle (no-op) RET and get rid of a couple bogus deprecated
comments.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
sample opcodes are a little oddly represented in the opcode_info, since
they don't count as texture instructions - they don't have valid target
information, but they may have offsets (unlike "ordinary" texture
instructions, the texture token may be optional for them).
So just make sure with these opcodes the optional offsets are accepted.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
sample opcodes don't encode a texture target, it would thus always
print UNKNOWN, which is not helpful (and wouldn't parse when giving
back the shader text to tgsi).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We don't have any specific limits in the hardware, just like the other
GPUs, so match their behavior. Fixes minmax_gles2 and several other
piglit tests relying on the specced uniform minmax values.
Put macro code in do {} while loop and put semicolons on macro calls
so auto indentation works properly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This reduces gcc -O3 compile time to 1/4 of what it was on my system.
Reduces MSVC release build time too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
With GLES we don't give any kind of warning in case we don't
write to gl_position. This patch makes changes so that we
generate a warning in case of GLES (VER < 300) and an error
in case of GL.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Remap table for uniforms may contain empty entries when using explicit
uniform locations. If no active/inactive variable exists with given
location, remap table contains NULL.
v2: move remap table bounds check before existence check (Ian Romanick)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Erik Faye-Lund <kusmabite@gmail.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83574
We might run into ve->count == 0 and last_velement_edgeflag == true in
gen6_3DSTATE_VERTEX_ELEMENTS() when the state tracker sets an invalid
combination of VS and VE (does not seem to happen with st/mesa). Do not
assume ve->count is positive when last_velement_edgeflag is true.
Reported by Coverity.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Rename ilo_builder_batch_state_base_address() to gen6_state_base_address() for
consistency and remove unused gen6_STATE_BASE_ADDRESS(). Reorder the code in
gen6_PIPE_CONTROL() a bit. Finally, some mostly cosmetic changes.
Move functions for the 3D pipeline to the new headers. We artificially split
the functions into top (vertex processing) and bottom (pixel processing), to
keep the headers at reasonable sizes.
ir_rvalue::constant_expression_value() recursively walks down an IR
tree, attempting to reduce it to a single constant value. This is
useful when you want to know whether a variable has a constant
expression value at all, and if so, what it is.
The constant folding optimization pass attempts to replace rvalues with
their constant expression value from the bottom up. That way, we can
optimize subexpressions, and ideally stop as soon as we find a
non-constant subexpression.
In order to obtain the actual value of an expression, the optimization
pass calls constant_expression_value(). But it should only do so if it
knows the value can be combined into a constant. Otherwise, at each
step of walking back up the tree, it will walk down the tree again, only
to discover what it already knew: it isn't constant.
We properly avoided this call for ir_expression nodes, but not for
ir_swizzle nodes. This patch fixes that, drastically reducing compile
times on certain shaders where tree grafting has given us huge
expression trees. It also fixes SuperTuxKart.
Thanks to Iago and Mike for help in tracking this down.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78468
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
The FS backend has always used 0, and the VS backend has always used 1.
I think 1 is just working around other problems, and is incorrect.
Samplers are baked in; nothing uses the UNIFORM register we would
create, and we shouldn't upload any constant values for them.
Fixes ES3-CTS.shaders.struct.uniform.sampler_array_vertex.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Samplers take up zero slots and therefore don't exist in the params
array, nor are they included in stage_prog_data->nr_params. There's no
need to store their size in param_size, as it's only used for dealing
with arrays of "real" uniforms (ones uploaded as shader constants).
We run into all kinds of problems trying to refer to the uniform storage
for variables that don't have uniform storage. For one, we may use some
other variable's index, or access out of bounds in arrays. In the FS
backend, our extra 2 * MaxSamplerImageUnits params for texture rectangle
rescaling paper over a lot of problems. In the VS backend, we claim
samplers take up a slot, which also papers over problems.
Instead, just skip allocating storage for variables that don't have any.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
We always uploaded them together, mostly out of laziness - both required
an additional vertex element. However, gl_VertexID now also requires an
additional vertex buffer for storing gl_BaseVertex; for non-indirect
draws this also means uploading (a small amount of) data. This is extra
overhead we don't need if the shader only uses gl_InstanceID.
In particular, our clear shaders currently use gl_InstanceID for doing
layered clears, but don't need gl_VertexID.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
In the non-indirect draw case, we call intel_upload_data to upload
gl_BaseVertex. It makes brw->draw.draw_params_bo point to the upload
buffer, and increments the upload BO reference count.
So, we need to unreference it when making brw->draw.draw_params_bo point
at something else, or else we'll retain a reference to stale upload
buffers and hold on to them forever.
This also means that the indirect case should increment the reference
count on the indirect draw buffer when making brw->draw.draw_params_bo
point at it. That way, both paths increment the reference count, so
we can safely unreference it every time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
4f338c9b introduced logic to trigger a flush rather than overflowing
cmdstream buffer. But the threshold was too low, triggering flushes
where they were not needed. This caused problems with games like
xonotic.
Part of the problem is that we need to mark all state dirty between
cmdstream submit ioctls, because we cannot rely on state being
preserved across ioctls. But even with that, there are still some
problems that are still being debugged. For now:
1) correctly mark all state dirty
2) introduce FD_MESA_DEBUG flush flag to force rendering to be flushed
between each draw, to trigger problems (so that I can debug)
3) use a more reasonable threshold so for normal usecases we don't
trigger the problems
This at least corrects the regression, but there is still more debugging
to do.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need the .w component to end up in .x, since the hw appears to fetch
gl_FragColor starting with the .x coordinate regardless of MRT format.
As long as we are doing this, we might as well throw out the remaining
unneeded components.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Because of render-to-alpha (000x) shenanigans, freedreno needs to do
some special handling when rendering to alpha-only formats. And I
noticed that while we had _is_luminance(), _is_intensity(), etc, an
_is_alpha() helper was missing. So fix that.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Unreference the ctx->_Shader object before we delete all the pipeline
objects in the hash table. Before, ctx->_Shader could point to freed
memory when _mesa_reference_pipeline_object(ctx, &ctx->_Shader, NULL)
was called.
Fixes crash when exiting the piglit rendezvous_by_location test on
Windows.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
q_total should never go below 0 (which is why it's defined as unsigned),
and if it does, then something is seriously wrong.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As noted in the previous commit, this was introduced in
567e2769b8 ("ra: make the p, q test more
efficient"), but I forgot to mention it.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In commit 567e2769b8 ("ra: make the p, q
test more efficient") I unknowingly introduced a new requirement to the
register allocator API: the user must set the register class of all
nodes before setting up their interferences, because
ra_add_conflict_list() now uses the classes of the two interfering
nodes. i965 already did this, but r300g was setting up register classes
interleaved with setting up the interference graph. This led to us
calculating the wrong q total, and in certain cases
e78a01d5e6 (" ra: optimistically color
only one node at a time") made it so that this bug caused a segfault. In
particular, the error occurred if the q total was decremented to 1 below
0 for the last node to be pushed onto the stack. Since q_total is an
unsigned integer, it overflowed to 0xffffffff, which is what
lowest_q_total happens to be initialzed to. This means that we would
fail the "new_q_total < lowest_q_total" check on line 476 of
register_allocate.c, and so the node would never be pushed onto the
stack, which led to segfaults in ra_select() when we failed to ever give
it a register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82828
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Pavel Ondračka <pavel.ondracka@email.cz>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Needed for assert.
Fixes build on BE archs with -Werror=implicit-function-declaration.
In file included from
../../../../../src/gallium/auxiliary/draw/draw_fs.c:30:0:
../../../../../src/gallium/auxiliary/util/u_math.h: In function
'util_memcpy_cpu_to_le32':
../../../../../src/gallium/auxiliary/util/u_math.h:810:4: error:
implicit declaration of function 'assert'
[-Werror=implicit-function-declaration]
assert(n % 4 == 0);
^
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In buf_clear_region() and buf_copy_region(), max_cmd_size was set to 0. If
either of the functions is called and there is not enough space in the
builder, the next ilo_cp_flush() will fail silently in a release build.
Replace magic numbers by size defines in tex_clear_region()/tex_copy_region()
for consistency and readability.
Intruduce gen6_blt_bo and gen6_blt_xy_bo to describe BOs. In the extreme case
of gen6_XY_SRC_COPY_BLT(), the number of parameters goes down from 18 to 8.
This allows a sampler view to have a different texture target than the
underlying resource. This will be used to implement the type casting
between 2d arrays and cube maps as specified in ARB_texture_view.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes a regression from "nouveau/vdec: small fixes to h264 handling"
New picking order for frames:
1. Vidbuf pointer matches.
2. Take the first kicked ref.
3. If that fails, take a ref that has a different last_used.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
The BSP bo might be too small to contain all of the bsp data,
bump its size on overflow. Also bump inter_bo when this happens,
it might be too small otherwise.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
If the source is not a GRF, it could have a register >= virtual_grf_count.
Accessing virtual_grf_end with such a register would lead to
out-of-bounds access. Make sure the source is a GRF before accessing
virtual_grf_end.
Fixes Valgrind complaints while compiling some shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
If the glx/wgl state tracker requested a core profile but the gallium
driver did not support some feature of GL 3.1 or later, we were setting
ctx->Version=0 and then failing the assertion in
_mesa_initialize_exec_table().
With this change we check for ctx->Version=0 and tear down the context
and return NULL from st_create_context().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So far we have been using CL_INVOCATION_COUNT to resolve this query but this
is no good with streams, as only stream 0 reaches the clipping stage.
From ARB_transform_feedback3:
"When a generated primitive query for a vertex stream is active, the
primitives-generated count is incremented every time a primitive emitted to
that stream reaches the Discarding Rasterization stage (see Section 3.x)
right before rasterization. This counter is incremented whether or not
transform feedback is active."
Unfortunately, we don't have any registers that provide the number of primitives
written to a specific stream other than the ones that track the number of
primitives written to transform feedback in the SOL stage, so we can't
implement this exactly as specified.
In the past we implemented this feature by activating the SOL unit even if
transform feeback was disabled, but making it so that all buffers were
disabled and it only recorded statistics, which gave us the right semantics
(see 3178d2474a). Unfortunately, this came with
a significant performance impact and had to be reverted.
This new take does not intend to implement the exact semantics required by
the spec, but improves what we have now, since now we return the primitive
count for stream 0 in all cases. With this patch we use
GEN7_SO_PRIM_STORAGE_NEEDED to resolve GL_PRIMITIVES_GENERATED queries
for non-zero streams. This would return the number of primitives written
to transform feedback for each stream instead. Since non-zero streams are
only useful in combination with transform feedback this should not be too
bad, and the only case that I think we would not be supporting would be
the one in which we want to use both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN on the same non-zero stream to
detect buffer overflow.
This patch also fixes the following piglit test:
arb_gpu_shader5-xfb-streams-without-invocations
This test uses both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries on non-zero streams, but it
does never hit the overflow case, so both queries are always expected to return
the same value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
That better matches the actual userspace use case, the
kernel will force it to VRAM if the hardware requires it.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Less CPU overhead and avoids contention over CPU accessible memory on startup.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
In preparation to using buffers clears with the hw engine(s).
v2: split out flipping to using hw buffer clears.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This builds the opengl DLLs with address layout space randomization
(ASLR) and data execution prevention (DEP) for better security.
Reviewed-by: Kurt Daverman <krd@vmware.com>
The old disassembler was modified from i965's. It is as much work as doing a
new one to keep it up-to-date, which also requires copying more headers over.
The outputs of this new disassembler should match i965's as closely as
possible.
Add some new registers and some tweaks. The changes that affect ilo are
GEN6_REG_HS_INVOCATION_COUNT -> GEN7_REG_HS_INVOCATION_COUNT
GEN6_REG_DS_INVOCATION_COUNT -> GEN7_REG_DS_INVOCATION_COUNT
GEN6_COND_NORMAL -> GEN6_COND_NONE
If a precision qualifer is allowed on type T, it should be allowed
on an array of T. Refactor the check to ensure this is the case.
(Fixes failures in WebGL conformance test 'gl-min-textures')
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
MSVC replaces the "F" in "255.0F" with the macro argument which leads
to an error. s/F/FLT/ to avoid that.
It turns out we weren't using this macro at all on MSVC until the
recent "mesa: Drop USE_IEEE define." change.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch fixes a build error on DragonFly.
CC libpipe_loader_la-pipe_loader_drm.lo
pipe_loader_drm.c: In function 'pipe_loader_drm_probe':
pipe_loader_drm.c:207:10: error: implicit declaration of function 'close' [-Werror=implicit-function-declaration]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Apparently guardband clipping doesn't work like we thought: objects
entirely outside fthe guardband are trivially rejected, regardless of
their relation to the viewport. Normally, the guardband is larger than
the viewport, so this is not a problem. However, when the viewport is
larger than the guardband, this means that we would discard primitives
which were wholly outside of the guardband, but still visible.
We always program the guardband to 8K x 8K to enforce the restriction
that the screenspace bounding box of a single triangle must be no more
than 8K x 8K. So, if the viewport is larger than that, we need to
disable guardband clipping.
Fixes ES3 conformance tests:
- framebuffer_blit_functionality_negative_height_blit
- framebuffer_blit_functionality_negative_width_blit
- framebuffer_blit_functionality_negative_dimensions_blit
- framebuffer_blit_functionality_magnifying_blit
- framebuffer_blit_functionality_multisampled_to_singlesampled_blit
v2: Mention the acronym expansion for TA/TR/MC in the comments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that we have the data available, we need to expose it to the
shaders. We can reuse the same vertex element that we use for
gl_VertexID, but we need to back it by an actual vertex buffer.
A hardware restriction requires that vertex attributes coming from a
buffer (STORE_SRC) must come before any other types (i.e. STORE_0).
So, we have to make gl_BaseVertex be the .x component of the vertex
attribute. This means moving gl_VertexID to a different component.
I chose to move gl_VertexID and gl_InstanceID to the .z and .w
components, respectively, to make room for gl_BaseInstance in the .y
component (which would also come from a buffer, and therefore be
STORE_SRC).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll need to emit another VERTEX_BUFFER_STATE for gl_BaseVertex;
pulling this into a helper function will save us from having to deal
with cross-generation differences in that code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used for GL_ARB_shader_draw_parameters, as well as fixing
gl_VertexID, which is supposed to include gl_BaseVertex's value.
For indirect draws, we simply point at the indirect buffer; for normal
draws, we upload the value via the upload buffer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The lower_vertex_id pass converts uses of the gl_VertexID system value
to the gl_BaseVertex and gl_VertexIDMESA system values. Since
gl_VertexID is no longer accessed, it would not be considered active.
Of course, it should be, since the shader uses gl_VertexID.
v2: Move the var->name dereference past the var != NULL check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Converts gl_VertexID to (gl_VertexIDMESA + gl_BaseVertex). gl_VertexIDMESA
is backed by SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, and gl_BaseVertex is backed
by SYSTEM_VALUE_BASE_VERTEX.
v2: Put the enum in struct gl_constants and propoerly resolve the scope
in C++ code. Fix suggested by Marek.
v3: Reabase on Matt's foreach_in_list changes (was using foreach_list).
v4 (Ken): Use a systemvalue instead of a uniform because
STATE_BASE_VERTEX has been removed.
v5: Use a boolean to select lowering, and only allow one lowering
method. Suggested by Ken.
v6 (Ken): Replace strcmp against literal "gl_BaseVertex"/"gl_VertexID"
with SYSTEM_VALUE enum checks, for efficiency.
v7: Rebase on context constant initialization work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
There exists hardware, such as i965, that does not implement the OpenGL
semantic for gl_VertexID. Instead, that hardware does not include the
value of basevertex in the gl_VertexID value.
SYSTEM_VALUE_VERTEX_ID_ZERO_BASE is the system value that represents
this semantic.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
v2: Additions to the documentation for SYSTEM_VALUE_VERTEX_ID. Quote
the GL_ARB_shader_draw_parameters spec and mention DirectX SV_VertexID.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
181581280b changed the way the
llvm-config version is read from sed to grep and introduced
a requirement for gnu grep extension that treats BREs as EREs.
Avoid this by calling egrep instead of grep which should be
able to handle EREs everywhere.
This allows Mesa to build on OpenBSD again.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
This doesn't quite make depth-tex-compare work, presumably because we're
not hitting equality with itof(sample) * 1.0/0xffffff in the 0xffffff
case. arb_fragment_program_shadow tests pass, though, as well as a bunch
of other shadow-related stuff.
We potentially need to be careful that use of a value stored in r4 isn't
copy-propagated (or something) across another r4 write. That doesn't
appear to happen currently, and this makes the dataflow more obvious. It
also opens up not unpacking the r4 value, which will be useful for depth
textures.
Since software primitive-restart emulation is going to be removed (and
anyways, mostly seemed to be crash prone in combination with
u_primconvert and oddball scenarios (like PIPE_PRIM_POLYGON with only a
single vertex), might as well do it in hardware (which fortunately
didn't turn out to be too hard to figure out).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Triggered by shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0]
DCL TEMP[0..2], LOCAL
0: IF CONST[0].xxxx :0
1: MOV TEMP[0], TEMP[1]
2: ELSE :0
3: MOV TEMP[0], TEMP[2]
4: ENDIF
5: MOV OUT[0], TEMP[0]
6: END
not really a sane shader, although driver segfaulting is probably
not the appropriate response.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We currently aren't too clever about dealing with running out of
cmdstream buffer space. Since we use a single buffer for both drawing
and tiling commands, we need to ensure there is enough space at the tail
of the cmdstream buffer to fit the tiling commands.
Until we get more clever, the easy solution is a threshold to trigger
flushing rendering even if the application does not trigger flush (swap,
changing render target, etc). This way we at least don't crash for apps
that do several thousand draw calls (like some piglit tests do).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Most of the things the new compiler still has trouble with basically
amount to cp stage removing too many copies. But without the cp stage,
the shaders the new compiler produces are still better (perf and
correctness) than the old compiler. So a simple thing to do until I
have more time to work on it is first trying falling back to new
compiler without cp, before finally falling back to old compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Excess conversions considered harmful.
Recently Matt reworked the boolean uniform handling to use the value of
UniformBooleanTrue, rather than integer 1, when uploading uniforms:
mesa: Upload boolean uniforms using UniformBooleanTrue.
glsl: Use UniformBooleanTrue value for uniform initializers.
Marek then set the default to 1.0f for drivers without native integer
support:
mesa: set UniformBooleanTrue = 1.0f by default
However, ir_to_mesa was assuming a value of integer 1, and arranging for
it to be converted to 1.0f on upload. Since Marek's commit, we were
uploading 1.0f = 0x3f800000 which was being interpreted as the integer
value 1065353216 and converted to float as 1.06535322E9, which broke
assumptions in ir_to_mesa that "true" was exactly 1.0f.
+13 Piglits on classic swrast (fs-bool-less-compare-true,
{vs,fs}-op-not-bool-using-if, glsl-1.20/execution/uniform-initializer).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83573
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa already defines _GNU_SOURCE for glibc based systems and defining
_GNU_SOURCE will break the Mesa build on other systems such as OpenBSD.
_GNU_SOURCE only seems to be included in llvm-config output when
LLVM is built via autoconf and not when it is built by cmake.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Superseded by HAVE_LOADER_GALLIUM. The latter has a *DRM* brethren
making the whose easier on which one to keep.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Just drop the conditional and simplify our build. This means that
it'll build every time, but it does not require any dependencies nor
does it take that long to compile 200 lines of boilerplate code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The variable was unused and gave false information. The need for nonnull
winsys currently does not relate as it used to. Nowadays one can mix and
match more freely with plenty of winsys' to make your head spin.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With recent commit we removed the NEED_NONNULL_WINSYS checks when
selecting the hardware (inc svga) winsys. svga has only one winsys
that explicitly requires libdrm (via it's bundled version of
vmwgfx_drm.h) but configure.ac never really checks for it.
Add the check early to prevent people from shooting themselves when
they select the driver but lack libdrm.
$ ./autogen.sh --disable-dri --disable-egl --disable-gallium-llvm
--with-dri-drivers=swrast --with-gallium-drivers=svga,swrast
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82539
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The rest of stencil handling isn't done yet, but it documents an extra
cl_u8(0) and helps make it obvious why we don't need to format clear_depth
the same way the depth/stencil buffer is formatted.
For now it still requires the color buffer to be present -- we're relying
on the store of color buffer contents to end the frame, and we have to do
something with color buffers in the rendering config packet.
According to GLSL-ES Spec(i.e. 1.0, 3.0), gl_Position value is undefined
after the vertex processing stage if we don't write gl_Position. However,
GLSL 1.10 Spec mentions that writing to gl_Position is mandatory. In case
of GLSL-ES, it's not an error and atleast the linking should pass.
Currently, Mesa throws an linker error in case we dont write to gl_position
and Version is less then 140(GLSL) and 300(GLSL-ES). This patch changes
it so that we don't report an error in case of GLSL-ES.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83380
Similar to the changes to GEN7 command functions, but to GEN6 this time.
As every GPE function has been converted, remove
ilo_cp_assert_no_implicit_flush() calls.
Make these changes
ilo_cp_begin() -> ilo_builder_batch_pointer()
ilo_cp_write() -> direct memory set
ilo_cp_write_bo() -> ilo_builder_batch_reloc()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_steal_ptr() and memcpy() -> ilo_builder_state_write()
ilo_cp_steal_ptr() -> ilo_builder_state_pointer()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_steal_ptr() and memcpy() -> ilo_builder_surface_write()
ilo_cp_steal() and ilo_cp_write() -> ilo_builder_surface_write()
ilo_cp_write_bo() -> ilo_builder_surface_reloc()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_begin() -> ilo_builder_batch_pointer()
ilo_cp_write() -> direct memory set
ilo_cp_write_bo() -> ilo_builder_batch_reloc()
and make sure there is no implicit flush. Use this chance to drop the
"_emit_" infix.
Remove instruction buffer management from ilo_3d and adapt ilo_shader_cache to
upload kernels to ilo_builder. To be able to do that, we also let ilo_builder
manage STATE_BASE_ADDRESS.
Comparing to how we manage batch and instruction buffers, the new builder
- does not flush
- manages both types of buffers
- manages STATE_BASE_ADDRESS
- uploads kernels using unsynchronized mapping
- has its own decoder for the buffers
- provides more helpers
Do not require a toy_compiler so that it can be used in other places, such as
state dumping. Add a bool to control whether the raw instruction words are
shown.
This would just crash. Noticed by accident while checking int divisions by zero
with a quickly hacked piglit test.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Forgot to add it when I fixed up the start instance handling in (llvm) draw.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Calling the variable min when it's really max and vice versa seems a bit
confusing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
UBO loads can be boolean-valued expressions, too, so we need to handle
them in emit_bool_to_cond_code() and emit_if_gen6().
However, unlike most expressions, it doesn't make sense to evaluate
their operands, then do something with the results. We just want to
evaluate the UBO load as a whole---which performs the read from
memory---then load the boolean result into the flag register.
Instead of adding code to handle it, we can simply bypass the
ir_expression handling, and fall through to the default code, which will
do exactly that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83468
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Matt and I believe that Sandybridge actually uses 0xFFFFFFFF for a
"true" comparison result, similar to Ivybridge. This matches the
internal documentation, and empirical results, but contradicts the PRM.
So, the comment is inaccurate, and we can actually just handle these
directly without ever needing to fall through to the condition code
path.
Also, the vec4 backend has always done it this way, and has apparently
been working fine. This patch makes the FS backend match the vec4
backend's behavior.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added ir_triop_csel, and forgot again
when adding it to emit_bool_to_cond_code.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
As long as we don't have a workaround for frame based
decoding in VDPAU we should not advertise NV_vdpau_interop.
v2: fix commit message, check if get_video_param is present
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Instead we cast backend_visitor::prog for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes use of Altivec pack intrinsics on little-endian PowerPC
systems. Since little-endian operation only affects the load and store
instructions, the semantics of pack (and other) instructions that take
two input vectors implicitly change: the pack instructions still fill
a register placing values from the first operand into the "high" parts
of the register, and values from the second operand into the "low" parts
of the register, but since vector loads and stores perform an endian swap,
the high parts end up at high memory addresses.
To still achieve the desired effect, we have to swap the two inputs to
the pack instruction on little-endian systems. This is done automatically
by the back-end for instructions generated by LLVM, but needs to be done
manually when emitting intrisincs (which still result in that instruction
being emitted directly).
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Maarten Lankhorst <dev@mblankhorst.nl>
Instead we store a void pointer to the key, and cast it to
brw_wm_prog_key for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead we store a brw_stage_prog_data pointer, and cast it to
brw_wm_prog_data for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
BitSet::allocate() is being used with the expectation that it would
leave the bitfield untouched if its size hasn't changed, however,
the function always zeroed the last word, which led to obscure bugs
with live set computation.
This also fixes BitSet::resize(), which was broken, but luckily not
being used.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This hasn't been enabled in a long time and is completely stale and
unnecessary. Remove, esp since it doesn't build.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Uniform values are in the UNIFORM register file, not the GRF register file.
Looking in virtual_grf_sizes makes no sense and only makes the output of
dump_instructions confusing.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the scons buildscript, README and trace.xsl
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android & scons buildscript
- include the headers' README & svga_dump.py
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android & scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & README
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the android buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & LLVM note
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & custom include
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & the tests
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript
v2: Don't double-include the compiler sources.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
v2: Don't double include the test sources.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Commit 60d772cd9d1(st/vega: add headers and SConscript in
the tarball) meant to pick all the headers to be included in
the release tarball yet it missed a few.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
A few files were missing, namely:
- a few of the common headers
- the android + gdi sources
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
With the last revisions of commit 664c2d76947(gallium/ilo: cleanup
intel_winsys.h) we moved the header from winsys to drivers, but we
forgot to update the makefile.sources to reflect this.
Cc: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Currently, BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE* and
BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_RESOLVE* shaders
in setup_glsl_msaa_blit_shader() are not recompiled
when the source buffer sample count changes. For example,
implementation continued using a 4X msaa shader, even if
source buffer changes from 4X msaa to 8x msaa. It causes
incorrect rendering.
This patch adds new enums in blit_msaa_shader, one for
each supported sample count, and uses them to store
msaa shaders.
Fixes following piglit tests on Broadwell:
ext_framebuffer_multisample-accuracy all_samples color
ext_framebuffer_multisample-accuracy all_samples depth_draw
ext_framebuffer_multisample-accuracy all_samples depth_resolve
ext_framebuffer_multisample-accuracy all_samples stencil_draw
ext_framebuffer_multisample-accuracy all_samples stencil_resolve
ext_framebuffer_multisample-formats all_samples
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstarnd <jason@jlekstrand.net>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libXvMC*, libvdpau* and libomx_mesa depends unconditionally
upon xcb, due to their usage of the aux/vl gallium module.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libEGL depends conditionally (when building with x11 platform)
upon xcb.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libGL depends conditionally (when building with dri3) upon
xcb 1.9.3 and unconditionally on ancient xcb functions -
xcb_generate_id and xcb_request_check amongst others.
v2: Use PKG_CHECK_EXISTS() when checking for dri3 xcb.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80848
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
When a texture is wrapped in a texture view, we can't trust the format in
the miptree itself. This patch allows us to pass the format seperately
through blorp so we can proprerly handled wrapped textures.
It's worth noting here that we can use the miptree format directly for
depth/stencil formats because they cannot be reinterpreted by a texture
view.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
CC: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Before commit 04895f5c we would only reswizzle dot product instructions
(since they wrote the same value into all channels, and we didn't have
to think about anything else). That commit extended reswizzling to cases
when the swizzle was single valued -- i.e., writing the same result into
all channels.
But allowing reswizzling of arbitrary things is actually really easy and
is even less code. (Why didn't we do this in the first place?!)
total instructions in shared programs: 4266079 -> 4261000 (-0.12%)
instructions in affected programs: 351933 -> 346854 (-1.44%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Despite the comment above the function claiming otherwise, the function
did not reswizzle sources, which would lead to bad code generation since
commit 04895f5c, which began claiming we could do such swizzling when we
could not.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The 'start' instruction is always in the current block, except for the
case of shader time, which emits code in a pattern seen no where else.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, the basic block start/end IPs don't get updated properly,
leading to a broken CFG. This usually results in the following
assertion failure:
brw_fs_live_variables.cpp:141:
void brw::fs_live_variables::setup_def_use():
Assertion `ip == block->start_ip' failed.
Fixes KWin, WebGL demos, and a score of Piglit tests on Sandybridge and
earlier hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dump() methods don't alter the CFG or basic blocks, so we should
mark them as const. This lets you call them even if you have a const
cfg_t - which is the case in certain portions of the code (such as live
interval handling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As older versions of gnu ld did not support --dynamic-list check to see
if it is supported before using it. Non gnu linkers such the apple one
likely lack this option as well.
Fixes the build on OpenBSD which has binutils 2.15 and 2.17.
The --dynamic-list option seems to been have introduced sometime after
binutils 2.17 was released as it is present in 2.18.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Fixes binary garbage in the compilation logs caused by
compat::string::c_str() not being null-terminated (which is a bug on
its own that will be fixed in another commit).
Reported-by: EdB <edb+mesa@sigluy.net>
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed91
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1e
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404da
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
We can't rely on the value from the assembler if relative addressing is
used. So instead use the max of declared-consts (which does not include
compiler immediates) and what we get from the assembler (which does).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
all_delayed will also be true if we didn't attempt to schedule anything
due to no more instructions using current addr/pred. We rely on coming
in to block_sched_undelayed() to detect and clean up when there are no
more uses of the current addr/pred, which isn't necessarily an error.
This fixes a regression introduced in b823abed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The split between these two didn't make much sense. I'm going to want the
chance to look at uniform contents in optimization passes, and the QPU
emit I think is going to end up rewriting the uniforms stream.
This common init routine can be used by constructors for multiple program
types.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Debugging a regression in discard support was just too full of duplicate
instructions, so I decided to remove them instead of re-analyzing each of
them as I dumped their outputs in simulation.
There were troubles with bools without using native integers
(st_glsl_to_tgsi seemed to think bool true was 1.0f sometimes, when as a
uniform it's stored as ~0), and since I've got native integers other than
divide, I might as well just support them.
Before, we had some special opcodes like CMP and SNE that emitted multiple
instructions. Now, we reduce those operations significantly, giving
optimization more to look at for reducing redundant operations.
The downside is that QOP_SF is pretty special -- we're going to have to
track it separately when we're doing instruction scheduling, and we want
to peephole it into the instruction generating the destination write in
most cases (and not allocate the destination reg, probably. Unless it's
used for some other purpose, as well).
A bool is 0 or ~0, and KILL_IF takes a float arg that's <0 for discard or
>= 0 for not. By negating it, we ended up doing a floating point subtract
of (0 - ~0), which ended up as an inf. To make this actually work, we
need to convert the bool to a float.
Reviewed-by: Brian Paul <brianp@vmware.com>
While similar in layout, the size of the SVGA3dSize type may be smaller than
the struct drm_vmw_size type that is part of the ioctl interface. The kernel
driver could accordingly overwrite a memory area following the size variable
on the stack. Typically that would be another local variable, causing
breakage in, for example, ubuntu 12.04.5 where the handle local variable
becomes overwritten.
v2: Fix whitespace errors
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Cc: "10.1 10.2 10.3" <mesa-stable@lists.freedesktop.org>
As explained in the previous commit, we want to avoid the possibility of
integer-multiplication overflow while allocating buffers.
In these two cases, the final allocation size is the product of three values:
one variable and two that are fixed constants at compile time.
In this commit, we move the explicit multiplication to involve only the
compile-time constants, preventing any overflow from that multiplication, (and
allowing calloc to catch any potential overflow from the remainining implicit
multiplication).
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit 32f2fd1c5d, several calls to
_mesa_calloc(x) were replaced with calls to calloc(1, x). This is strictly
equivalent to what the code was doing previously.
But for cases where "x" involves multiplication, now that we are explicitly
using the two-argument calloc, we can do one step better and replace:
calloc(1, A * B);
with:
calloc(A, B);
The advantage of the latter is that calloc will detect any overflow that would
have resulted from the multiplication and will fail the allocation, (whereas
the former would return a small allocation). So this fix can change
potentially exploitable buffer overruns into segmentation faults.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's been altering the tree and reporting "false" since January 2011.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, opt_copy_propagation_elements would always rewrite the
instruction stream, even if was the same thing as before. In order to
report progress correctly, we'll need to bail if the suggested
replacement is identical (or equivalent) to the original code.
This also introduced unnecessary noop swizzles, as far as I can tell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, if chans < 4, we passed uninitialized stack garbage to the
ir_swizzle constructor for the excess components. Thankfully, it
ignores that data, as it's unnecessary, so no harm actually comes of it.
However, it's obviously better to initialize it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added it.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In commit 46d03d37bf I renamed a Makefile target
from md5 to checksums, (as we switched from MD5 checksums to SHA-256
checksums, so the more general name is more future proof).
But that commit missed one mention of "md5" as a dependency of the .PHONY
target. Rename that here as well.
According to the GLSL 1.40 spec, section 5.7 Structure and Array Operations:
"Array elements are accessed using an expression whose type is int or uint."
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The default case was accidentally clearing RADEON_FLAG_CPU_ACCESS from the
previous fall-through cases.
Reported-by: Mathias Fröhlich <Mathias.Froehlich@gmx.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Coverity pointed out we never dropped the lock here, so fix
it by using a common exit path.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
tex-miplevel-selection was hammering my memory manager with primconverts
on individual quads. This gets all those converted IBs packed into larger
IBs.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This is part of fixing extremely long runtimes on some piglit tests that
involve streaming vertex reuploads due to format conversions, and will
similarly be important for X performance, which relies on these flags.
A meta begin/end pair with MESA_META_DRAW_BUFFERS will change visible GL
state. We recreate the draw buffer enums from the buffer bitfield, which
changes GL_BACK to GL_BACK_LEFT (and GL_FRONT to GL_FRONT_LEFT).
This commit modifes the save/restore logic to instead copy the buffer enums
from the gl_framebuffer and then set them on restore using
_mesa_drawbuffers().
It's not clear how this breaks the benchmark in 82796, but fixing meta to not
leak the state change fixes the regression.
No piglit regressions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=82796
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
Coverity reported this, and I think this is the right solution,
since cache->items is struct cache_item ** not struct cache_item *,
we also realloc it using struct cache_item * at some point.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not all of these are used in every context, so this can make a
significant difference for short-lived contexts such as in piglit tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If we fails in reserve_explicit_locations, we leak uniform_map.
Reported-by: coverity scanner.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
i965 will have more than 32 bits when BRW_STATE_COMPUTE_PROGRAM is added.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The set of state atoms for compute shaders is currently empty; it will
be filled in by future patches.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The hardware state for compute shaders is almost entirely orthogonal
to the hardware state for 3D rendering. To avoid sending unnecessary
state to the hardware, we'll need to have a separate set of state
atoms for the compute pipeline and the 3D pipeline. That means we
need to maintain two separate sets of dirty bits to determine which
state atoms need to be run.
But the dirty bits are not completely independent; for example, if
BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do
we need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but
we also need to re-run compute state atoms that depend on
BRW_NEW_SURFACES. But we'll also need to re-run those state atoms the
next time the compute pipeline is run.
To accomplish this, we record two sets of dirty bits, one for each
pipeline. When bits are dirtied (via SET_DIRTY_BIT() or
SET_DIRTY_ALL()) we set them to the dirty state in both pipelines.
When brw_state_upload() is run, we clear the dirty bits just for the
pipeline that was run.
Note that since the number of pipelines is known at compile time to be
2, the compiler should unroll the loops in SET_DIRTY_BIT() and
SET_DIRTY_ALL().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This one path doesn't goto fail, so it seems to leak dec.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With VP2, nv50_miptree is faked because the underlying bo's have to be
laid out in a certain way. This is done by adjusting the address. Make
sure that blits (and everything else for consistency) use the mt address
rather than the bo address as a base.
This fixes retrieving chroma plane with VDPAU.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82255
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
When constant folding a MAD operation, we first fold the multiply and
generate an ADD. However we do so without making sure that the immediate
can be handled in the saturate case. If it can't, load the immediate in
a separate instruction.
Reported-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Experimentally, the sampler doesn't appear to like these, neither as
buffer nor as rect textures. So remove 1D from the list of texture types
to make linear when used for staging.
This fixes the OSD in mplayer for VDPAU.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Samplers are only defined up to num_samplers, so set all samplers above
nr to NULL so that we don't try to read them again later.
Tested-by: Christian Ruppert <idl0r@qasl.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
We always left them enabled, which turned off HiZ in some cases.
This should improve performace with Hyper-Z.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This should be as fast as no HTILE for stencil. I think we can still get full
performance with depth-only rendering even if stencil is present in the buffer
but not used, but I'm not 100% sure. This may be revisited when HiS and fast
stencil clear are implemented.
This fixes a hang in Brutal Legend.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64471
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is a golden setting on RV740, but there is a hw bug which recommends
setting it on all R7xx chipsets.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
It's almost the same.
This enables tiling for HTILE. It also enables Hyper-Z for other texture
targets (1D, 1D_ARRAY, 2D_ARRAY, CUBE, CUBE_ARRAY, 3D, RECT).
2D array depth textures are tested by Unigine Sanctuary and my new piglit
test.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
This should make a machine which is running piglit more responsive at times.
e.g. streaming-texture-leak can easily eat 600 MB because of how fast it
creates new textures.
We were only using it to get at its type, which we already know because
it's a builtin variable.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were only using it to get at its type, which we already know because
it's a builtin variable.
v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The replicated data clear shader needs to be SIMD16, or else the GPU
will hang. So, compile it even if INTEL_DEBUG=no16 is set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Current method of generating distribution tar-balls involves manually
invoking make + target name in the appropriate places. This temporary
solution is used until we get 'make dist' working.
Currently it does not work, as in order to have the target (which is
also a filename) available in the final Makefile we need to add a PHONY
target + use the correct target name.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Since the fs backend can emit saturate as a separate instruction, there is
no need to detect for min/max instructions and to rewrite the instruction tree
accordingly. On the other hand, we don't need to emit a separate saturated
mov either when the expression generating src can do saturate directly.
v4: Add can_do_saturate() check before enabling saturate modifer (Ken)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F
To be propagated as:
sel.ge.sat dst b 0.25F
v3: - Syntax clarifications in inst->saturate assignment
- Remove extra parenthesis when assigning src_reg value
from copy_entry (Matt Turner)
v4: - Take channels into consideration when propagating saturated instructions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F
To be propagated as:
sel.ge.sat dst b 0.25F
v3: Syntax clarifications in inst->saturate assignment (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Output max(saturate(x),b) instead of saturate(max(x,b))
- Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is > 0.0 and
inner constant is 1.0.
- Fix comments to show that the optimization is a commutative operation
(Matt Turner)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Output min(saturate(x),b) instead of saturate(min(x,b)) suggested by Ilia Mirkin
- Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is zero and
inner constant is < 1
- Fix comments to reflect we are doing a commutative operation (Matt Turner)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Check that the base type is float (Ian Romanick)
v3: - Make sure comments reflect that we are doing a commutative operation
- Add missing condition where the inner constant is 1.0 and outer constant is 0.0
- Make indexing of operands easier to read (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Suggested by Matt. This patch combines and moves back the code-generation
functions from generate_vec4_instruction() into generate_code(). Makes
generate_code() a bit larger, but helps us to count loops in a
straightforward manner.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
brw_meta_fast_clear.c:211:17: warning: 'x_scaledown' may be used
uninitialized in this function [-Wmaybe-uninitialized]
unsigned int x_scaledown, y_scaledown;
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are some cases where the scheduler can get itself into impossible
situations, by scheduling the wrong write to pred or addr register
first. (Ie. it could end up being unable to schedule any instruction if
some instruction which depends on the current addr/reg value also
depends on another addr/reg value.)
To solve this we'd need to be able to insert extra mov instructions
(which would also help when register assignment gets into impossible
situations). To do that, we'd need to move the nop padding from sched
into legalize.
But to start with, just detect when we get into an impossible situation
and bail, rather than sitting forever in an infinite loop. This way it
will at least fall back to the old compiler, which might even work if
you are lucky.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
All of the GL image enums fit in 16-bits.
Also move the fields from the anonymous "image" structucture to the next
higher structure. This will enable packing the bits with the other
bitfield.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0
After (32-bit): 70 40,577,421,777 68,487,584 62,973,695 5,513,889 0
Before (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0
After (64-bit): 74 37,124,603,758 95,891,808 88,466,712 7,425,096 0
A real savings of 346KiB on 32-bit and 262KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only values allowed are 0 and 1, and the value is checked before
assigning.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0
After (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0
Before (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0
After (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just use ir_variable::data.binding... because that's the where the
binding is stored for everything else that can use layout(binding=).
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0
After (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0
Before (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0
After (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The VertexProgram and FragmentProgram have a Cache member for dealing
with fixed function programs. There are no fixed function geometry
programs, so this should never have existed, and was just copy and
pasted.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I actually screwed that up in 754319490f,
mistakenly thinking the code actually wanted the non-nan result before.
So, introduce that missing nan behavior case and use that instead.
For sse, there's no actual change in the resulting code at all, the fallback
code wouldn't have done the right thing though.
Of course, the actual issue I saw with pow() was completely unrelated...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Pretty trivial, just fill in the offsets and such. The implementation
is near 100% copy and paste from llvmpipe. Should be useful for debugging.
No piglit change when not using SOFTPIPE_USE_LLVM=1.
Now that it can do the same tests with and without using llvm for vs/gs,
with llvm more pass, the only things failing only with llvm seems to be
edgeflags tests and vs/gs-pow-float-float (and for the latter I'm not
convinced the zero tolerance it requires is somehow mandated by glsl).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The code is all in place now so enable it.
Seems to pass all relevant piglit tests (just like cube maps, some of the
cube map array tests need GALLIVM_DEBUG=no_quad_lod,no_rho_approx)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Pretty easy, just make sure that all paths testing for PIPE_TEXTURE_CUBE
also recognize PIPE_TEXTURE_CUBE_ARRAY, and add the layer * 6 calculation
to the calculated face.
Also handle it for texture size query, looks like OpenGL wants the number
of cubes, not layers (so need division by 6).
No piglit regressions.
v2: fix up adding cube layer to face for seamless filtering (needs to happen
after calculating per-sample face). Undetected by piglit unfortunately.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
Not sure why it was there but it is definitely not an error if gs outputs are
infs/nans. Besides, the outputs can be ints, in which case any small negative
number asserted.
This fixes piglit's texelFetch gs isamplerXX crashes with softpipe (down from
14 to 2).
Bug https://bugs.freedesktop.org/show_bug.cgi?id=80012
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
piglit tex-miplevel-selection nowadays doesn't use repeat wrap mode due to
sampler objects any longer, however at the time of the clear the wrap mode
is still illegal and at this point we get to verify the state, including
samplers (even though they won't get used), and because mesa doesn't treat
it as an incomplete texture as the spec says it should, we hit the assertion.
Just warn about this for now instead.
Gets crashes down from 44 to 14 in a piglit run (all were in various tests of
tex-miplevel-selection with texture rectangles). Though just about all
tex-miplevel-selection tests fail anyway for other reasons.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just handle as ordinary 2d / 2d array resources. Prevents an assertion failure
with softpipe and piglit glsl-resource-not-bound 2DMS/2DMSArray tests.
While here also fix TXD shadowCube similarly, which fixes the crash with piglit
tex-miplevel-selection textureGrad CubeShadow (the test will still fail due to
softpipe being broken).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=80011
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was meant for softpipe to not crash at some point if vertex texturing
was used. It is, however, fishy because it uses values from
draw_set_samplers/draw_set_sampler_views and not from the shader key. Albeit
we should still in all cases actually generate a new shader if this changes
(because the samplers and views themselves are in the key) I don't want to
think again wondering if that's really correct in the future.
Besides, at least today, it does not actually work for softpipe, as this was
relying on softpipe not actually calling draw_set_samplers/sampler_views at
all - I've verified it crashes regardless (if there were a tex instruction in
the vs, which normally should not happen anyway). For drivers which do indeed
not call these functions because they don't support vertex texturing at all
(r300), this should still not crash because the static texture data is all
zero, which causes the sampling functions to take an early out (same as is done
if no texture is bound at the slot used for sampling - verified with hacked up
softpipe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
mesa was creating a cube map array texture with just one layer, which is
not legal. This caused an assertion failure when using that texture later
in llvmpipe (when enabling cube map arrays) since it verifies the number
of layers in the view is divisible by 6 (the sampling code might well crash
randomly otherwise) with piglit glsl-resource-not-bound CubeArray -fbo -auto.
v2: use appropriately sized texel array...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
The source can be a register as well as an immediate, and disassembling
a register as an immediate can have some strange results.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It is meant to be private within the actual winsys. Remove it from
the exported header, and fold it into it's only user.
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This define is always set and it had no real purpose according to
git log. Seems like it is a leftover from the vl/vdpau prototype
stage.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Namely
- the private header (xa_priv.h)
- README and
- xa-indent
Sort the sources list while we're here.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 6a19bb56e0.
The above commit disabled the default build of xvmc as the xvmc tests
were failing. As pointed out by Ilia, the tests are "broken by design"
as they do not test the object that is build but the one that is
installed and setup on the workstation.
With previous commit we moved the programs from the 'make check' to
noinst automake target. This way they won't be run but will be around
for people to use them.
Cc: Tom Stellard <thomas.stellard@amd.com>
All the tests require an installed and setup XvMC, thus they
are not good candidates for 'make check'.
Keep them around as the user might want to actually test the
implementation post installation/setup.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add the final remaining files into the tarball (make dist), namely:
- SConscripts
- Non-autotooled winsys' - android, gdi and hgl.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Include the headers within.
- Update scons to use them.
- Drop useless include (gallium/drivers) from scons.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Drop duplicate include compiler directives.
- Leave the sw/ prefix for all the software winsys headers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Add top_srcdir/src/gallium/winsys to GALLIUM_DRIVER_C{XXFLAGS}.
- Remove top_srcdir/src/gallium/drivers/radeon from the includes.
As a result:
- Common radeon headers are prefixed with 'radeon/'
- Winsys header inclusion is prefixed 'radeon/drm'
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Make the header location, inclusion and contents more common with
its i915,r* and nouveau counterparts:
- Move the header within drivers/ilo.
- Separate out intel_winsys_create_for_fd into 'drm_public' header.
- Cleanup the compiler includes.
v2: Move the header to drivers/ilo. Suggested by Chia-I.
v3: Correct intel_winsys.h inclusion. Spotted by Chia-I.
Cc: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
V2: moved test for the VertexAttrib*Pointer() functions
to update_array(), and made constant available for drivers to set
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The base instance needs to be passed to the jited function, otherwise the
instanced data fetch will only work with the same start instance when the
jit function was created (and baking that into the key instead is not a viable
option).
This fixes piglit arb_base_instance-drawarrays (modulo some unrelated
core/compat context trouble I get for the test).
And fix the pipe cap bit in llvmpipe for it now that it actually works (it
already worked for softpipe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The docs were never really up to date for them, missing just about everything.
So mark them off as all done for GL 3.3 (though softpipe is in fact quite
broken for some newer things especially wrt texturing, and both don't have
compliant, real msaa support). And add the extensions missing too (no
guarantee of completeness).
Reviewed-by: Dave Airlie <airlied@gmail.com>
* IEEE Std 1003.1-2001 placed strcasecmp() in strings.h.
* ISO C99 doesn't mention strcase* in string.h
* On all platforms I could find, strcasecmp is in strings.h and string.h
as a compatibility layer for software written pre-2001 POSIX
* Technically strcasecmp should be only in strings.h and the man
pages back this up.
* Tested build on CentOS and Haiku
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is never used. There is another token "OUTPUT" which the lexer can
generate, though. This has been around since the dawn of time; is most
likely a typo.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Now that qpu_inst() ignores the WADDR from the other half of the
instruction, we can set both the ADD and MUL WADDRs in the NOP helper.
Thanks to that, we also no longer need to qpu_inst(NOP, NOP).
This allows setting the opposite-side WADDR to NOP (a non-zero value) in
qpu_* helpers, so that we don't need to qpu_inst() merge them with NOPs
all the time just to get the waddr set.
Fixes piglit draw-vertices and gl-2.0-vertexattribpointer on vc4, where
I'm only advertising R32F to RGBA32F support so far.
Note: regresses gl-1.5-normal3b3s-invariance due to introduced flushes and
missing depth buffer load/store support in the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Individual caps made supporting new fallbacks more complicated than it
needed to be. Instead, just make a table of fallbacks at context init
time.
v2: Fix inverted "do we need to install vbuf?" flagging caught by Marek.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
When I made the shader cache take the .fs member and moved the binding
point to .bind_fs, I failed to update these. Fixes crashes in
copyteximage-related tests.
The patch fixes the build on Oracle Solaris.
CC os/os_misc.lo
"os/os_misc.c", line 59: #error: unexpected platform in os_sysinfo.c
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
We've found that there's a buffer overrun bug in flex that's triggered by
using alternation in a lookahead pattern.
Fortunately, we don't need to match the exact {NEWLINE} expression to
detect an empty pragma. It suffices to verify that there are no non-space
characters before any newline character. So we can use a simple [\r\n] to
get the desired behavior while avoiding the flex bug.
Fixes the regression of piglit's 17000-consecutive-chars-identifier test,
(which has been crashing since commit
04e40fd337 ).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82472
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
CC: <mesa-stable@lists.freedesktop.org>
The optimization relies on CMP setting the destination to 0, which is
equivalent to 0.0f. However, early platforms only set the least
significant byte, leaving the other bits undefined. So, we must disable
the optimization on those platforms.
Oddly, Sandybridge wasn't reported as broken. The PRM states that it
only sets the LSB, but the internal documentation says that it follows
the IVB behavior. Since it wasn't reported as broken, we believe it
really does follow the IVB behavior.
v2: Allow the optimization on Sandybridge (requested by Matt).
+32 piglits on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?=79963
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Operating on this code,
B0: ...
cmp.ne.f0(8)
(+f0) if(8)
B1: break(8)
B2: endif(8)
We can delete B2 without attempting to merge any blocks, since the
break/continue instruction necessarily ends the previous block.
After deleting the if instruction, we attempt to merge blocks B0 and B1.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This pass deletes an IF/ELSE/ENDIF or IF/ENDIF sequence, or the ELSE in
an ELSE/ENDIF sequence.
In the typical case (where IF and ENDIF) aren't the only instructions in
their basic blocks, we can simply remove the instructions (implicitly
deleting the block containing only the ELSE), and attempt to merge
blocks B0 and B2 together.
B0: ...
(+f0) if(8)
B1: else(8)
B2: endif(8)
...
If the IF or ENDIF instructions are the only instructions in their
respective basic blocks (which are deleted by the removal of the
instructions), we'll want to instead merge the next blocks.
Both B0 and B2 are possibly removed by the removal of if & endif.
Same situation for if/endif. E.g., in the following example we'd remove
blocks B1 and B2, and then attempt to combine B0 and B3.
B0: ...
B1: (+f0) if(8)
B2: endif(8)
B3: ...
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
... rather than pointing directly to the associated instruction. This
will let us set the block containing the IF statement's else-pointer to
NULL, when we delete a useless ELSE instruction, as in the case
(+f0) if(8)
...
else(8)
endif(8)
Also, remove the pointer to the ENDIF, since it's unused, and it was
also potentially wrong, in the case of a basic block containing both an
ENDIF and an IF instruction:
endif(8)
cmp.ne.f0(8) ...
(+f0) if(8)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.
cfg calculations: 202951 -> 80307 (-60.43%)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Will let us avoid invalidating the CFG if the optimization pass has
removed instructions using the new basic block methods.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes piglit glsl-fs-discard-01 and -03, and allows a lot of mesa demos to
start running. glsl-fs-discard-02 has a problem where the first tile is
not getting stored on the first render.
This should improve performance on real hardware by allowing more shader
instances to run in parallel. It also fixes assertion failures in tests
that don't emit a fragment color, since otherwise we didn't have enough
instructions to fit our signals in.
The spec citation talked about A and B, and I proceeded to pay no
attention to whether the waddrs were for A or B. As a result, this pair
of instructions would claim to conflict:
mov ra4, ra4 ; nop nop, r0, r0
mov.ns ra4, rb4 ; nop nop, r0, r0
Now that tiling is in place, we can expose the other formats. Depth is
still broken (need to make changes in the shader), but if you don't expose
it things crash all over. SNORM is dropped, but we could re-add it later
with some shader fixes to handle converting between [0,1] and [-1,1].
This still treats everything as RGBA8888 for the most part, same as
before. This is a prerequisite for handling other texture formats, since
only RGBA8888 has a raster-layout mode.
It meant that LUMALPHA was being marked as *many* miplevels, and
unsurprisingly wouldn't validate. On the other hand, some miplevel counts
wouldn't get the small mips validated at all.
The header is used by DRI1 drivers, which we've removed a while
back. Now only the dri1 loader in libGL is using it, so let's
move it in src/glx, and prefix it accordingly.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This flag was set to true for the atomic counter intrinsics, but it
never got plumbed through the linker, so by the time it got to the
backends it would always be set to the false. The current i965 backend
code doesn't use is_intrinsic, so this should not change any existing
code, but it's useful for codepaths that want to distinguish between
intrinsics and non-intrinsics without using strcmp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
This captures all of the steps I have been following in making releases for
the past year or so. This way, the instructions should be sound for anyone who
would like to take over the release process going forward.
This change will double cache size for branches which have a lower
LP_MAX_SHADER_VARIANTS limit (it will not do anything on master).
The reason is that nowadays shaders tend to be quite a bit larger than they
were (they were big when llvmpipe didn't have a fs loop, got much smaller with
that loop, and since then have gradually increased quite a bit though still
smaller than without the fs loop for various reasons - among them being d3d10
compliance, usage of 8-wide vectors, non-swizzled blend code). Thus effectively
less shaders would be cached (unless they were very small and the variant limit
was hit first). Also, since we're getting rid of the IR nowadays, the cached
shaders shouldn't need all that much memory actually.
This captures the set of rules I have been using for stable-branch management,
(starting with a discussion on the mesa-dev mailing list on July 2013, and
then refined through my own experience of performing stable-branch releases
since then).
We switched to these several stable releases ago, (since the MD5 algorithm has
been broken for some time), but only now did I get around to fixing this in
the Makefile rather than just performing this step manually.
CC: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
v2:
- Move dri*_query_renderer_* into their respective dri*_priv.h headers
- Drop then unnneeded include of dri2.h from dri2_query_renderer.c
- Rename dri2_query_renderer.c as dri_common_query_renderer.c, as it's contents
now are used for more than dri[23]
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82709">Bug 82709</a> - OpenCL not working on radeon hainan</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82814">Bug 82814</a> - glDrawBuffers(0, NULL) segfaults in _mesa_drawbuffers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83079">Bug 83079</a> - [NVC0] Dota 2 (Linux native and Wine) crash with Nouveau Drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83355">Bug 83355</a> - FTBFS: src/mesa/program/program_lexer.l:122:64: error: unknown type name 'YYSTYPE'</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>radeonsi: Don't use anonymous struct trick in atom tracking</li>
</ul>
<p>Alex Deucher (2):</p>
<ul>
<li>radeonsi: add new CIK pci ids</li>
<li>radeonsi: add new SI pci ids</li>
</ul>
<p>Andreas Boll (1):</p>
<ul>
<li>winsys/radeon: fix nop packet padding for hawaii</li>
</ul>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965: Bail on vec4 copy propagation for scratch writes with source modifiers</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix NULL pointer deref bug in _mesa_drawbuffers()</li>
</ul>
<p>Carl Worth (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.2.6 release</li>
<li>Makefile: Switch from md5sums to sha256sums</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>i965: add missing parens in vec4 visitor</li>
</ul>
<p>Emil Velikov (17):</p>
<ul>
<li>configure.ac: bail out if building gallium_gbm without gallium_egl</li>
<li>android: gallium/nouveau: fix include folders, link against libstlport</li>
<li>android: egl/main: fixup the nouveau build</li>
<li>automake: gallium/freedreno: drop spurious include dirs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77493">Bug 77493</a> - lp_test_arit fails with llvm >= llvm-3.5svn r206094</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82539">Bug 82539</a> - vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84140">Bug 84140</a> - mplayer crashes playing some files using vdpau output</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84662">Bug 84662</a> - Long pauses with Unreal demo Elemental on R9270X since : Always flush the HDP cache before submitting a CS to the GPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85267">Bug 85267</a> - vlc crashes with vdpau (Radeon 3850HD) [r600]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85529">Bug 85529</a> - Surfaces not drawn in Unvanquished</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87619">Bug 87619</a> - Changes to state such as render targets change fragment shader without marking it dirty.</li>
</ul>
<h2>Changes</h2>
<p>Chad Versace (2):</p>
<ul>
<li>i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()</li>
<li>i965: Use safer pointer arithmetic in gather_oa_results()</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.3.6 release</li>
<li>Update version to 10.3.7</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nv50,nvc0: set vertex id base to index_bias</li>
<li>nv50/ir: fix texture offsets in release builds</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.</li>
<li>i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>glsl_to_tgsi: fix a bug in copy propagation</li>
<li>vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85529">Bug 85529</a> - Surfaces not drawn in Unvanquished</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87619">Bug 87619</a> - Changes to state such as render targets change fragment shader without marking it dirty.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87658">Bug 87658</a> - [llvmpipe] SEGV in sse2_has_daz on ancient Pentium4-M</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87913">Bug 87913</a> - CPU cacheline size of 0 can be returned by CPUID leaf 0x80000006 in some virtual machines</li>
</ul>
<h2>Changes</h2>
<p>Chad Versace (2):</p>
<ul>
<li>i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()</li>
<li>i965: Use safer pointer arithmetic in gather_oa_results()</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>Revert "r600g/sb: fix issues cause by GLSL switching to loops for switch"</li>
<li>r600g: fix regression since UCMP change</li>
<li>r600g/sb: implement r600 gpr index workaround. (v3.1)</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.1 release</li>
<li>Update version to 10.4.2</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nv50,nvc0: set vertex id base to index_bias</li>
<li>nv50/ir: fix texture offsets in release builds</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.</li>
<li>i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.</li>
</ul>
<p>Leonid Shatz (1):</p>
<ul>
<li>gallium/util: make sure cache line size is not zero</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>glsl_to_tgsi: fix a bug in copy propagation</li>
<li>vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88662">Bug 88662</a> - unaligned access to gl_dlist_node</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88930">Bug 88930</a> - [osmesa] osbuffer->textures should be indexed by attachment type</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix display list 8-byte alignment issue</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.3 release</li>
<li>Update version to 10.4.4</li>
</ul>
<p>José Fonseca (1):</p>
<ul>
<li>egl: Pass the correct X visual depth to xcb_put_image().</li>
</ul>
<p>Mario Kleiner (1):</p>
<ul>
<li>glx/dri3: Request non-vsynced Present for swapinterval zero. (v3)</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>gallium/util: Don't use __builtin_clrsb in util_last_bit().</li>
</ul>
<p>Niels Ole Salscheider (1):</p>
<ul>
<li>configure: Link against all LLVM targets when building clover</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88658">Bug 88658</a> - (bisected) Slow video playback on Kabini</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89069">Bug 89069</a> - Lack of grass in The Talos Principle on radeonsi (native\wine\nine)</li>
</ul>
<h2>Changes</h2>
<p>Carl Worth (1):</p>
<ul>
<li>Revert use of Mesa IR optimizer for ARB_fragment_programs</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.4 release</li>
<li>get-pick-list.sh: Require explicit "10.4" for nominating stable patches</li>
<li>Update version to 10.4.5</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nvc0: bail out of 2d blits with non-A8_UNORM alpha formats</li>
<li>st/mesa: treat resource-less xfb buffers as if they weren't there</li>
<li>nvc0: allow holes in xfb target lists</li>
</ul>
<p>Jeremy Huddleston Sequoia (2):</p>
<ul>
<li>darwin: build fix</li>
<li>darwin: build fix</li>
</ul>
<p>Kenneth Graunke (4):</p>
<ul>
<li>i965: Override swizzles for integer luminance formats.</li>
<li>i965: Use a gl_color_union for sampler border color.</li>
<li>i965: Fix integer border color on Haswell.</li>
<li>glsl: Reduce memory consumption of copy propagation passes.</li>
</ul>
<p>Laura Ekstrand (1):</p>
<ul>
<li>main: Fixed _mesa_GetCompressedTexImage_sw to copy slices correctly.</li>
</ul>
<p>Marek Olšák (5):</p>
<ul>
<li>r600g,radeonsi: don't append to streamout buffers that haven't been used yet</li>
<li>radeonsi: fix instanced arrays with non-zero start instance</li>
<li>radeonsi: small fix in SPI state</li>
<li>mesa: fix AtomicBuffer typo in _mesa_DeleteBuffers</li>
<li>radeonsi: fix a crash if a stencil ref state is set before a DSA state</li>
</ul>
<p>Michel Dänzer (2):</p>
<ul>
<li>st/mesa: Don't use PIPE_USAGE_STREAM for GL_PIXEL_UNPACK_BUFFER_ARB</li>
<li>Revert "radeon/llvm: enable unsafe math for graphics shaders"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88885">Bug 88885</a> - Transform feedback uses incorrect interleaving if a previous draw did not write gl_Position</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89180">Bug 89180</a> - [IVB regression] Rendering issues in Mass Effect through VMware Workstation</li>
</ul>
<h2>Changes</h2>
<p>Abdiel Janulgue (2):</p>
<ul>
<li>glsl: Don't optimize min/max into saturate when EmitNoSat is set</li>
<li>st/mesa: For vertex shaders, don't emit saturate when SM 3.0 is unsupported</li>
</ul>
<p>Andreas Boll (1):</p>
<ul>
<li>glx: Fix returned values of GLX_RENDERER_PREFERRED_PROFILE_MESA</li>
</ul>
<p>Brian Paul (2):</p>
<ul>
<li>swrast: fix multiple color buffer writing</li>
<li>st/mesa: fix sampler view reference counting bug in glDraw/CopyPixels</li>
</ul>
<p>Chris Forbes (1):</p>
<ul>
<li>i965/gs: Check newly-generated GS-out VUE map against correct stage</li>
</ul>
<p>Eduardo Lima Mitev (1):</p>
<ul>
<li>mesa: Fix error validating args for TexSubImage3D</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.5 release</li>
<li>install-lib-links: remove the .install-lib-links file</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79202">Bug 79202</a> - valgrind errors in glsl-fs-uniform-array-loop-unroll.shader_test; random code generation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89156">Bug 89156</a> - r300g: GL_COMPRESSED_RED_RGTC1 / ATI1N support broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89224">Bug 89224</a> - Incorrect rendering of Unigine Valley running in VM on VMware Workstation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89530">Bug 89530</a> - FTBFS in loader: missing fstat</li>
</ul>
<h2>Changes</h2>
<p>Andrey Sudnik (1):</p>
<ul>
<li>i965/vec4: Don't lose the saturate modifier in copy propagation.</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>egl: Take alpha bits into account when selecting GBM formats</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.6 release</li>
<li>cherry-ignore: add not applicable/rejected commits</li>
<li>mesa: rename format_info.c to format_info.h</li>
<li>loader: include <sys/stat.h> for non-sysfs builds</li>
<li>auxiliary/os: fix the android build - s/drm_munmap/os_munmap/</li>
<li>Update version to 10.4.7</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>i965: Fix out-of-bounds accesses into pull_constant_loc array</li>
</ul>
<p>Ilia Mirkin (4):</p>
<ul>
<li>freedreno: move fb state copy after checking for size change</li>
<li>freedreno/ir3: fix array count returned by TXQ</li>
<li>freedreno/ir3: get the # of miplevels from getinfo</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70410">Bug 70410</a> - egl-static/Makefile: linking fails with llvm >= 3.4</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=72685">Bug 72685</a> - [radeonsi hyperz] Artifacts in Unigine Sanctuary</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=72819">Bug 72819</a> - [855GM] Incorrect drop shadow color on windows and strange white rectangle when showing/hiding GLX-dock...</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74563">Bug 74563</a> - Surfaceless contexts are not properly released by DRI drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74863">Bug 74863</a> - [r600g] HyperZ broken on RV770 and CYPRESS (Left 4 Dead 2 trees corruption) bisected!</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=75011">Bug 75011</a> - [hyperz] Performance drop since git-01e6371 (disable hyperz by default) with radeonsi</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=75112">Bug 75112</a> - Meta Bug for HyperZ issues on r600g and radeonsi</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=76252">Bug 76252</a> - Dynamic loading/unloading of opengl32.dll results in a deadlock</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=76861">Bug 76861</a> - mid3 generates slow code for constant arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77957">Bug 77957</a> - Variably-indexed constant arrays result in terrible shader code</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78468">Bug 78468</a> - Compiling of shader gets stuck in infinite loop</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79155">Bug 79155</a> - [Tesseract Game] Global Illumination: Medium Causes Color Distortion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79462">Bug 79462</a> - [NVC0/Codegen] Shader compilation falis in spill logic</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80050">Bug 80050</a> - [855GM] Incorrect drop shadow color under windows in Cinnamon persists with MESA 10.1.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80247">Bug 80247</a> - Khronos conformance test ES3-CTS.gtf.GL3Tests.transform_feedback.transform_feedback_vertex_id fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80561">Bug 80561</a> - Incorrect implementation of some VDPAU APIs.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82538">Bug 82538</a> - Super Maryo Chronicles fails with st/mesa assertion failure</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82539">Bug 82539</a> - vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82796">Bug 82796</a> - [IVB/BYT-M/HSW/BDW Bisected]Synmark2_v6.0_OglTerrainFlyInst/OglTerrainPanInst cannot run as image validation failed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83148">Bug 83148</a> - Unity invisible under Ubuntu 14.04 and 14.10</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83355">Bug 83355</a> - FTBFS: src/mesa/program/program_lexer.l:122:64: error: unknown type name 'YYSTYPE'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83380">Bug 83380</a> - Linking fails when not writing gl_Position.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83418">Bug 83418</a> - EU IV is incorrectly rendered after git1409011930.d571f2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84178">Bug 84178</a> - Big glamor regression in Xorg server 1.6.99.1 GIT: x11perf 1.5 Test: PutImage XY 500x500 Square</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84355">Bug 84355</a> - texture2DProjLod and textureCubeLod are not supported when using GLES.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84529">Bug 84529</a> - [IVB bisected] glean fragProg1 CMP test failed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84538">Bug 84538</a> - lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84557">Bug 84557</a> - [HSW] "Emit ELSE/ENDIF JIP with type D on Gen 7" causes Atomic Afterlife and GPU hangs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84651">Bug 84651</a> - Distorted graphics or black window when running Battle.net app on Intel hardware via wine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84662">Bug 84662</a> - Long pauses with Unreal demo Elemental on R9270X since : Always flush the HDP cache before submitting a CS to the GPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84807">Bug 84807</a> - Build issue starting between bf4aecfb2acc8d0dc815105d2f36eccbc97c284b and a3e9582f09249ad27716ba82c7dfcee685b65d51</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85189">Bug 85189</a> - llvm/invocation.cpp: In function 'void {anonymous}::optimize(llvm::Module*, unsigned int, const std::vector<llvm::Function*>&)': llvm/invocation.cpp:324:18: error: expected type-specifier</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85267">Bug 85267</a> - vlc crashes with vdpau (Radeon 3850HD) [r600]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85377">Bug 85377</a> - lp_test_format failure with llvm-3.6</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85425">Bug 85425</a> - [bisected] Compiler error in clip control operations in meta</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85429">Bug 85429</a> - indirect.c:296: multiple definition of `__indirect_glNewList'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85454">Bug 85454</a> - Unigine Sanctuary with Wine crashes on Mesa Git</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85647">Bug 85647</a> - Random radeonsi crashes with mesa 10.3.x</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85691">Bug 85691</a> - 'glsl: Drop constant 0.0 components from dot products.' broke piglit shaders/glsl-gnome-shell-dim-window and a few others with Gallium</li>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_framebuffer_sRGB on freedreno</li>
<li>GL_ARB_texture_rg on freedreno</li>
<li>GL_EXT_packed_float on freedreno</li>
<li>GL_EXT_polygon_offset_clamp on i965, nv50, nvc0, r600, radeonsi, llvmpipe</li>
<li>GL_EXT_texture_shared_exponent on freedreno</li>
<li>GL_EXT_texture_snorm on freedreno</li>
</ul>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=10370">Bug 10370</a> - Incorrect pixels read back if draw bitmap texture through Display list</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=60879">Bug 60879</a> - [radeonsi] X11 can't start with acceleration enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=67672">Bug 67672</a> - [llvmpipe] lp_test_arit fails on old CPUs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77544">Bug 77544</a> - i965: Try to use LINE instructions to perform MAD with immediate arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83510">Bug 83510</a> - Graphical glitches in Unreal Engine 4</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83908">Bug 83908</a> - [i965] Incorrect icon colors in Steam Big Picture</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84212">Bug 84212</a> - [BSW]ES3-CTS.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex fails and causes GPU hang</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84651">Bug 84651</a> - Distorted graphics or black window when running Battle.net app on Intel hardware via wine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86837">Bug 86837</a> - kodi segfault since auxiliary/vl: rework the build of the VL code</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86939">Bug 86939</a> - test_vf_float_conversions.cpp:63:12: error: expected primary-expression before ‘union’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86944">Bug 86944</a> - glsl_parser_extras.cpp", line 1455: Error: Badly formed expression. (Oracle Studio)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86958">Bug 86958</a> - lp_bld_misc.cpp:503:40: error: no matching function for call to ‘llvm::EngineBuilder::setMCJITMemoryManager(ShaderMemoryManager*&)’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86969">Bug 86969</a> - _drm_intel_gem_bo_references() function takes half the CPU with Witcher2 game</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87076">Bug 87076</a> - Dead Island needs allow_glsl_extension_directive_midshader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87619">Bug 87619</a> - Changes to state such as render targets change fragment shader without marking it dirty.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87658">Bug 87658</a> - [llvmpipe] SEGV in sse2_has_daz on ancient Pentium4-M</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87694">Bug 87694</a> - [SNB] Crash in brw_begin_transform_feedback</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87886">Bug 87886</a> - constant fps drops with Intel and Radeon</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87913">Bug 87913</a> - CPU cacheline size of 0 can be returned by CPUID leaf 0x80000006 in some virtual machines</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88079">Bug 88079</a> - dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0 tests fail due to enabling of GL_RGB and GL_RGBA</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88219">Bug 88219</a> - include/c11/threads_posix.h:197: undefined reference to `pthread_mutex_lock'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88227">Bug 88227</a> - Radeonsi: High GTT usage in Prison Architect large map</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88248">Bug 88248</a> - Calling glClear while there is an occlusion query in progress messes up the results</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88467">Bug 88467</a> - nir.c:140: error: ‘nir_src’ has no member named ‘ssa’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88478">Bug 88478</a> - #error "<malloc.h> has been replaced by <stdlib.h>"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88519">Bug 88519</a> - sha1.c:210:22: error: 'grcy_md_hd_t' undeclared (first use in this function)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88523">Bug 88523</a> - sha1.c:37: error: 'SHA1_CTX' undeclared (first use in this function)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88561">Bug 88561</a> - [radeonsi][regression,bisected] Depth test/buffer issues in Portal</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88658">Bug 88658</a> - (bisected) Slow video playback on Kabini</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88662">Bug 88662</a> - unaligned access to gl_dlist_node</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88783">Bug 88783</a> - FTBFS: Clover: src/gallium/state_trackers/clover/llvm/invocation.cpp:335:49: error: no matching function for call to 'llvm::TargetLibraryInfo::TargetLibraryInfo(llvm::Triple)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88806">Bug 88806</a> - nir/nir_constant_expressions.c:2754:15: error: controlling expression type 'unsigned int' not compatible with any generic association type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88930">Bug 88930</a> - [osmesa] osbuffer->textures should be indexed by attachment type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88962">Bug 88962</a> - [osmesa] Crash on postprocessing if z buffer is NULL</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89068">Bug 89068</a> - glTexImage2D regression by texstore_rgba switch to _mesa_format_convert</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89069">Bug 89069</a> - Lack of grass in The Talos Principle on radeonsi (native\wine\nine)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89180">Bug 89180</a> - [IVB regression] Rendering issues in Mass Effect through VMware Workstation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79202">Bug 79202</a> - valgrind errors in glsl-fs-uniform-array-loop-unroll.shader_test; random code generation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86747">Bug 86747</a> - Noise in Football Manager 2014 textures</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86974">Bug 86974</a> - INTEL_DEBUG=shader_time always asserts in fs_generator::generate_code() when Mesa is built with --enable-debug (= with asserts)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88246">Bug 88246</a> - Commit 2881b12 causes 43 DrawElements test regressions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88883">Bug 88883</a> - ir-a2xx.c: variable changed in assert statement</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88885">Bug 88885</a> - Transform feedback uses incorrect interleaving if a previous draw did not write gl_Position</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89156">Bug 89156</a> - r300g: GL_COMPRESSED_RED_RGTC1 / ATI1N support broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89224">Bug 89224</a> - Incorrect rendering of Unigine Valley running in VM on VMware Workstation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89292">Bug 89292</a> - [regression,bisected] incomplete screenshots in some cases</li>
/* hope the best instead of adding a bunch of ifdef's */
# include <stdint.h>
#endif
/* XXX: Use standard `inline` keyword instead */
#ifndef INLINE
# define INLINE inline
#endif
/**
* Function visibility
*/
#ifndef PUBLIC
# if defined(__GNUC__) || (defined(__SUNPRO_C) && (__SUNPRO_C >= 0x590))
# define PUBLIC __attribute__((visibility("default")))
# elif defined(_MSC_VER)
# define PUBLIC __declspec(dllexport)
# else
# define PUBLIC
# endif
#endif
/* XXX: Use standard `__func__` instead */
#ifndef __FUNCTION__
# define __FUNCTION__ __func__
#endif
#define STATIC_ASSERT(COND) \
do { \
(void) sizeof(char [1 - 2*!(COND)]); \
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.