We've started using NirOptions != NULL to mean "we're using NIR for this
stage." However, when INTEL_USE_NIR=1, we set it for a bunch of stages
that still use the vec4 backend, and thus definitely aren't using NIR.
For example, if INTEL_USE_NIR=1 we disable the GLSL IR cubemap
normalization pass, even for vertex shaders and geometry shaders. This
is wrong, but breaks a very uncommon case.
When I started deleting GLSL IR for stages where we claimed to be using
NIR, this bug quickly became apparent.
For now, only set it for fragment shaders, and vertex shaders if
brw->scalar_vs is set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The disassembly currently has the swizzle after the type for 3src source
operands, and the other way around for 2src. Flip the type and swizzle
around for 3src so that the output matches 2src.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
These should never happen. Plus, NIR passes really shouldn't be
reporting linker errors - this is past link time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't actually need a gl_program struct. We only used it to
translate prog->Target (i.e. GL_VERTEX_PROGRAM) to the gl_shader_stage
(i.e. MESA_SHADER_VERTEX). We may as well just pass that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I want to use this in some code that doesn't currently include mtypes.h.
It seems like a better place for it anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This header was originally going to be called pipeline.h, but it got
renamed at the last minute. Make the include guards match.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Neither the shader nor the key change when doing elts or linear variant, so
this was just annoying (probably mildly useful at some point when we printed
the IR per function too).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
llvm goes crazy when doing that, using way more memory and time, though there's
probably more to it - this points to a very much similar issue as fixed in
8a9f5ecdb1. In any case I've seen a quite
plain looking vertex shader with just ~50 simple tgsi instructions (but with a
dozen or so such indirect constant buffer lookups) go from a terribly high
~440ms compile time (consuming 25MB of memory in the process) down to a still
awful ~230ms and 13MB with this fix (with llvm 3.3), so there's still obvious
improvements possible (but I have no clue why it's so slow...).
The resulting shader is most likely also faster (certainly seemed so though
I don't have any hard numbers as it may have been influenced by compile times)
since generally fetching constants outside the buffer range is most likely an
app error (that is we expect all indices to be valid).
It is possible this fixes some mysterious vertex shader slowdowns we've seen
ever since we are conforming to newer apis at least partially (the main draw
loop also has similar looking conditionals which we probably could do without -
if not for the fetch at least for the additional elts condition.)
v2: use static vars for the fake bufs, minor code cleanups
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is a follow-on fix from the earlier "glsl: allow ForceGLSLVersion
to override #version directives" change. Since we're not changing
the language_version field, we have to check forced_language_version
here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In Skylake the order of the arguments for sample messages with the LD
type are u, v, lod, r whereas previously they were u, lod, v, r.
This fixes 144 Piglit tests including ones that directly use
texelFetch and also some using the meta stencil blit path which
appears to use texelFetch in its shader.
v2: Fix sampling 1D textures
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
On Gen7/8 for RAW surface format, the depth field (surf[3]) in surface
state means [30:21] bits of number of entries which is different from
other surface format which uses [26:21] bits field.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Add SV_GEOMETRY_EMIT special variable type to track the
implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
instructions so GCM/scheduler doesn't reorder them.
Mark emit instructions as unkillable so DCE doesn't eat them.
Enable only for evergreen/cayman as there are a few
unexplained GS piglit regressions on R6xx/R7xx with SB
enabled otherwise.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
CF_END could end up emitted in the middle of a shader on cayman
when there was a loop at the very end.
Fixes glsl-1.50-geometry-end-primitive and
ext_transform_feedback-geometry-shaders-basic piglit tests.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
arb_stencil_texturing-draw failed under softpipe because we got a float
back from the texturing function, and then tried to U2F it, stencil
texturing returns ints, so we should fix the tiling to retrieve
the stencil values as integers not floats.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This pass performs a mark and sweep pass over a nir_shader's associated
memory - anything still connected to the program will be kept, and any
dead memory we dropped on the floor will be freed.
The expectation is that this will be called when finished building and
optimizing the shader. However, it's also fine to call it earlier, and
many times, to free up memory earlier.
v2: (feedback from Jason Ekstrand)
- Skip sweeping impl->start_block, as it's already in the CF list.
- Don't sweep SSA defs (they're owned by their defining instruction)
- Don't steal phi sources (they're owned by nir_phi_instr).
- Don't steal tex->src (it's owned by the tex_inst itself)
- Don't sweep dereference chains (top-level dereferences are owned by
the instruction; sub-dereferences are owned by the parent deref).
- Don't sweep sources and destinations (SSA defs are handled as part of
the defining instruction, and registers are handled as part of
function implementations).
- Just steal instructions; don't walk them (no longer required).
v3: (feedback from Jason Ekstrand)
- Steal indirect sources from nir_src/nir_dest.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Jason pointed out that variable dereferences in NIR are really part of
their parent instruction, and should have the same lifetime.
Unlike in GLSL IR, they're not used very often - just for intrinsic
variables, call parameters & return, and indirect samplers for
texturing. Also, nir_deref_var is the top-level concept, and
nir_deref_array/nir_deref_record are child nodes.
This patch attempts to allocate nir_deref_vars out of their parent
instruction, and any sub-dereferences out of their parent deref.
It enforces these restrictions in the validator as well.
This means that freeing an instruction should free its associated
dereference chain as well. The memory sweeper pass can also happily
ignore them.
v2: Rename make_deref to evaluate_deref and make it take a nir_instr *
instead of void *. This involves adding &instr->instr everywhere.
(Requested by Jason Ekstrand.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can't allocate them out of the nir_ssa_def itself, because it may not
be ralloc'd (for example, nir_dest embeds a nir_ssa_def).
However, allocating them out of the instruction should work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Phi sources are part of the phi instruction and should have the same
lifetime.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The lifetime of the params array needs to be match the nir_call_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Commit 18004c3 introduced more restrictive validation to linker
between inputs and outputs. This patch skips the additional check
for programs that utilize GL_ARB_separate_shader_objects, there
inputs and outputs might not make exact match during linking but
only when constructing the final pipeline.
This made some of the GL_ARB_program_interface_query tests shaders
fail to link, these tests can be used to verify the change.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We limit y-tiling to 0x20 when depth is involved. However the function is
run for each miplevel, and the hardware expects miplevel 0 to have the
highest tiling settings. Perform the y-tiling limit on all levels of a
3d texture, not just the ones that have depth.
Fixes:
texelFetch fs sampler3D 98x129x1-98x129x9
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Nick Tenney <nick.tenney@gmail.com> # GT216
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Haswell hardware seems to ignore Render Stream Select bits from
3DSTATE_STREAMOUT packet when the SOL stage is disabled even if
the PRM says otherwise. Because of this, all primitives are sent
down the pipeline for rasterization, which is wrong. If SOL is
enabled, Render Stream Select is honored and primitives bound to
non-zero streams are discarded after stream output.
Since the only purpose of primives sent to non-zero streams is to
be recorded by transform feedback, we can simply discard all geometry
bound to non-zero streams then transform feedback is disabled
to prevent it from ever reaching the rasterization stage.
Notice that this patch introduces a small change in the behavior we
get when a geometry shader emits more vertices than the maximum declared:
before, a vertex that was emitted to a non-zero stream when TF was
disabled would still count for the purposes of checking that we don't
exceed the maximum number of output vertices declared by the shader. With
this change, these vertices are completely ignored and won't increase
the output vertex count, making more room for other (hopefully more
useful) vertices.
Fixes piglit test arb_gpu_shader5-emitstreamvertex_nodraw on Haswell
and Broadwell.
v2 (Ken): Drop is_haswell check in favor of doing this unconditionally.
Broadwell needs the workaround as well, and it doesn't hurt to do it in
general. Also tweak comments - the Haswell PRM does actually mention
this ("Command Reference: Instructions" page 797).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83962
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Jordan added this in commit 741782b594 for
Gen7 platforms. I missed this when adding the Broadwell code.
Fixes Piglit's spec/arb_gpu_shader5/invocation-id-{basic,in-separate-gs}
with MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5 set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
While working on NIR's memory allocation model, I realized the GLSL IR
memory model was broken.
During glCompileShader, we allocate everything out of the
_mesa_glsl_parse_state context, and reparent it to gl_shader at the end.
During glLinkProgram, we allocate everything out of a temporary context,
then reparent it to the exec_list containing the linked IR.
But during brw_link_shader - the driver's final opportunity to do
lowering and optimization - we just allocated everything out of the
permanent context given to us by the linker. That memory stayed
forever.
Notably, passes like brw_fs_channel_expressions cause us to churn the
majority of the code, so we really want to free dead IR here.
Saves 125MB of memory when replaying a Dota 2 trace on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This allows SIMD16 mode to work for a lot more programs. Texturing is
also more efficient in SIMD16 mode than SIMD8. Several messages don't
actually exist in SIMD8 mode, so we did SIMD16 messages and threw away
half of the data. Now we compute real data in both halves.
Also, the SIMD16 "sample" message doesn't require all three coordinate
components to exist (like the SIMD8 one), so we can shorten the message
lengths, cutting register usage a bit.
I chose to implement the visitor functionality in a separate function,
since mixing true SIMD16 with SIMD8 code that uses SIMD16 fallbacks
seemed like a mess. The new code bails on a few cases where we'd
have to do two SIMD8 messages - we just fall back to SIMD8 for now.
Improves performance in "Shadowrun: Dragonfall - Director's Cut" by
about 20% on GM45 (measured with LIBGL_SHOW_FPS=1 while standing around
in the first mission).
v2: Add ir_txf to the has_lod case (caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen5+ systems allow you to specify multiple shader programs - both SIMD8
and SIMD16 - and the hardware will automatically dispatch to the most
appropriate one, given the number of subspans to be processed.
However, that is not the case on Gen4. Instead, you program a single
shader. If you enable multiple dispatch modes (SIMD8 and SIMD16), the
shader is supposed to contain a series of jump instructions at the
beginning. The hardware will launch the shader at a small offset,
hitting one of the jumps.
We've always thought that sounds like a pain, and weren't clear how it
affected performance - is it worth having multiple shader types? So,
we never bothered with SIMD16 until now.
This patch takes a simpler approach: try and compile a SIMD16 shader.
If possible, set the no_8 flag, telling the hardware to just use the
SIMD16 variant all the time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This flag means to ignore the SIMD8 program and only use the SIMD16 one.
It was originally meant for repdata clear shaders, but I plan to use it
for other things on Gen4 as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I've no idea why this was 4. It certainly seems wrong.
Prevents assertion failures in fp-incomplete-tex with some upcoming
patches of mine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The CSE algorithm will continuously allocate new ae_entry objects. As
each new basic block is exited, all of the previously allocated objects
are dumped. Instead, put them in a free list and re-use them in the
next basic block. Reduce, reuse, recycle!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
These were added in commit f2616e56, presumably in preparation for
translating ARB vp/fp into GLSL IR. That never happened, and neither did
a lowering pass that actually generated these instructions.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
From GLSL 3.30 and GLSL ES 1.00 on, after processing the line
directive (including its new-line), the implementation should
behave as if it is compiling at the line number passed as
argument. In previous versions, it behaved as if compiling
at the passed line number + 1.
Partially fixes https://bugs.freedesktop.org/show_bug.cgi?id=88815
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From GLSL 1.30.10, section 3.3 (Preprocessor):
"#line line source-string-number ... After processing this directive
(including its new-line), the implementation will behave as if it is
compiling at ... source string number source-string-number. Subsequent
source strings will be numbered sequentially, until another #line
directive overrides that numbering."
In the previous implementation the source number was always zero.
Subsequent source strings are still not numbered sequentially, because
in the glShaderSource implementation we are concatenating the source code
strings into one long string.
Partially fixes https://bugs.freedesktop.org/show_bug.cgi?id=88815
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even if they only have one slice, otherwise textureSize() won't
produce correct results for the depth value.
Fixes 10 dEQP tests in this category:
dEQP-GLES3.functional.shaders.texture_functions.texturesize.sampler2darray*
Reviewed-by: Mark Janes <mark.a.janes at intel.com>
The NIR compiler frontend is an alternative to the TGSI f/e, producing
the same ir3 IR and using the same backend passes for scheduling, etc.
It is not enabled by default yet, as there are still some regressions.
To enable, use 'FD_MESA_DEBUG=nir'. It is enough to use with, for
example, xonotic or supertuxkart.
With the NIR f/e, scalarizing and a number of other lowering steps
happen in NIR, so we don't have to do them in ir3. Which simplifies the
f/e and allows the lowered instructions to pass through other
optimization stages.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use the correct sprite replacement depending on the flip of the coord
mode, using either T or 1-T depending on whether we have an upper-left or
lower-left coordinate origin. This fixes all the point sprite piglits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Copies nouveau_buffer and radeon_buffer. This allows a write to proceed
to an uninitialized part of a buffer even when the GPU is using the
previously-initialized portions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Waiting on a bo being ready is handled in fd_bo_cpu_prep. No need to
keep separate timestamps around.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A resource flush is an upload of a hypothetically-staging texture to the
GPU. For a UMA system, this will largely be a no-op or
cache-maintenance. Move the render flush logic into transfer_map where
it belongs, and clear out the transfer_flush function.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
pipe_sampler_view already contains a texture, remove the redundant
tex_resource member which pointed at the same thing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since NIR f/e currently encodes immediates in instructions (rather than
passing via const), we need to ensure that when const's are used the get
initialized to the proper values. Otherwise comparing NIR to TGSI
compiler, it will use proper immediate values in one case, and randomly
initialize values in the other. Which confuses ir3test.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Be smarter about propagating copies from const or immed, or with abs/neg
modifiers. Also, realize that absneg.s and absneg.f are really "fancy"
mov instructions.
This opens up the possibility to remove more copies. It helps the TGSI
frontend a bit, but will be really needed for the NIR f/e which builds
everything up in SSA form (ie. will *always* insert a mov from const or
immediate).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Even though in the end, they map to the same bits, the backend will need
to be able to differentiate float abs/neg vs integer abs/neg. Rather
than making the backend figure it out based on instruction opcode (which
when combined with mov/absneg instructions, can be awkward), just split
out different flags for each so the frontend can signal it's intentions
more clearly. Also, since (neg) for bitwise op's is actually a bitwise-
not, split it out into bnot flag.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add helpers for constructing SSA forms of instructions.
Only partial cat5/cat6 coverage.. but we can add stuff as needed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to pull in libnir.la and it's dependency libglsl_util.la. Also,
_mesa_error_no_memory() must be defined.
Fortunately with libnir.la (vs pulling in all of libglsl.la) we don't
also need libstdc++.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If we want to use NIR from state trackers that don't already pull in the
whole of glsl (ie. anything other than mesa state tracker), we need a
separate more minimal libnir. Possibly NIR should be better split out
from glsl, but for now, generate a second smaller libnir.la for those
who just want NIR but not all of glsl.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Based on the algo from NV50LegalizeSSA::handleDIV() and handleMOD().
See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an
adaptation of the nv50 code from Ilia Mirkin).
A python/numpy script which implements the same algorithm (and is
possibly useful for debugging or analysis) can be found here:
http://people.freedesktop.org/~robclark/div-lowering.py
I've tested this on i965 hacked up to insert the idiv lowering pass,
and on freedreno with NIR frontend.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Eric Anholt <eric@anholt.net> (vc4)
In freedreno these get implemented as the matching f* instruction plus a
u2f to convert the result to float 1.0/0.0. But less lines of code to
just let nir_opt_algebraic handle this for us, plus opens up some small
window for other opt passes to improve (ie. if some shader ended up with
both a flt and slt with same src args, for example).
v2: use b2f rather than u2f
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Switch between the two clip space definitions already available
in hardware. Update winding order dependent state according
to the clip control state.
This change did not introduce new piglit quick.test regressions on
an Ivybridge Mobile and a GM45 Express chipset.
Also it enables and passes the clip-control and clip-control-depth-precision
tests on these two chipsets.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
The _WindowMap can be dropped from gl_viewport_attrib now.
Simplify gl_viewport_attrib handling where possible.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This is the only real user of _WindowMap which has the depth
buffer scaling multiplied in. Maintain the _WindowMap of the
one and only viewport inside TNLcontext.
v2:
Remove unneeded parentheses.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Instead of _WindowMap just use the translation and scale
of the viewport transform directly. Thereby avoid dividing by
_DepthMaxF again.
v2:
Change order of assignments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Instead of _WindowMap just use the translation and scale
of the viewport transform directly. Thereby avoid dividing by
_DepthMaxF again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
As of da5ec2a, we allocate instruction sources out of the instruction
itself. When we realloc the texture sources we need to use the right
memory context or ralloc will get angry and assert-fail
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a pass to L1-normalize cube-map coordinates. Some hardware
such as i965 requires that largest cube-map coordinate is +-1. We had a
pass to perform this normalization in GLSL IR but we need it in NIR for
cube maps on ARB programs to work correctly.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2 (Suggested by Eric):
- Do a vector fabs and split into components later
- Move to core NIR
Reviewed-by: Eric Anholt <eric@anholt.net>
This only impacts the ARB_fp path. We can't quite disable the GLSL-level
lowering pass, because it needs to apply before
brw_do_lower_unnormalized_offset().
total instructions in shared programs: 5667857 -> 5667847 (-0.00%)
instructions in affected programs: 1114 -> 1104 (-0.90%)
helped: 16
HURT: 6
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Not much hardware wants them these days, and it might give us a chance to
do CSE or algebraic at the NIR level.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We use nir_ssa_defs for nir_builder args, so this takes a nir_src and
makes one so it can be passed in.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So far we'd only used nir_builder to build brand new programs. But if
we're doing modifications to instructions (like in a lowering pass), then
we want to generate new stuff before the instruction we're modifying.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The fpclassify stuff either needs std=c99 or _XOPEN_SOURCE=600 passed
to gcc, but when using the latter the lrint family of function will be defined
too.
This is in preparation for these functions to be called from other
files.
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dirty-bit checking from each brw_upload_<stage>_prog function is
split out into its a new brw_<stage>_state_dirty function.
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit splits portions of the existing brw_upload_vs_prog and
brw_upload_gs_prog function into new brw_vs_populate_key and
brw_gs_populate_key functions. This follows the same style as is
already present for all other stages, (see brw_wm_populate_key, etc.).
This commit is intended to have no functional change. It exists in
preparation for some upcoming code movement in preparation for the
shader cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 09ee907266 added logic to fold immediates into mad operations,
but the emission code is only there for fmad. Only allow it on float
types.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit fb63df2215 added 4-byte mad support, but only supported
emission for floats. Disable it for ints for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The lifetime of the sources array needs to be match the nir_tex_instr
itself. So, allocate it using the instruction itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These sets are part of the block, and their lifetime needs to match the
block itself. So, allocate them using the block itself as the context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The lifetime of each register's use/def/if_use sets needs to match the
register itself. So, allocate them using the register itself as the
context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
glsl_to_nir passes in the ir_function's name field; we were copying the
pointer, but not duplicating the memory.
We want to be able to free the linked GLSL IR program after translating
to NIR, so we'll need to create a copy of the function name that the NIR
shader actually owns.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can just pass a pointer to the list of variables, and reuse the code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
ralloc_adopt() reparents all children from one context to another.
Conceptually, ralloc_adopt(new_ctx, old_ctx) behaves like this
pseudocode:
foreach child of old_ctx:
ralloc_steal(new_ctx, child)
However, ralloc provides no way to iterate over a memory context's
children, and ralloc_adopt does this task more efficiently anyway.
One potential use of this is to implement a memory-sweeper pass: first,
steal all of a context's memory to a temporary context. Then, walk over
anything that should be kept, and ralloc_steal it back to the original
context. Finally, free the temporary context. This works when the
context is something that can't be freed (i.e. an important structure).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The hardware only supports 4 MRTs. It should be possible to emulate
support for 8, but doesn't seem worth the trouble.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This complication is unnecessary and makes MRTs more complicated and
likely to generate tons of variants.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The thing we want to avoid is int/float comparisons, but int/unsigned
comparisons with 0 are equivalent.
total instructions in shared programs: 6194829 -> 6193996 (-0.01%)
instructions in affected programs: 117192 -> 116359 (-0.71%)
helped: 471
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.
Reviewed-by: Eric Anholt <eric@anholt.net>
Doesn't work for analogous && cases, because of NaNs.
total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs: 42000 -> 41117 (-2.10%)
helped: 403
Reviewed-by: Eric Anholt <eric@anholt.net>
InputsRead is a 64-bit bitfield. Using _mesa_fls would silently
truncate off the high bits, claiming inputs 32..56 (VARYING_SLOT_MAX)
were never read.
Using <= here was a hack I threw in at the last minute to fix programs
which happened to use input slot 32. Switch back to using < now that
the underlying problem is fixed.
Fixes crashes in "Euro Truck Simulator 2" when using prog->nir, which
uses input slot 33.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ptn_move_dest and nir_fadd already take care of replicating the last
channel out, so we can just use a scalar and skip splatting it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We run lowering and optimization passes that might leave garbage lying
around. This keeps the FS cse from having to clean it up.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering. Instead, we split ffma
opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem. We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs: 989359 -> 978057 (-1.14%)
helped: 5308
HURT: 97
GAINED: 78
LOST: 5
This does, unfortunately, cause some substantial hurt to a shader in Kerbal
Space Program. However, the damage is caused by changing a single
instruction from a ffma to an add. This, in turn, *decreases* register
pressure in one part of the program causing it to fail to register allocate
and spill. Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
should be considered a positive.
Reviewed-by: Matt Turner <mattst88@gmail.com>
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs: 4230 -> 4286 (1.32%)
helped: 0
HURT: 12
While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions. To solve this, we play the
age-old header file trick and just #define around it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw. This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.
Reviewed-by: Kenneth Grunke <kenneth@whitecape.org>
Unused as of commit 630ab0d27ba(mesa: remove last of MAX_WIDTH,
MAX_HEIGHT). Update all the remaining references to the defines.
v2: Use the correct variable name in the comments
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In case of using a distribution tarball (or a dirty git tree) one can
have the generated sources locally. Make configure.ac error out
otherwise, to alert that about the unmet requirement(s) of python/mako.
v2: Check only for a single file for each dependency.
Suggested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
TGSI's conditional discards take float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value. Only, in NIR we want a proper
bool once again, so we compare with 0. This is a lot of pointless extra
instructions.
total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs: 1342 -> 1309 (-2.46%)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand. This is common when generating NIR from TGSI.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
I was previously using temporary disables of VC4 optimization to show the
benefits of improved NIR optimization, but this can get me quick and dirty
numbers for NIR-only improvements without having to add hacks to disable
VC4's code (disabling of which might hide ways that the NIR changes would
hurt actual VC4 codegen).
NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.
total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs: 62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs: 15494 -> 15240 (-1.64%)
v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in mater.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
I'm not committing)
This will be used by the VC4 driver for doing device-independent
optimization, and hopefully eventually replacing its whole IR. It also
may be useful to other drivers for the same reason.
v2: Add all of the instructions I was relying on tgsi_lowering to remove,
and more.
v3: Rebase on SSA rework of the builder.
v4: Use the NIR ineg operation instead of doing a src modifier.
v5: Don't use ineg for fnegs. (infer_src_type on MOV doesn't do what I
expect, again).
v6: Fix handling of multi-channel KILL_IF sources.
v7: Make ttn_get_f() return a swizzle of a scalar load_const, rather than
a vector load_const. CSE doesn't recognize that srcs out of those
channels are actually all the same.
v8: Rebase on nir_builder auto-sizing, make the scalar arguments to
non-ALU instructions actually be scalars.
v9: Add support for if/loop instructions, additional texture targets, and
untested support for indirect addressing on temps.
v10: Rebase on master, drop bad comment about control flow and just choose
the X channel, use int comparison opcodes in LIT for now, drop unused
pipe_context argument..
v11: Fix translation of LRP (previously missed because I mis-translated
back out), use nir_builder init helpers.
v12: Rebase on master, adding explicit include of mtypes.h to get
INTERP_QUALIFIER_*
v13: Rebase on variables being in lists instead of hash tables, drop use
of mtypes.h in favor of util/pipeline.h. Use Ken's nir_builder
swizzle and fmov/imov_alu helpers, drop "struct" in front of
nir_builder, use nir_builder directly as the function arg in a lot of
cases, drop redundant members of ttn_compile that are also in
nir_builder, drop some half-baked malloc failure handling.
v14: The indirect uniform src0 should be scalar, not vector (noticed as
odd by robclark, confirmed by cwabbott). Apply Ken's review to
initialize s->num_uniforms and friends, skip ttn_channel for dot
products, and use the simpler discard_if intrinsic.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v13)
Acked-by: Rob Clark <robclark@freedesktop.org>
NIR uses these enums/#defines in nir_variables and associated intrinsics,
but I want to be able to use them from TGSI->NIR and NIR->TGSI.
Otherwise, we had to pull in all of mtypes.h.
This doesn't cover all of the enums we might want from a shared compiler
core (like varying slots or vert attribs), but it at least covers what I
need at the moment (system values and interp qualifiers).
v2: Move to src/glsl since util/ is really vague. Include in Makefile.am
list. Use plain bitshifts and stdint types instead of undefined
BITFIELD64_BIT.
v3: Rename to shader_enums.h. Move it into Makefile.sources.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2, with
recommendation to rename)
The header was added with commit 2a135c470e3(nir: Add an ALU op builder
kind of like ir_builder.h) but did not made it into to the sources list.
Fortunately it remained unused until a recent commit faf6106c6f6(nir:
Implement a Mesa IR -> NIR translator.)
v2: Remove the bogus dependency. Tweak commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
... by folding it into CLEANFILES. Don't worry about $(LANG) as it is
essentially the first folder of $(POS). With the latter already handled.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Both of which were removed with commit 69db422218b(scons: Don't build
osmesa.)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is a problem when we have IR like this:
(array_ref (var_ref temps) (swiz x (expression ivec4 bitcast_f2i
(swiz xxxx (array_ref (var_ref temps) (constant int (2)) ) )) )) ) )
where we are indexing an array with the result of an expression that
accesses the same array.
In this scenario, temps will be moved to scratch space and we will need
to add scratch reads/writes for all accesses to temps, however, the
current implementation does not consider the case where a reladdr pointer
(obtained by indexing into temps trough a expression) points to a register
that is also stored in scratch space (as in this case, where the expression
used to index temps access temps[2]), and thus, requires a scratch read
before it is accessed.
v2 (Francisco Jerez):
- Handle also recursive reladdr addressing.
- Do not memcpy dst_reg into src_reg when rewriting reladdr.
v3 (Francisco Jerez):
- Reduce complexity by moving recursive reladdr scratch access handling
to a separate recursive function.
- Do not skip demoting reladdr index registers to scratch space if the
top level GRF has already been visited.
v4 (Francisco Jerez)
- Remove redundant checks.
- Simplify code by making emit_resolve_reladdr return a register with
the original src data except for reg, reg_offset and reladdr.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89508
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This mutex is used to make sure the shared context does not change
while some shared code is looking into it.
Calling BindRenderbufferEXT BindRenderbuffer with a gles context
would not take the mutex before allocating an entry. Commit a34669b
then moved out the allocation out of bind_renderbuffer into
allocate_renderbuffer before using it for the CreateRenderBuffer
entry point. This thus also made this entry point unsafe.
The issue has been hinted by Ilia Mirkin.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
The issue has been detected by coverty.
v2:
- move the declaration of obj to the else clause (Brian Paul)
v3: Review by Brian Paul
- get rid of the obj declaration in favor of a direct reference
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
At the moment to get an EGL image to a dma-buf file descriptor,
you have to use EGL_MESA_drm_image, and then use libdrm to
convert this to a file descriptor.
This extension just provides an API modelled on EGL_MESA_drm_image,
to return a dma-buf file descriptor.
v2: update spec for new API proposal
add internal queries to get the fourcc back from intel driver.
v2.1: add gallium pieces.
v2.2: add offsets to spec and API, rename fd->fds, stride->strides
in API. rewrite spec a bit more, add some q/a
v2.3:
add modifiers to query interface and 64-bit type for that (Daniel Stone)
specifiy what happens to num fds vs num planes differences. (Chad Versace)
v2.4:
fix grammar (Daniel Stone)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now, we only use brw->NewGLState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/brw->state\.dirty\.mesa/brw->NewGLState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now, we only use ctx->NewDriverState.
I used this bash & sed command in the i965 directory:
for file in *.[ch] *.[ch]pp; do
sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file
done
Followed by manual changes to brw_state_upload.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When clearing the state for a pipeline, we will save changed state for
the other pipelines.
v3:
* Adjust brw_upload_pipeline_state
* Don't pull pipeline state bits into common state bits
* Don't clear pipeline state bits
* Adjust 'clear' phase
* brw_clear_dirty_bits is now brw_render_state_finished
* Move cross-pipeline state flagging to brw_pipeline_state_finished
* Move pipeline clears to brw_pipeline_state_finished
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw->num_atoms is converted to an array, but currently just an array
of length 1.
Adds brw_copy_pipeline_atoms which copies the atoms for a pipeline,
and sets brw->num_atoms[p] for pipeline p.
v2:
* Rename brw->atoms[] to render_atoms
* Rename brw_add_pipeline_atoms to brw_copy_pipeline_atoms
* Rename brw_pipeline_first_atom to brw_get_pipeline_atoms
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We've seen some cases where performance can hurt quite a bit.
Technically, the more simple the function the more overhead there is
for using a function for this (and the less benefits this provides).
Hence don't do this if we expect the generated code to be simple.
There's an even more important reason why this hurts performance,
which is shaders reusing the same unit with some of the same inputs,
as llvm cannot figure out the calculations are the same if they
are performned in the function (even just reusing the same unit without
any input being the same provides such optimization opportunities though
not very much). This is something which would need to be handled by IPO
passes however.
mul x, -y is equivalent to mul -x, y; and mul x, y is the negation of
mul x, -y.
With NIR:
total instructions in shared programs: 6167779 -> 6161193 (-0.11%)
instructions in affected programs: 983511 -> 976925 (-0.67%)
helped: 4106
HURT: 16
GAINED: 18
LOST: 7
Without NIR:
total instructions in shared programs: 6192323 -> 6185299 (-0.11%)
instructions in affected programs: 987875 -> 980851 (-0.71%)
helped: 4146
HURT: 16
GAINED: 16
LOST: 0
The typical case of mat4*mat4*vec4 is 80 scalar multiplications, but
mat4*(mat4*vec4) is only 32.
On HSW (with vec4 vertex shaders):
instructions in affected programs: 4420 -> 3194 (-27.74%)
On BDW (with scalar vertex shaders):
instructions in affected programs: 12756 -> 6726 (-47.27%)
Implementing a general matrix chain ordering is harder (or at least
tedious) because of having to walk the GLSL IR to create a list of
multiplicands. I'm guessing that this patch handles 90+% of cases, but
of course to tell definitively you'd have to implement the general
thing.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
When nvc0_push_vbo calls nouveau_scratch_done it does not mean
scratch buffers can be freed immediately. It means "when hardware
advances to this place in the command stream the scratch buffers
can be freed".
To fix it, just postpone scratch runout destruction after current
fence is signalled.
The bug existed for a very long time. Nobody noticed, because
"scratch runout" code path is rarely executed.
Fixes hang at the very beginning of first mission in "Serious Sam 3"
on nve7/gk107. It manifested as:
nouveau E[ PFIFO][0000:01:00.0] read fault at 0x000a9e0000 [PTE] from GR/GPC0/PE_2 on channel 0x007f853000 [Sam3[17056]]
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Commit cf67ca9ffa made the layouting code pick a special layout for
1D images on Skylake. This should not be used for depth and stencil
buffers because these need to be treated as 2D tiled images. However
the patch was missing a check for images with a base format of
GL_STENCIL_INDEX. In practice I don't think it's currently possible to
hit this because Mesa doesn't support GL_ARB_texture_stencil8 and it's
not possible to create a 1D renderbuffer, but it'll be good to be
ready for when the extension is supported.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will help encoding VUI into the bitstream
v2: make backward compatible
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The framerate will be used for video usability info support by VCE driver
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Just announce support for 4 components.
While here also increase the max/min texel offsets (the limit is completely
artificial, was chosen because that's what other hardware did, however there's
other drivers using larger limits).
Over a thousand little piglits skip->pass.
v2: update docs/GL3.txt
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is quite trivial, essentially just follow all the same code you'd
use with linear min/mag (and no mip) filter, then just skip the filtering
after looking up the texels in favor of direct assignment of the right channel
to the result. (This is though not true for the multi-offset version if we'd
want to support it - for this would probably need to do something along the
lines of 4x nearest sampling due to the necessity of doing coord wrapping
individually per texel.)
Supports multi-channel formats.
From the SM5 gather cap bit, should support non-constant offsets, plus shadow
comparisons (the former untested), but not component selection (should be
easy to implement but all this stuff is not really exposable anyway for now).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This has got a bit out of control with more and more parameters added.
Worse, whenever something in there changes all callees have to be updated
for that, even though they don't really do much with any parameter in there
except pass it on to the actual sampling function.
Hence simply put almost everything into a struct. Also instead of relying
on some arguments being NULL, be explicit and set this in a key (which is
just reused for function generation for simplicity). (The code still relies
on them being NULL in the end for now.)
Technically there is a minimal functional change here for shadow sampling:
if shadow sampling is done is now determined explicitly by the texture
function (either sample_c or the gl-style tex func inherit this from target)
instead of the static texture state. These two should always match, however.
Otherwise, it should generate all the same code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This cleans up more instructions generated by uniform array indexing
multiplies.
total instructions in shared programs: 39989 -> 39961 (-0.07%)
instructions in affected programs: 896 -> 868 (-3.12%)
This cleans up some pointless operations generated by the in-driver mul24
lowering (commonly generated by making a vec4 index for a matrix in a
uniform array).
I could fill in other operations, but pretty much anything else ought to
be getting handled at the NIR level, I think.
total uniforms in shared programs: 13423 -> 13421 (-0.01%)
uniforms in affected programs: 346 -> 344 (-0.58%)
Previously, the ctx->Const.ForceGLSLVersion setting only worked if
the shader lacked a #version directive. Now, the ForceGLSLVersion
setting will override the #version directive too.
This change should be safe since it should be rare to have an app
that has a mix of shader versions and we only wanted to override
the #version for shaders which lacked the #version directive.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The hardware just uses the low 24 lines, saving us an AND to drop the high
bits.
total uniforms in shared programs: 13433 -> 13423 (-0.07%)
uniforms in affected programs: 356 -> 346 (-2.81%)
total instructions in shared programs: 40003 -> 39989 (-0.03%)
instructions in affected programs: 910 -> 896 (-1.54%)
GLSL ES 3.00 spec, 4.3.10 (Linking of Vertex Outputs and Fragment Inputs),
page 45 says the following:
"The type of vertex outputs and fragment input with the same name must match,
otherwise the link command will fail. The precision does not need to match.
Only those fragment inputs statically used (i.e. read) in the fragment shader
must be declared as outputs in the vertex shader; declaring superfluous vertex
shader outputs is permissible."
[...]
"The term static use means that after preprocessing the shader includes at
least one statement that accesses the input or output, even if that statement
is never actually executed."
And it includes a table with all the possibilities.
Similar table or content is present in other GLSL specs: GLSL 4.40, GLSL 1.50,
etc but for more stages (vertex and geometry shaders, etc).
This patch detects that case and returns a link error. It fixes the following
dEQP test:
dEQP-GLES3.functional.shaders.linkage.varying.rules.illegal_usage_1
However, it adds a new regression in piglit because the test hasn't a
vertex shader and it checks the link status.
bin/glslparsertest \
tests/spec/glsl-1.50/compiler/gs-also-uses-smooth-flat-noperspective.geom pass \
1.50 --check-link
This piglit test is wrong according to the spec wording above, so if this patch
is merged it should be updated.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Correct error with commit 151fb1e where assert was renamed
to unreachable without removing ! from string argument.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This does not (yet) support different coordinate origins, so the tests
still fail due to fbo flipping.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This appears to need the A2XX version of the point list, so select it at
draw time if necessary.
Experimentally, always using the A2XX version causes hangs when PSIZE
isn't actually emitted.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The division is probably a holdover from the days when the fixed point
inline functions generated by headergen were broken.
Also reduce the maximum point size to 4092 (vs 4096), which is what the
blob does.
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The SZ2 field contains the layer size of a lower miplevel. It only
contains 4 bits, which limits the maximum layer size it can describe. In
situations where the next miplevel would be too big, the hardware
appears to keep minifying the size until it hits one of that size.
Unfortunately the hardware's ideas about sizes can differ from
freedreno's which can still lead to issues. Minimize those by stopping
to minify as soon as possible.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
These are nir_cf_nodes, not ALU instructions.
Also, use unreachable() to preempt said review feedback.
v2: Do it right (thanks Ilia).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Everything is already in place; we simply have to take the scalar code
generation path. This gives us SIMD8 VS programs, instead of SIMD4x2.
v2: Rebase on the patch that drops brw->gen >= 8.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
I need to use this in brw_vec4.cpp, so it can't be static anymore.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use prog_to_nir where we would normally call glsl_to_nir, handle program
parameter lists, and skip a few things that don't exist.
Using NIR generates much better shader code than Mesa IR, since we get
real optimizations, as opposed to prog_optimize:
total instructions in shared programs: 314007 -> 279892 (-10.86%)
instructions in affected programs: 285173 -> 251058 (-11.96%)
helped: 2001
HURT: 67
GAINED: 4
LOST: 7
v2: Change early return in nir_setup_uniforms to if/else (Jordan).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
prog->nir will generate fsub opcodes, but i965 doesn't implement them.
We may as well lower them at the NIR level, since it's trivial to do.
Suggested by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Shamelessly ripped off from Eric Anholt's tgsi_to_nir pass.
This is not built on SCons, like the rest of NIR.
v2:
- Delete redundant c->s, c->impl, and c->cf_node_list pointers (Ken)
- Use nir_builder directly instead of ptn_compile in more places (Ken)
- Drop 'struct' keyword in front of nir_builder (ken)
- Add a file level Doxygen comment (Ken)
- Use scalar constants instead of splatting (Eric)
- Use nir_builder helpers for constants, moves, and swizzles (Connor)
v3: Minor indentation improvements.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These will be useful for prog->nir and tgsi->nir.
v2: Don't forget to mark nir_swizzle as inline (Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The PMA depth stall must be enabled (optimization turned off) under certain
circumstances on gen8. This was supposedly fixed for Gen9, which means we do not
need to check, or toggle the state. The hardware is supposed to enable the
hardware optimization by default, unlike BDW, so we also don't need to set it at
init. For whatever reason this improves stability on ETQW with the bug mentioned
below.
References: https://bugs.freedesktop.org/show_bug.cgi?id=89039 (doesn't fix)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Anuj Phogat <anuj.phogat@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Recomendation [sic] is to set this field to 1 always. Programming it to default
value of 0, may have -ve impact on performance for MSAA WLs.
Another don't suck bit which needs to get set.
The patch wasn't as well tested as I would have liked, primarily I don't have
perf numbers for it, but it's getting to a point where it is in danger of being
lost.
v2: v1 was a mix of two patches. Since 0x7004 is masked, we only need to set it
once at initialization and make sure the pma workaround doesn't set the mask bit
(which it doesn't).
Move LRI to init gpu state (Ken)
Add a comment.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These functions looked quite complicated, even though what they actually did
was trivial (ever since we dropped swizzled rendering). Also drop lookup of
format block per bytes done for each block, and do it once per scene instead.
This improves everybody's favorite "benchmark" by 3% or so, though
lp_rast_shade_quads_all() which calls this shows up still quite high for a
function which does little more than call the jit function.
(This would most likely be much better handled by the jit function itself,
the strides are passed through anyway already, though for being able to
handle layers it would definitely add some complexity.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When using the texel fetch functions rather than ordinary texturing,
the arguments are all int vecs instead of float vecs, not to mention
the actual function would look completely different. Hence this must
be included in the texture function name (which serves as the key)
otherwise things crash badly when a shader accesses the same texture
and sampler unit with both txf/ld and ordinary texturing instructions
with otherwise matching keys.
There are issues with inlining everything, most notably llvm will use much
more memory (and be slower) when compiling. Ideally we'd probably use
functions for shader functions too but texture sampling usually is responsible
for quite some IR (it can easily reach 80% of total IR instructions) so this
seems like a good start.
This still generates a different function for all different combinations just
like before, however it is possible llvm is missing some optimization
opportunities - it is believed though such opportunities should be somewhat
rare, but at least for now it can still be switched off (at compile time only).
It should probably make compiled code also smaller because the same function
should be used for different variants in the same module (so for the
opaque/partial or linear/elts variants).
No piglit change (though it does indeed speed up unrealistic tests like
fp-indirections2 by a factor of 30 or so).
Has a small negative performance impact in openarena - I suspect this could
be fixed by running some IPO passes (despite the private linkage, llvm right
now does NO optimization at all wrt anything going past the call, even if
there's just one caller - so things like values stored before the call and then
always written by the function etc. will not be optimized away, nor will dead
arguments (which we mostly shouldn't have) be eliminated, always constant
arguments promoted etc.).
v2: use proper return values instead of pointer function arguments.
llvm supports aggregate return values, which do wonders here eliminating
unnecessary stack variables - everything in fact will be returned in registers
even without any IPO optimizations. It makes the code simpler too.
With this I could not measure a peformance impact in openarena any longer
(though since there's still no constant value propagation etc. into the tex
functions this does not mean it couldn't have a negative impact elsewhere).
v3: fix some minor issues suggested by Jose, and do disassembly (and the
profiling) without hacks.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The callbacks used for getting the dynamic texture/sampler state were using
the jit_context from the generated jit function. This works just fine, however
that way it's impossible to generate separate functions for texture sampling,
as will be done in the next commit. Hence, pass this pointer through all
interfaces so it can be passed to a separate function (technically, it would
probably be possible to extract this pointer from the current function instead,
but this feels hacky and would probably require some more hacks if we'd use
real functions instead of inlining all shader functions at some point).
There should be no difference in the generated code for now.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The data in memory is in big endian format and needs to be converted
into CPU byte order. So the patch actually reversed what needs to be done.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The CUBE_ARRAY case uses r[4]. Make sure that the stack variable is
there.
Noticed by Coverity.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Does not appear to be used in tree. Coverity spotted some errors in the
bitmask stuff, but the whole thing appears to be unused.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Our fragment program backend implements support for TXP directly, and
there's no NIR lowering pass to remove the projection. When we switch
fragment program support over to NIR, we need to support it somehow.
It's easy enough to support directly.
v2: Split out offset/tex_offset rename (requested by Jordan).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
fs_visitor::nir_emit_texture() created an fs_reg variable called offset,
which shadowed the offset() helper function in brw_ir_fs.h.
Rename the variable to tex_offset so we can still call offset().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Don't be lazy. Constify the as_foo functions and use those instead
of ugly casts. Suggested by Curro.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Now that they're all implemented using macros, this is trivial.
v2: Remove redundant parenthesis. Suggested by Curro.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The downcast functions for non-leaf classes were previously implemented
"by hand." Now they are implemented using macros based on the is_foo
functions added in the previous patch.
v2: Remove redundant parenthesis. Suggested by Curro (on the next
patch).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
These functions deteremine when an IR node is one of the non-leaf
classes.
v2: Adjust indentation to line up. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
These are due how we implemented the atomic tests, not the atomic
implementation itself. It's also difficult to refactor the code to
avoid the warnings due to the use of macros -- the code would be quite
hairy.
Reviewed-by: Brian Paul <brianp@vmware.com>
Somehow, merely including any of the *intrin.h headers causes dozens of
this warnings (when compiling pretty much every source file). MSVC does
not always complain the same -- so it's possible we're doing something
weird --, but silence these warnings in the meanwhile.
Reviewed-by: Brian Paul <brianp@vmware.com>
MSVC's implementation of signbit(x) uses sizeof(x) with expressions to
dispatch to an internal function based on the argument's type (float,
double, etc), but that raises a flag with MSVC's own static analyzer,
and because this is an inline function in a header it causes substantial
warning spam.
Reviewed-by: Brian Paul <brianp@vmware.com>
MSVC 2013 declares these functions, both for C and C++ source files.
This was caught with MSVC in analyze mode.
Reviewed-by: Brian Paul <brianp@vmware.com>
There doesn't seem much interest on osmesa on Windows, particularly classic osmesa.
If there is indeed interest in osmesa on Windows, we should instead
integrate src/gallium/targets/osmesa into SCons.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Review from Laura Ekstrand
- get rid of a change that should not have happened in this patch
- improve the error messages
- fix alignments
- fix a capitalization in a function name in an error message
v3: Review from Laura Ekstrand
- move the test for the validity of the renderbuffer to less generic
functions
- get rid of some changes that accidentally landed in the wrong commit
- revert some alignment fixes
v3: Review from Laura Ekstrand
- check that the lookup returns a valid renderbuffer
- cosmetic changes to some error messages
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2:
- improve an error message
v3:
- move a test to less generic functions
- fix an alignment
v4:
- take the caller as a parameter instead of bool dsa
- check that the lookup returns a valid renderbuffer
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Because of the current way the code is architectured, there is no
functional difference between the DSA and the non-DSA path.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
These entry points will be fleshed out when the GL_ARB_query_buffer_object
extension gets implemented. In the meantime, return GL_INVALID_OPERATION as
suggested by Ian.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2: Review from Laura Ekstrand
- use the transform feedback object lookup wrapper
v3:
- use the new name of _mesa_lookup_transform_feedback_object_err
v4: Review from Laura Ekstrand
- fix some alignement problems
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2: Review from Laura Ekstrand
- use the transform feedback object lookup wrapper
v3:
- use the new name of _mesa_lookup_transform_feedback_object_err
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2: review from Laura Ekstrand
- use the refactored code to lookup the objects
- improve some error messages
- factor out the gl method name computation
- better handle the spec differences between the DSA and non-DSA cases
- quote the spec a little more
v3: review from Laura Ekstrand
- use the new name of _mesa_lookup_bufferobj_err
- swap the comments around the offset and size checks
v4: review from Laura Ekstrand
- add more spec quotes
- properly fix the comments around the offset and size checks
v5: review from Laura Ekstrand
- add quotes on the spec citations
- revert some changes in the printf format
v6: review from Laura Ekstrand
- remove a redondant "gl" in a method name
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
v2: Review from Laura Ekstrand
- give more helpful error messages
- factor the lookup code for the xfb and objBuf
- replace some already-existing tabs with spaces
- add comments to explain the cases where xfb == 0 or buffer == 0
- fix the condition for binding the transform buffer or not
v3: Review from Laura Ekstrand
- rename _mesa_lookup_bufferobj_err to
_mesa_lookup_transform_feedback_bufferobj_err and make it static to avoid a
future conflict
- make _mesa_lookup_transform_feedback_object_err static
v4: Review from Laura Ekstrand
- add the pdf page number when quoting the spec
- rename some of the symbols to follow the public/private conventions
v5: Review from Laura Ekstrand
- properly rename some of the symbols to follow the public/private conventions
- fix some alignments
- add quotes around a spec citation
- add back a newline I accidentally deleted
- add spaces around the ternary operator usages
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
v2: Review from Laura Ekstrand
- generate the name of the gl method once
- shorten some lines to stay in the 78 chars limit
v3: Review from Fredrik Höglund <fredrik@kde.org>
- rename gl_mthd_name to func
- set EverBound in _mesa_create_transform_feedbacks in the dsa case
v4:
- rename _mesa_create_transform_feedbacks to create_transform_feedbacks and
make it static
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Maybe this should be the job of the dispatch layer.
v2:
- add the section name and pdf page number of the quote (Laura)
- OpenGL 3.0 core does not exist, get rid of "core"
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
We were building libglsl_util.la without our visibility flags and
leaking hash_table_* symbols.
Signed-off-by: Kristian Høgsberg <kristian.h.kristensen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Transform this into b2f(and(a, b)).
total instructions in shared programs: 6190291 -> 6189225 (-0.02%)
instructions in affected programs: 267247 -> 266181 (-0.40%)
helped: 866
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make use of the builtin ffs macros and split out ffsll
to a seperate block. Needed for at least OpenBSD which
does not have ffsll in libc.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
No longer used as of commit 48c7461d5a0(st/gbm: remove state-tracker)
v2: Add commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> (v1)
in different fragment shaders. This also applies to a case when gl_FragCoord
is redeclared with no layout qualifiers in one fragment shader and not
declared but used in other fragment shader.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Khronos Bug#12957
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
16 / cpp happens to be the same as utile_w on the only raster format
supported (4 bytes per pixel), but simulator/hw source code generally
talks in terms of utiles.
I'd like to compile as much of the device-specific code as possible
when building for simulator, and using if (using_simulator) instead of
ifdefs helps.
This reverts commit 20346808cf.
The conversion is actually done since these are the *B macro variants
and no vtx format is supplied, which makes them go through the translate
module.
This restores the following piglit tests to passing:
draw-vertices user
gl-2.0-vertexattribpointer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
glXGetProcAddress("glFoo") ends up in stub_add_dynamic() to
create dynamic stubs for dynamic functions. stub_add_dynamic()
doesn't store the caller provided name string "Foo" in a mesa
private copy, but just stores a pointer to the "glFoo" string
passed to glXGetProcAddress - a pointer into arbitrary memory
outside mesa's control.
If the caller passes some dynamically allocated/changing
memory buffer to glXGetProcAddress(), or the caller gets unmapped
from memory, e.g., some dynamically loaded application
plugin which uses OpenGL, this ends badly - with a dangling
pointer.
strdup() the name string provided by the client to avoid
this problem.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The storage size for local kernel args can be queried before the
arguments are set by using the CL_KERNEL_LOCAL_MEM_SIZE param
of clGetKernelWorkGroupInfo().
The spec says that if local kernel arguments have not been specified,
then we should assume their size is 0.
v2:
- Implement using c++11 member initialization.
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
The pipe's get_vendor method returns something more akin to a driver
vendor string in most cases, instead of the actual device vendor. Use
get_device_vendor instead, which was introduced specifically for this
purpose.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The only hackish ones are llvmpipe and softpipe, which currently return
the same string as for get_vendor(), while ideally they should return
the CPU vendor.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This will be needed by Clover to return the correct information
to CL_DEVICE_VENDOR info queries.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
These will be especially useful when we start keeping track of
liveness information for each subregister.
Reviewed-by: Matt Turner <mattst88@gmail.com>
And set it in the MOV instructions that copy the temporary to the
original destination if the generator instruction had it set.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fix typo and punctuation in a comment, break long line and add space
before curly bracket.
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
try_copy_propagate() was checking the bit of the saturate mask for the
arg-th component of the source to decide whether the whole source
should be saturated (WTF?). We need to swizzle the original saturate
mask and check that for all enabled channels the saturate flag is
either set or unset, as we cannot saturate a subset of destination
components only.
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This reverts commit 0dfec59a27. The
change prevented propagation of copies with the saturate flag set,
making the whole saturate mask tracking completely useless. A proper
fix follows.
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This simplifies the src_reg/dst_reg conversion constructors using the
swizzle utils introduced in a previous patch. It also makes them more
useful by changing their semantics slightly: dst_reg(src_reg) used to
set the writemask to XYZW if the src_reg swizzle was anything other
than XXXX, which was almost certainly not what the caller intended if
the swizzle was non-trivial. After this patch the same components
that are present in the swizzle will be enabled in the resulting
writemask.
src_reg(dst_reg) used to set the first components of the swizzle to
the enabled components of the writemask and then replicate the last
enabled component to fill the swizzle, which, in cases where the
writemask didn't have exactly the first n components set, would in
general not be compatible with the original dst_reg. E.g.:
| ADD(tmp, src_reg(tmp), src_reg(1));
would *not* do what one would expect (add one to each of the enabled
components of tmp) if tmp didn't have a writemask of the described
form (e.g. YZ, YW, XZW would all fail). This pattern actually occurs
in many different places in the VEC4 back-end, it's a wonder that it
hasn't caused piglit failures until now. After this patch
src_reg(dst_reg) will construct a swizzle with each enabled component
at its natural position (e.g. Y at the second position, Z at the
third, and so on). The resulting swizzle will behave like the
identity when used in any instruction with the original writemask.
I've manually verified that *none* of the callers of both conversion
constructors were relying on the previous broken semantics. There are
no piglit regressions on any generation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It could be objected that swizzle_for_size() is "faster" than
brw_swizzle_for_size(). It's not measurably better in any reasonable
CPU-bound benchmark on VLV according to the Finnish benchmarking
system (including the SynMark2 DrvShComp shader compilation
benchmark).
Reviewed-by: Matt Turner <mattst88@gmail.com>
This seemed to be trying to deduce the number of uniform vector
components from the parameter swizzle, but the algorithm would always
give 4 as result. Instead grab the correct number of components from
the GLSL type.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This defines helper functions implementing some common swizzle
transformations that are usually open-coded in the compiler back-end,
causing a lot of clutter. Some optimization passes will become almost
trivial implemented in terms of these functions (e.g.
vec4_visitor::opt_reduce_swizzle()).
Reviewed-by: Matt Turner <mattst88@gmail.com>
FS instructions with NIR on i965:
total instructions in shared programs: 2663561 -> 2619051 (-1.67%)
instructions in affected programs: 1612965 -> 1568455 (-2.76%)
helped: 5455
HURT: 12
FS instructions with NIR on g4x:
total instructions in shared programs: 2352633 -> 2307908 (-1.90%)
instructions in affected programs: 1441842 -> 1397117 (-3.10%)
helped: 5463
HURT: 11
FS instructions with NIR on ilk:
total instructions in shared programs: 3997305 -> 3934278 (-1.58%)
instructions in affected programs: 2189409 -> 2126382 (-2.88%)
helped: 8969
HURT: 22
FS instructions with NIR on hsw (snb and ivb were similar):
total instructions in shared programs: 4109389 -> 4109242 (-0.00%)
instructions in affected programs: 109869 -> 109722 (-0.13%)
helped: 339
HURT: 190
No SIMD16 programs were gained or lost on any platform
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Fix the spelling of analyze and re-arrange code for better readability
as per Connor's comments.
v3: Make the naming of things more consistent and add a pile of comments
v4: Stop trying to avoid vectors
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Most cases seem harmless, though that might not always be the case. Maybe
one day we can get gcc to complain about these and fix them throughout
the code, but until then let's silence them.
Reviewed-by: Brian Paul <brianp@vmware.com>
It's a bit hackish couldn't find another solution. See code comment
for details. The warning is useful, so universally disabling doesn't
sound a good idea.
Fixes
warning C4005: 'xxx' : macro redefinition
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids MSVC the warning
warning C4013: 'isatty' undefined; assuming extern returning int
with certain versions of flex.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Add win flex-bison link to docs/install.html.
This prevents the MSVC from
warning C4090: 'function' : different 'const' qualifiers
when compiling flex generated lexers.
Reviewed-by: Brian Paul <brianp@vmware.com>
MSVC defaults to no exceptions unless /EH option is passed (which we don't), while
MSVC's STL defaults to use exceptions unless _HAS_EXCEPTIONS=0 is defined,
which we didn't.
This fixes
warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
Reviewed-by: Brian Paul <brianp@vmware.com>
This addresses
...\glsl_parser.cpp(...) : warning C4065: switch statement contains 'default' but no 'case' labels
This is on code generated by bison, which we have little control.
It seems useful to have this warning otherwise enabled.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note that GLboolean is an alias for unsigned char, which lacks the
implicit true/false semantics that C++/C99 bool have.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Change gl_shader::IsES and gl_shader_program::IsES to be bool as
recommended by Ian Romanick.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should have been part of 429a4355259(galahad: remove driver). Seems like
I've erroneously committed the trimmed patch.
Reported-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Both seems to be excessively long, namely:
ClientAPIString can get up-to 47 based on current code, while the name
of the driver can dictate the length of the VersionString, currently it
is around 11. Let's pad each to 100, rather than the current 1000.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There's 2 reasons why we'd want to use the global context:
1) There still seems to be one memory "leak" left when using multiple llvm
contexts (it is not a true leak as the memory disappears into some still
addressable pool but nevertheless the memory consumption grows). See
http://cgit.freedesktop.org/~jrfonseca/llvm-jitstress/
2) These contexts get kinda big - even when disposing modules etc. after
compiling a shader the LLVMContext can easily be over 100kB. So when there's
lots of llvm contexts arounds it adds up.
The downside is that at least right now this is absolutely not thread safe,
so this only works safely in environments where multiple pipe contexts are not
used concurrently.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes a piglit regression
(shaders/glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined) with
my series for GVN.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
This is currently not a problem because the vec4 visitor happens to
mask out unused components from the destination, but it might become
an issue when we start using atomics without writeback message. In
any case it seems sensible to set it again here because the
consequences of setting the wrong writemask (random graphics memory
corruption) are difficult to debug and can easily go unnoticed.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
And calculate the message response size based on the number of
components rather than the other way around. This simplifies their
interface somewhat and allows the caller to request a writeback
message with more than one vector component in SIMD4x2 mode.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This was telling the sampler to do texture fetches for *all* channels
in the non-constant surface index case, what could have reduced
throughput unnecessarily when some of the channels were disabled by
control flow.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is going to be useful because the Gen7+ uniform and varying pull
constant, texturing, typed and untyped surface read, write, and atomic
generation code on the vec4 and fs back-end all require the same logic
to handle conditionally indirect surface indices. In pseudocode:
| if (surface.file == BRW_IMMEDIATE_VALUE) {
| inst = brw_SEND(p, dst, payload);
| set_descriptor_control_bits(inst, surface, ...);
| } else {
| inst = brw_OR(p, addr, surface, 0);
| set_descriptor_control_bits(inst, ...);
| inst = brw_SEND(p, dst, payload);
| set_indirect_send_descriptor(inst, addr);
| }
This patch abstracts out this frequently recurring pattern so we can
now write:
| inst = brw_send_indirect_message(p, sfid, dst, payload, surface)
| set_descriptor_control_bits(inst, ...);
without worrying about handling the immediate and indirect surface
index cases explicitly.
v2: Rebase. Improve documentatation and commit message. (Topi)
Preserve UW destination type cargo-cult. (Topi, Ken, Matt)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Both do_vs_prog and do_gs_prog initialize brw_stage_prog_data::nr_params to
the number of uniform *vectors* required by the shader rather than the number
of uniform components, contradicting the comment. This is inconsistent with
what the state upload code and scalar path expect but it happens to work until
Gen8 because vec4_visitor interprets it as a number of vectors on construction
and later on overwrites its original value with the number of uniform
components referenced by the shader.
Also there's no need to add the number of samplers, they're not actually
passed in as uniforms.
Fixes a memory corruption issue on BDW with SIMD8 VS.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Several steppings of Skylake fail when using SIMD16 with 3-source
instructions (such as MAD).
This implements WaDisableSIMD16On3SrcInstr and fixes ~190 Piglit
tests.
Based on a patch by Neil Roberts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The places that were checking whether 3-source instructions are
supported have now been combined into a small helper function. This
will be used in the next patch to add an additonal restriction.
Based on a patch by Kenneth Graunke.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brwContextInit now queries the GPU revision number via a new parameter
for DRM_I915_GETPARAM. This new parameter requires a kernel patch and
a patch to libdrm. If the kernel doesn't support it then it will
continue but set the revision number to -1. The intention is to use
this to implement workarounds that are only needed on certain
steppings of the GPU.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Generate GL_INVALID_OPERATION and return NULL when the buffer object
hasn't been created. All callers expect this.
v2: Use a more concise error message.
Cc: Laura Ekstrand <laura@jlekstrand.net>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
This add primitive restart support to the prim conversion.
This involves changing the API for the translate functions
as we need to pass the prim restart index and the original
number of indices into the translate functions.
primitive restart is support for quads, quad strips
and polygons.
This deal with the case where the actual number of output
primitives is less than the initially calculated number,
by filling the rest of the output primitives with the restart
index, the other option is to reduce the output prim number,
but that will make the generator code a bit messier.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should improve the performance of any shaders using the KIL
instruction. I'm a bit surprised we missed this.
Unfortunately, I have not been able to measure any performance
improvements from this patch. It does make ARB_fragment_program
behave similarly to GLSL code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
So it turns out that this doesn't actually fix any bugs or add any features,
stictly speaking. However, it does avoid a lot of kludginess. Previously, if
you called
glCopyTextureSubImage3D(texcube, 0, 0, 0, zoffset = 3, ...
it would grab the texture image object for face = 0 in teximage.c instead of
the desired face = 3. But Line 274 of brw_blorp_blit.cpp would correct for
this by updating the slice to 3.
This commit does the correct thing before calling any drivers,
which should make the functionality much more robust and uniform across all
drivers.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We use the idiom
ir_foo *x = y->as_foo();
if (x == NULL)
return;
all over the place. GCC generates some quite lovely code for this.
One such example:
340a5b: 83 7d 18 04 cmpl $0x4,0x18(%rbp)
340a5f: 0f 85 06 04 00 00 jne 340e6b
340a65: 48 85 ed test %rbp,%rbp
340a68: 0f 84 fd 03 00 00 je 340e6b
This case used as_expression() (ir_type_expression is 4). Note that it
checks the ir_type, then checks that the pointer isn't NULL. There is
some disconnect in GCC around the condition in the as_foo functions.
return ir_type == ir_type_##TYPE ? (ir_##TYPE *) this : NULL; \
It believes "this" could be NULL, so it emits check outside the function
just for fun.
This patch uses assume() to tell GCC that it need not bother with extra
NULL checking of the pointer returned by the as_foo functions.
text data bss dec hex filename
4836430 158688 26248 5021366 4c9eb6 i965_dri-before.so
4836173 158688 26248 5021109 4c9db5 i965_dri-after.so
v2: Replace 'if (this == NULL) unreachable("this cannot be NULL")' with
assume(this != NULL). Suggested by Ilia Mirkin.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we put all the uniforms into one big array. The problem with
this approach is that, as soon as there was one indirect array acces, the
backend would decide that the entire large array should be pull constants.
This commit splits the array in half: first direct-only uniforms and then
potentially-indirect uniforms. This may not be optimal, but it does let
the backend promote things to push constants.
Shader-db results on HSW:
total instructions in shared programs: 4114840 -> 4112172 (-0.06%)
instructions in affected programs: 43316 -> 40648 (-6.16%)
helped: 116
HURT: 0
v2: Set param_size[num_direct_uniforms] only if we have indirect uniforms.
This caused a bug that, strangely enough, only showed up on Broadwell
vertex shaders.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2: Delete the set of indirectly accessed variables when we're done with it
v3: Rename from _packed to _scalar
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we just assigned variable locations in nir_lower_io. Now, we
force the user to assign variable locations for us. This gives the backend
a bit more control over where variables are placed.
v2: Rename from _packed to _scalar
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We never did a single hash table lookup in the entire NIR code base that I
found so there was no real benifit to doing it that way. I suppose that
for linking, we'll probably want to be able to lookup by name but we can
leave building that hash table to the linker. In the mean time this was
causing problems with GLSL IR -> NIR because GLSL IR doesn't guarantee us
unique names of uniforms, etc. This was causing massive rendering isues in
the unreal4 Sun Temple demo.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Different errors for type mismatches, size mismatches and matrix/
non-matrix mismatches. Use a common format of "uniformName"@location
in the messags.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
On platforms that do not natively generate 0u and ~0u for Boolean
results, b2f expressions that look like
f = b2f(expr cmp 0)
will generate better code by pretending the expression is
f = ir_triop_sel(0.0, 1.0, expr cmp 0)
This is because the last instruction of "expr" can generate the
condition code for the "cmp 0". This avoids having to do the "-(b & 1)"
trick to generate 0u or ~0u for the Boolean result. This means code like
mov(16) g16<1>F 1F
mul.ge.f0(16) null g6<8,8,1>F g14<8,8,1>F
(+f0) sel(16) m6<1>F g16<8,8,1>F 0F
will be generated instead of
mul(16) g2<1>F g12<8,8,1>F g4<8,8,1>F
cmp.ge.f0(16) g2<1>D g4<8,8,1>F 0F
and(16) g4<1>D g2<8,8,1>D 1D
and(16) m6<1>D -g4<8,8,1>D 0x3f800000UD
v2: When the comparison is either == 0.0 or != 0.0 use the knowledge
that the true (or false) case already results in zero would allow better
code generation by possibly avoiding a load-immediate instruction.
v3: Apply the optimization even when neither comparitor is zero.
Shader-db results:
GM45 (0x2A42):
total instructions in shared programs: 3551002 -> 3550829 (-0.00%)
instructions in affected programs: 33269 -> 33096 (-0.52%)
helped: 121
Iron Lake (0x0046):
total instructions in shared programs: 4993327 -> 4993146 (-0.00%)
instructions in affected programs: 34199 -> 34018 (-0.53%)
helped: 129
No change on other platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
The SSE 4.1 ROUND instructions let us implement roundeven directly.
Otherwise we assume that the rounding mode has not been modified (as we
do in the rest of Mesa) and use rint().
glibc uses the ROUND instruction in rint() after a cpuid check. This
patch just lets us inline it directly when we're already building for
SSE 4.1.
Reviewed-by: Carl Worth <cworth@cworth.org>
Eric's initial patch adding constant expression evaluation for
ir_unop_round_even used nearbyint. The open-coded _mesa_round_to_even
implementation came about without much explanation after a reviewer
asked whether nearbyint depended on the application not modifying the
rounding mode. Of course (as Eric commented) we rely on the application
not changing the rounding mode from its default (round-to-nearest) in
many other places, including the IROUND function used by
_mesa_round_to_even!
Worse, IROUND() is implemented using the trunc(x + 0.5) trick which
fails for x = nextafterf(0.5, 0.0).
Still worse, _mesa_round_to_even unexpectedly returns an int. I suspect
that could cause problems when rounding large integral values not
representable as an int in ir_constant_expression.cpp's
ir_unop_round_even evaluation. Its use of _mesa_round_to_even is clearly
broken for doubles (as noted during review).
The constant expression evaluation code for the packing built-in
functions also mistakenly assumed that _mesa_round_to_even returned a
float, as can be seen by the cast through a signed integer type to an
unsigned (since negative float -> unsigned conversions are undefined).
rint() and nearbyint() implement the round-half-to-even behavior we want
when the rounding mode is set to the default round-to-nearest. The only
difference between them is that nearbyint() raises the inexact
exception.
This patch implements _mesa_roundeven{f,}, a function similar to the
roundeven function added by a yet unimplemented technical specification
(ISO/IEC TS 18661-1:2014), with a small difference in behavior -- we
don't bother raising the inexact exception, which I don't think we care
about anyway.
At least recent Intel CPUs can quickly change a subset of the bits in
the x87 floating-point control register, but the exception mask bits are
not included. rint() does not need to change these bits, but nearbyint()
does (twice: save old, set new, and restore old) in order to raise the
inexact exception, which would incur some penalty.
Reviewed-by: Carl Worth <cworth@cworth.org>
Before, we enabled NIR if you set INTEL_USE_NIR to anything which mean that
INTEL_USE_NIR=false would actually turn on NIR. In preparation for turning
NIR on by default, this commit makes it smarter by allowing the
INTEL_USE_NIR variable to work as either a force-enable or a force-disable.
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
We never reset the string on eglTerminate, so it grows
for ever on multiple eglInitialise.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As VARYING_SLOT_MAX can be bigger than 32.
I'll probably stop building swrast with MSVC in the near future, but this
seems a real bug regardless.
Reviewed-by: Brian Paul <brianp@vmware.com>
program/lex.yy.c and program/program_parse.tab.c is already included in
the PROGRAM_FILES variable.
We still need to specify the dependency relationship though.
Reviewed-by: Brian Paul <brianp@vmware.com>
By default gcc ignores the issue, and as result code that mixes
signed/unsigned is so widespread through the code base that it ends up
being little more than noise, potentially obscuring more pertinent
warnings.
Maybe one day we enable the corresponding gcc warnings and cleanup, but
until then, this change disables them.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Removing this block of pragmas doesn't seem to increase the number of
warning generated by MSVC. Other than signed/unsigned comparison warnings
there's very few other warnings nowadays.
Acked-by: Matt Turner <mattst88@gmail.com>
Use the new _glapi_new_nop_table() and _glapi_set_nop_handler() to
improve how we handle calling no-op GL functions.
If there's a current context for the calling thread, generate a
GL_INVALID_OPERATION error. This will happen if the app calls an
unimplemented extension function or it calls an illegal function
between glBegin/glEnd.
If there's no current context, print an error to stdout if it's a debug
build.
The dispatch_sanity.cpp file has some previous checks removed since
the _mesa_generic_nop() function no longer exists.
This fixes the piglit gl-1.0-dlist-begin-end and gl-1.0-beginend-coverage
tests on Windows.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
_glapi_new_nop_table() creates a new dispatch table populated with
pointers to no-op functions.
_glapi_set_nop_handler() is used to register a callback function which
will be called from each of the no-op functions.
Now we always generate a separate no-op function for each GL entrypoint.
This allows us to do proper stack clean-up for Windows __stdcall and
lets us report the actual function name in error messages. Before this
change, for non-Windows release builds we used a single no-op function
for all entrypoints.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
One more case we need to handle. One of the src instructions for the
indirect could also end up being ourself.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
radeon_llvm_emit_prepare_cube_coords uses coords[4] in some cases (TXB2 etc.)
Discovered by Coverity. Reported by Ilia Mirkin.
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Check if the compiler supports -Werror=vla before using it.
-Wvla was introduced with GCC 4.3 and is not present in 4.2.
Fixes the build on OpenBSD.
v2: Fix statement order, and quote $save_CFLAGS.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89433
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
Currently, we throttle before the user begins preparing commands for the
next frame when we acquire the draw/read buffers. However, construction
of the command buffer can itself take significant time relative to the
frame time. If we move the throttle from the buffer acquire to the
command submit phase we can allow the user to improve concurrency
between the CPU and GPU (i.e. reduce the amount of time we waste inside
the throttle).
v2: Whitespace + delay throttling until after the next submission for
greater parallelism
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]
In order to facilitate the concurrency offered by triple buffering and to
offset the latency induced by swapping via an external process, which
may incur extra rendering itself, only throttle to the previous frame
and not the last. The second issue that mostly affects swap benchmarks,
but also can incur jitter in the throttling, is that the throttle bo is
closer to the next SwapBuffers rather than immediately after the previous
SwapBuffers. Throttling to the previous frame doubles the maximum possible
latency at the benefit of improving throughput and reducing jitter.
v2: Rename "first_post_swapbuffer" batches array to a plain
throttle_batch[] as the pluralisation was contorting the name and not
making it clear as to whether it was the first batch or first_post_swap
batch. Not least of which was that not all throttle points are SwapBuffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When rendering to an fbo, even though it may be acting as a winsys
frontbuffer or just generally, we never throttle. However, when rendering
to an fbo, there is no natural frame boundary. Conventionally we use
SwapBuffers and glFinish, but potential callers avoid often glFinish for
being too heavy handed (waiting on all outstanding rendering to complete).
The kernel provides a soft-throttling option for this case that waits for
rendering older than 20ms to be complete (that's a little too lax to be
used for swapbuffers, but is here a useful safety net). The remaining
choice is then either never to throttle, throttle after every draw call,
or at after intermediate user defined point such as glFlush and thus all the
implied flushes. This patch opts for the latter as that is the current
method used for flushing to front buffers.
v2: Defer the throttling from inside the flush to the next
intel_prepare_render() and switch non-fbo frontbuffer throttling over to
use the same lax method. The issuing being that
glFlush()/intel_prepare_read() is just as likely to be called inside a
tight loop and not at "frame" boundaries.
v3: Rename from need_front_throttle to need_flush_throttle to avoid any
ambiguity between front buffer rendering and fbo rendering. (Chad)
v4: Whitespace
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Shader-db results on HSW:
total instructions in shared programs: 4174156 -> 4157291 (-0.40%)
instructions in affected programs: 145397 -> 128532 (-11.60%)
helped: 383
HURT: 0
GAINED: 20
LOST: 22
There are two more tests lost than gained. However, comparing this with
GLSL IR vs. NIR results, the overall delta is reduced from 85/44
gained/lost on current master to 71/32 with this commit. Therefore, I
think it's probably a boon since we are getting "closer" to where we were
before.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously we tried to do poor-man's copy propagation as we created the
select instructions. Instead, this commit just moves the instructions from
the blocks inside the if into the block before. Copy propagation will take
care of making sure we don't have any extra mov's in there for us.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The code for emitting INTEL_swap_events swap completion
events needs to translate from 32-Bit sbc on the wire to
64-Bit sbc for the events and handle wraparound accordingly.
It assumed that events would be sent by the server in the
order their corresponding swap requests were emitted from
the client, iow. sbc count should be always increasing. This
was correct for DRI2.
This is not always the case under the DRI3/Present backend,
where the Present extension can execute presents and send out
completion events in a different order than the submission
order of the present requests, due to client code specifying
targetMSC target vblank counts which are not strictly
monotonically increasing. This confused the wraparound
handling. This patch fixes the problem by handling 32-Bit
wraparound in both directions. As long as successive swap
completion events real 64-Bit sbc's don't differ by more
than 2^30, this should be able to do the right thing.
How this is supposed to work:
awire->sbc contains the low 32-Bits of the true 64-Bit sbc
of the current swap event, transmitted over the wire.
glxDraw->lastEventSbc contains the low 32-Bits of the 64-Bit
sbc of the most recently processed swap event.
glxDraw->eventSbcWrap is a 64-Bit offset which tracks the upper
32-Bits of the current sbc. The final 64-Bit output sbc
aevent->sbc is computed from the sum of awire->sbc and
glxDraw->eventSbcWrap.
Under DRI3/Present, swap completion events can be received
slightly out of order due to non-monotic targetMsc specified
by client code, e.g., present request submission:
Submission sbc: 1 2 3
targetMsc: 10 11 9
Reception of completion events:
Completion sbc: 3 1 2
The completion sequence 3, 1, 2 would confuse the old wraparound
handling made for DRI2 as 1 < 3 --> Assumes a 32-Bit wraparound
has happened when it hasn't.
The client can queue multiple present requests, in the case of
Mesa up to n requests for n-buffered rendering, e.g., n = 2-4 in
the current Mesa GLX DRI3/Present implementation. In the case of
direct Pixmap presents via xcb_present_pixmap() the number n is
limited by the amount of memory available.
We reasonably assume that the number of outstanding requests n is
much less than 2 billion due to memory contraints and common sense.
Therefore while the order of received sbc's can be a bit scrambled,
successive 64-Bit sbc's won't deviate by much, a given sbc may be
a few counts lower or higher than the previous received sbc.
Therefore any large difference between the incoming awire->sbc and
the last recorded glxDraw->lastEventSbc will be due to 32-Bit
wraparound and we need to adapt glxDraw->eventSbcWrap accordingly
to adjust the upper 32-Bits of the sbc.
Two cases, correponding to the two if-statements in the patch:
a) Previous sbc event was below the last 2^32 boundary, in the previous
glxDraw->eventSbcWrap epoch, the new sbc event is in the next 2^32
epoch, therefore the low 32-Bit awire->sbc wrapped around to zero,
or close to zero --> awire->sbc is apparently much lower than the
glxDraw->lastEventSbc recorded for the previous epoch
--> We need to increment glxDraw->eventSbcWrap by 2^32 to adjust
the current epoch to be one higher than the previous one.
--> Case a) also handles the old DRI2 behaviour.
b) Previous sbc event was above closest 2^32 boundary, but now a
late event from the previous 2^32 epoch arrives, with a true sbc
that belongs to the previous 2^32 segment, so the awire->sbc of
this late event has a high count close to 2^32, whereas
glxDraw->lastEventSbc is closer to zero --> awire->sbc is much
greater than glXDraw->lastEventSbc.
--> We need to decrement glxDraw->eventSbcWrap by 2^32 to adjust
the current epoch back to the previous lower epoch of this late
completion event.
We assume such a wraparound to a higher (a) epoch or lower (b)
epoch has happened if awire->sbc and glxDraw->lastEventSbc differ
by more than 2^30 counts, as such a difference can only happen
on wraparound, or if somehow 2^30 present requests would be pending
for a given drawable inside the server, which is rather unlikely.
v2: Explain the reason for this patch and the new wraparound handling
much more extensive in commit message, no code change wrt. initial
version.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Both of which are no longer used. Use designated initializer to make
things obvious as people add/remove TGSI_OPCODEs.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
... rather than the local one in inst_info->tgsi_opcode.
This will allow us to simplify struct r600_shader_tgsi_instruction.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
At the very least, unreal4/sun-temple/102.shader_test uses this pattern
for a signed integer result. However, that shader did not hit the
optimization in the first place because it uses !gl_FrontFacing. I
changed the shader to use remove the logical-not and reverse the other
operands. I verified that incorrect code is generated before this
change and correct code is generated after.
Fixes fs-frontfacing-ternary-1-neg-1.shader_test.
No shader-db changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If we check for the case that is actually necessary, the asserts
become superfluous.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Espically on platforms that do not natively generate 0u and ~0u for
Boolean results, we generate a lot of sequences where a CMP is
followed by an AND with 1. emit_bool_to_cond_code does this, for
example. On ILK, this results in a sequence like:
add(8) g3<1>F g8<8,8,1>F -g4<0,1,0>F
cmp.l.f0(8) g3<1>D g3<8,8,1>F 0F
and.nz.f0(8) null g3<8,8,1>D 1D
(+f0) iff(8) Jump: 6
The AND.nz is obviously redundant. By propagating the cmod, we can
instead generate
add.l.f0(8) null g8<8,8,1>F -g4<0,1,0>F
(+f0) iff(8) Jump: 6
Existing code already handles the propagation from the CMP to the ADD.
Shader-db results:
GM45 (0x2A42):
total instructions in shared programs: 3550829 -> 3550788 (-0.00%)
instructions in affected programs: 10028 -> 9987 (-0.41%)
helped: 24
Iron Lake (0x0046):
total instructions in shared programs: 4993146 -> 4993105 (-0.00%)
instructions in affected programs: 9675 -> 9634 (-0.42%)
helped: 24
Ivy Bridge (0x0166):
total instructions in shared programs: 6291870 -> 6291794 (-0.00%)
instructions in affected programs: 17914 -> 17838 (-0.42%)
helped: 48
Haswell (0x0426):
total instructions in shared programs: 5779256 -> 5779180 (-0.00%)
instructions in affected programs: 16694 -> 16618 (-0.46%)
helped: 48
Broadwell (0x162E):
total instructions in shared programs: 6823088 -> 6823014 (-0.00%)
instructions in affected programs: 15824 -> 15750 (-0.47%)
helped: 46
No chage on Sandy Bridge or on any platform when NIR is used.
v2: Add unit tests suggested by Matt. Remove spurious writes_flag()
check on scan_inst when scan_inst is known to be BRW_OPCODE_CMP (also
suggested by Matt).
v3: Fix some comments and remove some explicit int() casts in fs_reg
constructors in the unit tests. Both suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
text data bss dec hex filename
9663 0 0 9663 25bf intel_tiled_memcpy.o before
8215 0 0 8215 2017 intel_tiled_memcpy.o after
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v3: Review from Fredrik Hoglund
-Split cosmetic refactor of GetBufferPointerv out into a separate commit
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v3: Review from Fredrik Hoglund
-Split cosmetic refactor of GetBufferPointerv out into a separate commit
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2:-Remove "_mesa" from in front of static software fallback.
-Split out the refactor from the addition of the DSA entry points.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: review from Ian Romanick
- Restore VBO_DEBUG and BOUNDS_CHECK
- Remove _mesa from static software fallback unmap_buffer.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: review from Jason Ekstrand
- Split refactor from addition of DSA entry points.
review from Ian Romanick
- Remove "_mesa" from static software fallback map_buffer_range
- Restore VBO_DEBUG and BOUNDS_CHECK
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: review by Jason Ekstrand
- Split refactor of clear buffer sub data from addition of DSA entry
points.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
v2: review by Ian Romanick
- Remove "_mesa" from name of static software fallback buffer_sub_data.
- Remove mappedRange from _mesa_buffer_sub_data.
- Removed some cosmetic changes to a separate commit.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
v2: review from Ian Romanick
- Fix space in ARB_direct_state_access.xml.
- Remove "_mesa" from the name of buffer_data static fallback.
- Restore VBO_DEBUG and BOUNDS_CHECK.
- Fix beginning of comment to start on same line as /*
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
This reverts commit 1ee000a0b6.
Failures with the GLES3 conformance suite and Synmark2 OGLHdrBloom revealed
that this commit was in error.
Extensive testing with Piglit prior to patch review and upstreaming did not
reveal this problem because, in the few Piglit tests that test for cube
completeness, NumLayers = 6. This is because all of the existing tests use
TextureStorage to initialize the texture, which sets NumLayers.
A new Piglit test has been sent to the mailing list that reproduces the bug
related to this patch ("texturing: Testing
glGenerateMipmap(GL_TEXTURE_CUBE_MAP) without glTexStorage2D").
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Commit 0ac4c27275 made it add a header for the send message when
using SIMD4x2 on Skylake because without this it will end up using
SIMD8D. However the patch missed the case when a sampler is being used
to implement constant loads from a buffer surface in a SIMD4x2 vertex
shader.
This fixes 29 Piglit tests, mostly related to the ARL instruction in
vertex programs.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Commit bb33a31 introduced optimizations that transform cases of MAD
in to simpler forms but it did not take in to account that src[0]
can not be immediate and did not report progress. Patch switches
src[0] and src[1] if src[0] is immediate and adds progress
reporting. If both sources are immediates, this is taken care of by
the same opt_algebraic pass on later run.
v2: Fix for all cases, use temporary fs_reg (Matt, Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89569
Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Before this actually ran into an infinite loop printing out "invalid"...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Remove the forward declaration and make use of the DEBUG_PRINT macro for
debug builds.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously PTHREAD_MUTEX_RECURSIVE_NP had been used on linux for
compatibility with old glibc. Since mesa defines __GNU_SOURCE__
on linux PTHREAD_MUTEX_RECURSIVE is also available since at least
1998. So we can unconditionally use the portable version
PTHREAD_MUTEX_RECURSIVE.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88534
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v2: Don't use the intrinsics, the shader backend can recognize these
patterns and generates optimal code automatically.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
IvyBridge and Haswell PRM say that the JIP should be emitted
with type W but we were using UD. The previous implementation
did not show adverse effects, but IMHO it is safer to follow
the specification thoroughly.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
This requires enabling the optional GL provoking vertex behavior for quads.
+ some cosmetic changes, so that the register is set exactly the same as
on r600.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The fragment shader multiplies the alpha channel with gl_SampleMaskIn.
If blending is enabled, it looks like MSAA.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This will be used for line and polygon smoothing.
This is GCN-only even though it's in shared code.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I have to use the BFE instrinsics, because BFE is one of the most complex
instructions that can't be matched easily. BFE has 3 conditional branches
and one of them is quite big.
In the isel DAG, lowered BFE has 27 nodes (including leafs).
Now that piglit is no longer falling back to old compiler for any tests,
we can remove it. Hurray \o/
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Deadlock can occur if we schedule an address register write, yet some
instructions which depend on that address register value also depend on
other unscheduled instructions that depend on a different address
register value. To solve this, before scheduling an address register
write, ensure that all the other dependencies of the instructions which
consume this address register are already scheduled.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add an array_insert() macro to simplify inserting into dynamically sized
arrays, add a comment, and remove unused prototype inherited from the
original freedreno.git/fdre-a3xx test code, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Create a backend_inst::is_commutative() method to replace two static
functions that did the exact same thing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is unfortunately sometimes necessary due to rebasing levels when
rendering into them.
16 piglits crash -> pass, when building mesa with debug enabled.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For example if width were 65, the first slice would get 96 while the
second would get 32. However the hardware appears to expect the second
pitch to be 64, based on halving the 96 (and aligning up to 32).
This fixes texelFetch piglit tests on a3xx below a certain size. Going
higher they break again, but most likely due to unrelated reasons.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
We only program in one layer size per texture, so that means that all
levels must share one size. This makes the piglit test
bin/texelFetch fs sampler2DArray
have the same breakage as its non-array version instead of being
completely off, and makes
bin/ext_texture_array-gen-mipmap
start passing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
The ir_unop_any problem was discovered by some later optimization passes
that generate ir_triop_csel. I was also able to reproduce it by
modifying the gl-2.0-vertexattribpointer vertex shader to generate its
result using
color = mix(vec4(0, 1, 0, 0),
vec4(1, 0, 0, 0),
bvec4(any(greaterThan(diff, vec4(tolerance)))));
instead of an if-statement. This also required using #version 130 and
MESA_GLSL_VERSION_OVERRIDE=130.
I have not nominated this for stable releases because I don't think
there's any way to trigger the problem without GLSL 1.30 or
optimizations that don't exist in stable.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@intel.com>
Most of the brw_inst_* api returns 64bit values. This fixes disassembly
of sampler messages, etc.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows us to get warnings from GCC when we mess up the format
strings.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ARB_shading_language_packing is part of GLSL 4.2, not 4.0 as I
mistakenly believed. The following functions are available only with
ARB_shading_language_packing, GLSL 4.2 (not GLSL 4.0), or ES 3.0:
- packSnorm2x16
- unpackSnorm2x16
- packHalf2x16
- unpackHalf2x16
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Creating/recreating the strings in eglQueryString() is extra work and
isn't thread-safe, as exhibited by shader-db's run.c using libepoxy.
Multiple threads in run.c call eglReleaseThread() around the same time.
libepoxy calls eglQueryString() to determine whether eglReleaseThread()
exists, and our EGL implementation passes a pointer to the version
string to libepoxy while simultaneously overwriting the string, leading
to a failure in libepoxy.
Moreover, the EGL spec says (emphasis mine):
"eglQueryString returns a pointer to a *static*, zero-terminated string"
This patch moves some auxiliary functions from eglmisc.c to eglapi.c so
that they may be used to create the extension, API, and version strings
once during eglInitialize(). The auxiliary functions are renamed from
_eglUpdate* to _eglCreate*, and some checks made unnecessary by calling
the functions from eglInitialize() are removed.
Reviewed-by: Chad Versace <chad.versace@intel.com>
This patch adds two types of checks to the gl(Compressed)Tex(Sub)Imgage family
of functions when a pixel buffer object is bound to GL_PIXEL_UNPACK_BUFFER:
- That the buffer is not mapped.
- The total data size is within the boundaries of the buffer size.
It does so by calling auxiliary validations functions from PBO API:
_mesa_validate_pbo_source() for non-compressed texture calls, and
_mesa_validate_pbo_source_compressed() for compressed texture calls.
The first check is defined in Section 6.3.2 'Effects of Mapping Buffers
on Other GL Commands' of the GLES 3.1 spec, page 57:
"Any GL command which attempts to read from, write to, or change the
state of a buffer object may generate an INVALID_OPERATION error if all
or part of the buffer object is mapped. However, only commands which
explicitly describe this error are required to do so. If an error is not
generated, using such commands to perform invalid reads, writes, or
state changes will have undefined results and may result in GL
interruption or termination."
Similar wording exists in GL 4.5 spec, page 76.
In the case of gl(Compressed)Tex(Sub)Image(2,3)D, the specification doesn't force
implemtations to throw an error. However since Mesa don't currently implement
checks to determine when it is safe to read/write from/to a mapped PBO, we
should always return the error if all or parts of it are mapped.
The 2nd check is defined in Section 8.5 'Texture Image Specification' of the
OpenGL 4.5 spec, page 203:
"An INVALID_OPERATION error is generated if a pixel unpack buffer object
is bound and storing texture data would access memory beyond the end of
the pixel unpack buffer."
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.negative_api.texture.compressedteximage2d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage2d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedteximage3d_invalid_buffer_target
* dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d_invalid_buffer_target
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Internal PBO functions such as _mesa_map_validate_pbo_source() and
_mesa_validate_pbo_compressed_teximage() perform validation and buffer mapping
within the same call.
This patch takes out the validation into separate functions to allow reuse
of functionality by other code (i.e, gl(Compressed)Tex(Sub)Image).
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
_mesa_validate_pbo_access() provides a generic way to check that a
requested pixel transfer operation on a PBO falls within the
boundaries of the buffer. It is used in various other places, and
depending on the caller, some arguments are used or not.
In particular, the 'clientMemSize' argument is used only by calls
that are knowledgeable of the total size of the user data involved
in a pixel transfer, such as the case of compressed texture image
calls. Other calls don't provide 'clientMemSize' directly since it
is made implicit from the size and format of the texture, and its
data type. In these cases, a sufficiently big value is passed to
'clientMemSize' (INT_MAX) to avoid an incorrect constrain.
The problem is that _mesa_validate_pbo_access() use uint
pointers to make the calculations, which are 64 bits long in 64
bits platforms, meanwhile the dummy INT_MAX passed in 'clientMemSize'
is just 32 bits. This causes a constrain that is not desired.
This patch fixes that by checking that if 'clientMemSize' is MAX_INT,
then UINTPTR_MAX is assumed instead.
This is an ugly workaround to the fact that _mesa_validate_pbo_access()
intends to be a one function fits all. The clean solution here would
be to break it into different functions that provide the adequate API
for each of the possible code paths and validation needs.
Since there are callers relying on passing INT_MAX to 'clientMemSize',
this patch is necessary to deal with the problem above while a cleaner
implementation of the PBO API is not implemented.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
The implementation of texture <-> pixel-buffer transfers in drivers common layer
includes certain error checks and argument validation that don't belong there,
considering how the Mesa codebase is laid out. These are higher level
validations that, if necessary, should be performed earlier (i.e, in GL API
entry points).
This patch simply removes these error checks from driver code.
For more information, see discussion at
http://lists.freedesktop.org/archives/mesa-dev/2015-February/077417.html.
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
eglcurrent.c: In function '_eglSetTSD':
eglcurrent.c:57:4: warning: passing argument 2 of 'tss_set' discards
'const' qualifier from pointer target type [enabled by default]
tss_set(_egl_TSD, (const void *) t);
^
In file included from ../../../include/c11/threads.h:72:0,
from eglcurrent.c:32:
../../../include/c11/threads_posix.h:357:1: note: expected 'void *'
but argument is of type 'const void *'
tss_set(tss_t key, void *val)
^
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The memory layout of compatible internal formats may differ in bytes per
block, so TexFormat is not a reliable measure of compatibility. For example,
GL_RGB8 and GL_RGB8UI are compatible formats, but GL_RGB8 may be laid out in
memory as B8G8R8X8. If GL_RGB8UI has a 3 byte-per-block memory layout, the
existing compatibility check will fail.
Additionally, the current check allows any two compressed textures which share
block size to be used, whereas the spec gives an explicit table of compatible
formats.
v2: Use a switch instead of array iteration for block class and show the
correct GL error when internal formats are mismatched.
v3: Include spec citations for new compatibility checks, rearrange check
order to ensure that compressed, view-compatible formats return the
correct result, and make style fixes. Original commit message amended
for clarity.
v4: Reformatted spec citations.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Previously, we stored derefs in a hash table, using the malloc'd pointer
as the key. Then, we walked through the hash table and generated code,
based on the order of the hash table's elements.
Memory addresses returned by malloc are pretty much random, which meant
that the hash was random, and the hash table's elements would be walked
in some random order. This led to successive compiles of the same
shader using different variable names and slightly different orderings
of phi-nodes. Code could not be diff'd, and the final assembly would
sometimes change slightly too.
It turns out the only point of the hash table was to avoid inserting
the same node multiple times for different dereferences. We never
actually searched the hash table! This patch uses an intrusive
linked list instead. Since exec_list uses head and tail sentinels,
checking prev or next against NULL will tell us whether the node is
already in the list.
Pair programming with Jason Ekstrand.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
__next and __prev are pointers to the structure containing the exec_node
link, not the embedded exec_node. NULL checks would fail unless the
embedded exec_node happened to be at offset 0 in the parent struct.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Use "(__node)->__field.next != NULL" to check for the end of the list
instead of the "&__next->__field != NULL". The former is far more
obviously correct as it matches what the non-safe versions do. The
original code tried to avoid any use of __next as the client code may
delete it during its execution. However, since the looping condition is
checked after the iteration clause but before the client code is
executed, we know that __node is valid during the looping condition.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(Jason noted that this is not a good long term solution, and we should
instead improve nir_lower_io so that this extra set of MOVs is
unnecessary. I tend to agree, but decided we could do that as a
follow-up improvement.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No functional change. In preparation for supporting vertex shaders,
this adds a switch statement on shader stage (since vertex attributes
and fragment shader varyings will need different handling). It also
renames "varying" to "input", to be more general.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Ian and I added these around the time Connor was developing NIR. Now
that both exist, we should make them work together!
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We can't safely call nir_optimize() with register present, since several
passes called in the loop can't handle registers, and will fail asserts.
Notably, nir_lower_vec_alus() and nir_opt_algebraic() really don't want
registers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Array variable copy splitting generates a bunch of stuff we want to
clean up before proceeding.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The NIR backend hardcodes brw_wm_prog_key at the moment, which won't
work when we support scalar VS. We could use get_tex(), but it's a
static method. I was going to promote it to fs_visitor, but then
realized that both parameters (stage and key) are already members.
It then occured to me that we could just set up a pointer in the
constructor, and skip having a function altogether.
This patch also converts all existing users to use key_tex.
v2: Make key_tex a "const brw_sampler_prog_key_data *" instead of
non-const; word-wrap some lines. (Review comments from Topi.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_BLENDAPI boils down to __cdecl on Windows, but __cdecl is the default
calling convention so this serves no purpose.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By passing --force autoreconf will update all the aux files, which would
otherwise be ignored if one updates autoconf/automake.
Quote the ORIGDIR variable to prevent fall-outs, when its name contains
space.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Implicitly required for a while, although commit 9385c592c6 (mapi:
remove u_thread.h) was the one that put the final nail on the
coffin.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Left over from commit 18db13f5865(mapi: THREADS was always defined,
remove it)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This has been an implicit rule for building mesa for a long time. Let's
make it official and just bail out at configure time. This way we can
cleaning up some of our glx code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Convert the code to use the C11 threads implementation, and nuke the
Windows non-pthreads code-path. The c11/threads_win32.h abstraction
should be better than the current code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Brian Paul review suggestion: there's more macro use here than necessary.
Removed and redefine some #define preprocessing directives.
Removed the directive input parameter 'T' .
No functional changes.
Signed-off-by: Marius Predut <marius.predut@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We were already using strdup() in various places in Mesa. Get rid
of the _mesa_strdup() wrapper. All the callers pass a non-NULL
argument so the NULL check isn't needed either.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The piglit test glsl-fs-uniform-array-loop-unroll.shader_test was designed
to do an out of bounds access into an uniform array to make sure that we
handle that situation gracefully inside the driver, however, as Ken describes
in bug 79202, Valgrind reports that this is leading to an out-of-bounds access
in fs_visitor::demote_pull_constants().
Before accessing the pull_constant_loc array we should make sure that
the uniform we are trying to access is valid.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79202
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_imm_ud(0xffff) should have been converted to fs_reg(0xffffu) to
make sure the uint32_t fs_reg constructor was matched.
commit 49a938a265
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Fri Feb 20 12:12:25 2015 -0800
i965/fs: Use fs_reg for CS/VS atomics pixel mask immediate data
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We now skip allocating a hiz miptree for gen8. Instead, we calculate
the required hiz buffer parameters and allocate a bo directly.
v2:
* Update hz_height calculation as suggested by Topi
v3:
* Bail if we failed to create the bo (Ben)
v4:
* CEILING => DIV_ROUND_UP
* Make sure mt->logical_depth0 being 0 would not cause trouble
* Fail if Y tiling is not returned
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67564
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We now skip allocating a hiz miptree for gen7. Instead, we calculate
the required hiz buffer parameters and allocate a bo directly.
v2:
* Update hz_height calculation as suggested by Topi
v3:
* Bail if we failed to create the bo (Ben)
v4:
* CEILING => DIV_ROUND_UP
* Make sure mt->logical_depth0 being 0 would not cause trouble
* Fail if Y tiling is not returned
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=67564
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We are still allocating a miptree for hiz, but we only use fields from
intel_miptree_aux_buffer. This will allow us to switch over to not
allocating a miptree.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We are still allocating a miptree for hiz, but we only use fields from
intel_miptree_aux_buffer. This will allow us to switch over to not
allocating a miptree.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Today we allocate a miptree's for the hiz buffer. We needed this in
the past because we would point the hardware at offsets of the hiz
buffer. Since the hiz format is not documented, this is not a good
idea.
Since moving to support layered rendering on Gen7+, we no longer point
at an offset into the buffer on Gen7+.
Therefore, to support hiz on Gen7+, we don't need a full miptree
structure allocated.
This patch starts to create a new auxiliary buffer structure
(intel_miptree_aux_buffer) that can be a more simplistic miptree
side-band buffer associated with a miptree. (For example, to serve the
needs of the hiz buffer.)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The maximum value of a Gallium HUD's panel is automatically adjusted
when the current value is greater than the max. If we set the
pipe_query_driver_info::max_value to UINT64_MAX, the maximum value is
never adjusted and this results in a flat line instead of a pretty curve
which is correctly scaled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
brw_shader.cpp: In function ‘bool brw_saturate_immediate(brw_reg_type, brw_reg*)’:
brw_shader.cpp:618:31: warning: ‘sat_imm.brw_saturate_immediate(brw_reg_type, brw_reg*)::<anonymous union>::ud’ may be used uninitialized in this function [-Wmaybe-uninitialized]
reg->dw1.ud = sat_imm.ud;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
i915_fragprog.c: In function ‘i915ValidateFragmentProgram’:
i915_fragprog.c:1453:11: warning: variable ‘k’ set but not used [-Wunused-but-set-variable]
int k;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
It looks like this has existed since
commit f5a477ab76
Author: Ian Romanick <ian.d.romanick@intel.com>
Date: Mon Dec 16 11:54:08 2013 -0800
meta: Refactor shader generation code out of mipmap generation path
Valgrind was complaining on fbo-generatemipmap-formats
v2: Instead, do the allocation after the early return block (v2)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We used to loop over all color attachments, and emit FB writes for each
one, even if the shader didn't write to a corresponding output variable.
Those color attachments would be filled with garbage (undefined values).
Football Manager binds a framebuffer with 4 color attachments, but draws
to it using a shader that only writes to gl_FragData[0..2]. This meant
that color attachment 3 would be filled with garbage, resulting in
rendering artifacts. Now we skip writing to it, fixing rendering.
Writes to gl_FragColor initialize outputs[0..nr_color_regions-1] to
GRFs, while writes to gl_FragData[i] initialize outputs[i].
Thanks to Jason Ekstrand for tracking this down.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86747
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Previously, we emitted the shader-time epilogue from emit_fb_writes(),
during the middle of looping through color regions (or emit_urb_writes
for the VS). This is duplicated several times and rather awkward.
I need to fix a bug in our FB write handling, and it will be a lot
easier if we move emit_shader_time_end() out of there.
Now, we simply emit FB writes/URB writes, and subsequently have
emit_shader_time_end() insert instructions before the final SEND with
EOT. Not only is this simpler, it's actually a slight improvement:
we now include the MOVs to set up the final FB write payload in our
shader-time measurements.
Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses
send-from-GRF. (In the past, we might have hit trouble where both
attempt to use MRFs for messages; that's not a problem now.)
v2: Rebase on v3 of the previous patch and other shader_time fixes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> [v1]
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
This makes another part of the INTEL_DEBUG=shader_time code emittable
at arbitrary locations, rather than just at the end of the instruction
stream.
v2: Don't lose smear! Caught by Topi Pohjolainen.
v3: Don't set smear on the destination of the MOV. Thanks Topi!
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Instead of emit_shader_time_write, we now do emit(SHADER_TIME_ADD(...)).
The advantage is that we can also insert a shader time write at an
arbitrary location in the instruction stream, rather than being
restricted to emitting at the end.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Lets define R600_MAX_VIEWPORTS instead of using 16 here and there
in the code when looping through viewports and scissors. It is
easier to understand what this number represents.
v2: Missed a case where R600_MAX_VIEWPORTS should have been used.
Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Both the AMD and Intel APIs provide a dataSize parameter, and this
function would merrily ignore it. Neither API specifies what to do when
the buffer isn't big enough. I take the easy route of writing all the
complete bits of data that will fit. With more complete specs, we could
probably do something different.
I noticed this while looking into an unused parameter warning. The
warning was actually useful!
brw_performance_monitor.c: In function 'brw_get_perf_monitor_result':
brw_performance_monitor.c:1261:37: warning: unused parameter 'data_size' [-Wunused-parameter]
GLsizei data_size,
^
v2: Fix checks to include offset in the calculation. Noticed by Jan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
All dd functions take a gl_context as the first parameter. Instead of
removing it, just silence the warning.
brw_performance_monitor.c: In function 'brw_new_perf_monitor':
brw_performance_monitor.c:1354:41: warning: unused parameter 'ctx' [-Wunused-parameter]
brw_new_perf_monitor(struct gl_context *ctx)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
What a useful warning. #ThanksGCC
brw_performance_monitor.c:153:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen5_raw_chaps_counters[] = {
^
brw_performance_monitor.c:185:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen5_oa_snapshot_layout[] =
^
brw_performance_monitor.c:221:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen5_groups[] = {
^
brw_performance_monitor.c:240:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen6_raw_oa_counters[] = {
^
brw_performance_monitor.c:281:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen6_oa_snapshot_layout[] =
^
brw_performance_monitor.c:317:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen6_statistics_counters[] = {
^
brw_performance_monitor.c:332:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen6_statistics_register_addresses[] = {
^
brw_performance_monitor.c:346:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen6_groups[] = {
^
brw_performance_monitor.c:356:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen7_raw_oa_counters[] = {
^
brw_performance_monitor.c:402:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen7_oa_snapshot_layout[] =
^
brw_performance_monitor.c:470:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_counter gen7_statistics_counters[] = {
^
brw_performance_monitor.c:493:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int gen7_statistics_register_addresses[] = {
^
brw_performance_monitor.c:515:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static struct gl_perf_monitor_group gen7_groups[] = {
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I don't this opt_cmod_propagation_local ever used the fs_visitor.
brw_fs_cmod_propagation.cpp:52:40: warning: unused parameter 'v' [-Wunused-parameter]
opt_cmod_propagation_local(fs_visitor *v, bblock_t *block)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Review by Martin Peres
- Get rid of difficult-to-follow code copied and pasted from
the original TexBufferRange
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Creates a shared function to ensure that texture buffer target is
GL_TEXTURE_BUFFER. Helps to clean up the Tex[ture]Buffer[Range] functions.
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Creates a shared function that TexBufferRange and TextureBufferRange can use
to check the buffer range. This cleans up TexBufferRange considerably.
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Adds a useful comment and some whitespace. Fixes an error message.
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Changes how the caller is identified in error messages, moves a check for
ARB_texture_buffer_object from the entry points to the shared code in
_mesa_texture_buffer_range, and removes an unused argument (GLenum target).
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: Review from Anuj Phogat
- Split rebase of Tex[ture]Buffer[Range]
- Closing curly brace on the same line as else
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This function is exposed to mesa driver internals so that texture buffer
objects and array objects can use it.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
ARB_direct_state_access functions that deal with texture cube
maps need to make sure that texture images are not NULL before operating on
them. In the following cases, the error check functions already throw an
error if texImage == NULL, so an assert can be raised instead.
v2: Review from Anuj Phogat
- Replace redundant "if (!texImage) return;" statements with
assert(texImage)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The comment describing why ARB_direct_state_access texture cube map functions
use _mesa_cube_level_complete is very long. To save room in the files,
readers are now referred to one central comment on texturesubimage in
teximage.c.
v2: Review from Anuj Phogat
- Remove redundant copies of the cube map block comment
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
ARB_direct_state_access texture functions that operate on cube maps no longer
need to verify that cube map texture objects contain six texture images
because _mesa_cube_level_complete now does that for them.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
_mesa_cube_level_complete now verifies that a cube map texture object actually
has six texture images before proceeding.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This fixes the GL_COMPRESSED_RED_RGTC1 part of piglit's rgtc-teximage-01
test as well as the precision part of Wine's 3dc format test (fd.o bug
89156).
The Z component seems to contain a lower precision version of the
result, probably a temporary value from the decompression computation.
The Y and W component contain different data that depends on the input
values as well, but I could not make sense of them (Not that I tried
very hard).
GL_COMPRESSED_SIGNED_RED_RGTC1 still seems to have precision problems in
piglit, and both formats are affected by a compiler bug if they're
sampled by the shader with a swizzle other than .xyzw. Wine uses .xxxx,
which returns random garbage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89156
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>
This means dropping CL_FP_DENORM from the current return value.
v2:
- Add comments about minimum values for OpenCL 1.2.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
From the SNB PRM, volume 4, part 1, page 193:
"The dual source render target messages only have SIMD8 forms due to
maximum message length limitations. SIMD16 pixel shaders must send two of
these messages to cover all of the pixels. Each message contains two colors
(4 channels each) for each pixel in the message payload."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82831
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Vertex shaders can have shader inputs where location happens to be
VARYING_SLOT_FACE. Without predicating this on the shader stage,
we suddenly end up with load_front_face intrinsics in vertex shaders,
which is nonsensical.
Fixes spec/arb_vertex_buffer_object/pos-array when using NIR for VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The next commit needs to know the shader stage in glsl_to_nir().
To facilitate that, we pass the gl_shader rather than the raw exec_list
of instructions. This has both the exec_list and the stage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
glsl_to_nir, tgsi_to_nir, and prog_to_nir all want to know whether the
driver supports native integers. Presumably other passes may as well.
Adding this to nir_shader_compiler_options is an easy way to provide
that information, as it's accessible via nir_shader::options.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The code in glsl_to_nir is entirely dead, as we translate from GLSL to
NIR at link time, when there isn't a _mesa_glsl_parse_state to pass,
so every caller passes NULL.
glsl_to_nir seems like the wrong place to try and create the shader
compiler options structure anyway - tgsi_to_nir, prog_to_nir, and other
translators all would have to duplicate that code. The driver should
set this up once with whatever settings it wants, and pass it in.
Eric also added a NirOptions field to ctx->Const.ShaderCompilerOptions[]
and left a comment saying: "The memory for the options is expected to be
kept in a single static copy by the driver." This suggests the plan was
to do exactly that. That pointer was not marked const, however, and the
dead code used a mix of static structures and ralloced ones.
This patch deletes the dead code in glsl_to_nir, instead making it take
the shader compiler options as a mandatory argument. It creates an
(empty) options struct in the i965 driver, and makes NirOptions point
to that. It marks the pointer const so that we can actually do so
without generating "discards const qualifier" compiler warnings.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing actually uses these, and the only caller of glsl_to_nir()
(brw_fs_nir.cpp) always passes NULL for the _mesa_glsl_parse_state
pointer, meaning they'll always be NULL and 0, respectively.
Just delete them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Piglit's spec/glsl-1.20/compiler/structure-and-array-operations/
array-selection.vert test contains the following code:
gl_Position = (pick_from_a_or_b ? a : b)[i];
where "a" and "b" are uniform vec4[2] variables.
ast_to_hir creates a temporary vec4[2] variable, conditional_tmp, and
generates an if-block to copy one or the other:
(declare (temporary) (array vec4 2) conditional_tmp)
(if (var_ref pick_from_a_or_b)
((assign () (var_ref conditional_tmp) (var_ref a)))
((assign () (var_ref conditional_tmp) (var_ref b))))
However, we failed to update max_array_access for "a" and "b", so it
remained 0 - here, the whole array is being accessed. At link time,
update_array_sizes() used this bogus information to change the types
of "a" and "b" to vec4[1]. We then had assignments from a vec4[1] to
a vec4[2], which is highly illegal.
This tripped assertions in nir_split_var_copies with scalar VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: mesa-stable@lists.freedesktop.org
On Gen8+, AND/OR/XOR/NOT don't support the abs() source modifier, and
negate changes meaning to bitwise-not (~, not -). This isn't what NIR
expects, so we should resolve the source modifers via a MOV.
+30 Piglits (fs-op-bit{and,or,xor}-not-abs-*).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
this isn't hooked up to anything at all from what I can see.
Seems like a left over from commit 5d67d4fbebb(st/mesa: remove
st_TexImage(), use core Mesa code instead).
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now that relative-dst works, we should never fall back to the old
compiler. (Which is almost true, other than a couple edge case sched
fails in piglit).
So replace glsl130 flag to force GLSL 130 and integers on a3xx/a4xx with
a glsl120 flag to force GLSL 120 and !integers.
If this commit breaks any game/app/etc use FD_MESA_DEBUG=glsl120 as a
workaround and please let me know.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Enable the 'sphinx.ext.graphviz' extension, and add in a section for
driver specific docs, with freedreno compiler docs beneath. The
goal is for more complete compiler docs, and hopefully some docs about
other parts of the driver (such as how tiling works, etc).
Note that there is also a Distribution -> Drivers section. Although
that appears to be simply just a list of drivers. Not sure if that
should move under the 'Drivers' section or left alone. I did add a
one-line section for freedreno in the existing Distribution -> Drivers
section.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
To simplify RA, assign arrays that are written to first. Since enough
dependency information is in the graph to preserve order of reads and
writes of array, so all SSA names for the array collapse into one, just
assign the entire thing by array-id.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The meta-deref instruction doesn't really do what we need for relative
destination. Instead, since each instruction can reference at most a
single address value, track the dependency on the address register via
instr->address. This lets us express the dependency regardless of
whether it is used for dst and/or src.
The foreach_ssa_src{_n} iterator macros now also iterates the address
register so, at least in SSA form, the address register behaves as an
additional virtual src to the instruction. Which is pretty much what
we want, as far as scheduling/etc.
TODO:
For now, the foreach_src{_n} iterators are unchanged. We could wrap
the address in an ir3_register and make the foreach_src_{_n} iterators
behave the same way. But that seems unnecessary at this point, since
we mainly care about the address dependency when in SSA form.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
I remembered that we are using c99.. which makes some sugary iterator
macros easier. So introduce iterator macros to iterate all src
registers and all SSA src instructions. The _n variants also return
the src #, since there are a handful of places that need this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For cat1 instructions, use reg() as well for relative src, to ensure
proper accounting of register usage. Also, for relative instructions,
use reg->size rather than reg->wrmask to determine the number of
components read/written.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Turns out there are scenarios where we need to insert mov's in "front"
of an input. Triggered by shaders like:
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[9]
DCL SAMP[0]
DCL TEMP[0], LOCAL
0: MOV TEMP[0].xy, IN[1].xyyy
1: MOV TEMP[0].w, IN[1].wwww
2: TXF TEMP[0], TEMP[0], SAMP[0], 1D_ARRAY
3: MOV OUT[1], TEMP[0]
4: MOV OUT[0], IN[0]
5: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A previous patch to fix header inclusion within extern "C" neglected
to fix the occurences of this pattern in r300 files.
When the helper to detect this issue was pushed to master, it broke
the build for the r300 driver. This patch fixes the r300 build.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
A previous patch to fix header inclusion within extern "C" neglected
to fix the occurences of this pattern in nouveau files.
When the helper to detect this issue was pushed to master, it broke
the build for the nouveau driver. This patch fixes the nouveau build.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89477
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There is no HW support for these and the VBO pusher doesn't know about
them. No need to, either, since the st will be lowering them to 2x32.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add a help function for each WA and make PIPE_CONTROL flags match the WA
descriptions. Call gen6_wa_pre_pipe_contro() only before PIPE_CONTROLs.
Fix missing gen6_wa_pre_3dstate_vs_toggle() in the rectlist path.
Replace the _MSC_VER >= 1200 with defined (_MSC_VER) and compact if/else
statements. We require MSVC 2008 or later with commit 46110c5d564.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Implicitly required for a while, although commit 9385c592c6 (mapi:
remove u_thread.h) was the one that put the final nail on the
coffin.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This has been an implicit rule for building mesa for a long time. Let's
make it official and just bail out at configure time. This way we can
cleaning up some of our glx code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Convert the code to use the C11 threads implementation, and nuke the
Windows non-pthreads code-path. The c11/threads_win32.h abstraction
should be better than the current code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This is just to help repro and fixing these issues with any C++ compiler --
Commiting this will of course wait until all issues are addressed.
$ scons src/glsl/
scons: Reading SConscript files ...
Checking for GCC ... yes
Checking for Clang ... no
Checking for X11 (x11 xext xdamage xfixes glproto >= 1.4.13)... yes
Checking for XCB (x11-xcb xcb-glx >= 1.8.1 xcb-dri2 >= 1.8)... yes
Checking for XF86VIDMODE (xxf86vm)... yes
Checking for DRM (libdrm >= 2.4.38)... yes
Checking for UDEV (libudev >= 151)... yes
warning: LLVM disabled: not building llvmpipe
scons: done reading SConscript files.
scons: Building targets ...
scons: building associated VariantDir targets: build/linux-x86_64-debug/glsl
Compiling src/glsl/ast_array_index.cpp ...
Compiling src/glsl/ast_expr.cpp ...
Compiling src/glsl/ast_function.cpp ...
Compiling src/glsl/ast_to_hir.cpp ...
Compiling src/glsl/ast_type.cpp ...
Compiling src/glsl/builtin_functions.cpp ...
In file included from include/c99_compat.h:28:0,
from src/mapi/u_compiler.h:4,
from src/mapi/u_thread.h:47,
from src/mapi/glapi/glapi.h:47,
from src/mesa/main/mtypes.h:42,
from src/mesa/main/errors.h:47,
from src/mesa/main/imports.h:41,
from src/mesa/main/core.h:44,
from src/glsl/builtin_functions.cpp:58:
include/no_extern_c.h:48:1: error: template with C linkage
template<class T> class _IncludeInsideExternCNotPortable;
^
In file included from include/c99_compat.h:28:0,
from include/c11/threads.h:38,
from src/mapi/u_thread.h:49,
from src/mapi/glapi/glapi.h:47,
from src/mesa/main/mtypes.h:42,
from src/mesa/main/errors.h:47,
from src/mesa/main/imports.h:41,
from src/mesa/main/core.h:44,
from src/glsl/builtin_functions.cpp:58:
include/no_extern_c.h:48:1: error: template with C linkage
template<class T> class _IncludeInsideExternCNotPortable;
^
Compiling src/glsl/builtin_types.cpp ...
Compiling src/glsl/builtin_variables.cpp ...
scons: *** [build/linux-x86_64-debug/glsl/builtin_functions.os] Error 1
scons: building terminated because of errors.
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
If scratch space is needed for a shader stage we try to reuse the last scratch
buffer bound to that stage. If we can't, we free the old scratch buffer and
allocate a new one. This means we always keep the last scratch buffer for a
particular shader stage around for the entire life span of the context.
These buffers are being reported by Valgrind as definitely lost after
destroying the OpenGL context. For example, for the geometry shader stage:
==18350== 248 bytes in 1 blocks are definitely lost in loss record 85 of 150
==18350== at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18350== by 0xA1B35D6: drm_intel_gem_bo_alloc_internal (intel_bufmgr_gem.c:724)
==18350== by 0xA1B383F: drm_intel_gem_bo_alloc (intel_bufmgr_gem.c:794)
==18350== by 0xA1AEFA3: drm_intel_bo_alloc (intel_bufmgr.c:52)
==18350== by 0x9D08E31: brw_get_scratch_bo (brw_program.c:226)
==18350== by 0x9D2A0F2: do_gs_prog (brw_vec4_gs.c:280)
==18350== by 0x9D2A635: brw_gs_precompile (brw_vec4_gs.c:401)
==18350== by 0x9D14F68: brw_shader_precompile(gl_context*, gl_shader_program*) (brw_shader.cpp:76)
==18350== by 0x9D157B8: brw_link_shader (brw_shader.cpp:269)
==18350== by 0x9B0941E: _mesa_glsl_link_shader (ir_to_mesa.cpp:3038)
==18350== by 0x99AE4ED: link_program (shaderapi.c:917)
==18350== by 0x99AF365: _mesa_LinkProgram (shaderapi.c:1385)
So make sure that by the time we destroy the context we check if we have live
scratch buffers for the various stages and release them if that is the case.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This output variables gives more flexibility for future changes
in autoconf to detect if it is needed to auto-generate files and
check for the auto-generation dependencies.
It is still returning error when Python is not installed.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Was resulting in gl_PointSize write being optimized out, causing
particle system type shaders to hang if hw binning enabled.
Fixes neverball, OGLES2ParticleSystem, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use common intrastage array validation for interface blocks.
This change also allows us to support interface blocks
that are arrays of arrays.
V2: Reinsert unsized array asserts in interstage_match()
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
A while back I switched intel_blit_framebuffer to prefer Meta over the
BLT. This meant that Gen8 platforms would start using the 3D engine
for blits, just like we do on Gen6-7.5.
However, I hadn't considered Gen4-5 when making that change. The BLT
engine appears to be substantially faster on 965GM than using Meta to
drive the 3D engine. This isn't too surprising: original Gen4 doesn't
support tile offsets (that came on G45), and the level/layer fields
don't work for cubemap rendering, so for inconvenient miplevel
alignments, we end up blitting or copying data to/from temporaries
in order to render to it. We may as well just use the blitter.
I chose to use the BLT on Gen4-5 because they use the same ring for
both 3D and BLT; Gen6+ splits it out.
Fixes regressions on 965GM due to botched tile offset code (we should
fix those properly as well, but they're longstanding bugs - for now,
put things back to the status quo).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89430
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
The former is used by the kernel driver to set up fence registers and to pass
tiling info across processes. It lacks INTEL_TILING_W, which made our code
less expressive.
System headers may contain C++ declarations, which cannot be given C
linkage. For this reason, include statements should never occur
inside extern "C".
This patch moves the C linkage statements to enclose only the
declarations within a single header.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The SSSE3 swizzling code was written for fast uploads to the GPU and
assumed the destination was always 16-byte aligned. When we began using
this code for fast downloads as well we didn't do anything to account
for the fact that the destination pointer given by glReadPixels() or
glGetTexImage() is not guaranteed to be suitably aligned.
With SSSE3 enabled (at compile-time), some applications would crash when
an SSE aligned-store instruction tried to store to an unaligned
destination (or an assertion that the destination is aligned would
trigger).
To remedy this, tell intel_get_memcpy() whether we're uploading or
downloading so that it can select whether to assume the destination or
source is aligned, respectively.
Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89416
Tested-by: Uriy Zhuravlev <stalkerg@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Several patches added include statements where required by the m64
build. Some files are only compiled for m32, and require similar
changes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2:
- Report correct values for CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE
and CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE.
- Only define cl_khr_fp64 if the extension is supported.
- Remove trailing space from extension string.
- Rename device query function from cl_khr_fp64() to
has_doubles().
v3:
- Return 0 for device::doubled_fp_confg() when doubles aren't
supported.
v4:
- Remove device query for double fp_config.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The header is included in ../xmlpool.h. With the latter of which used
directly in a number of places in mesa.
Note that we can also add it (alongside t_option.h) to noinst_HEADERS,
but neither solution fixes the issue that brough us here - namely:
Do not regenerate the headers, if it already exists.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I.e. add {shared-,}glapi/glapi_mapi_tmp.h to the SOURCES list. Otherwise
there will be no knowledge that the file is required by others for the
build. Thus autotools won't pick it up for the distribution tarball.
v2: Don't forget about the static glapi. Spotted by Matt.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some of the files generated were not in the SOURCES variable, thus
although generated prior to compilation the dependency tracking was
incomplete. The latter of which resulted in the files missing from the
distribution tarball.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The file is auto-generated, and #included by formats.c. Let's rename it
to reflect the latter. This will also help up fix the dependency
tracking by adding it to the _SOURCES variable, without the side effect
of it being compiled (twice).
v2: Update .gitignore to reflect the rename.
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Drop the no longer present get_es{1,2}.c from the list.
v2: Keep the format_info.c rename hunk out of this patch.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All the users directly include the header, plus we have a in-tree
replacements for non C99 compilers which we already use.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently these files are including it indirectly via eglcompiler.h
The latter of which will be removed with follow up commits.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Should no longer be used. As many places indirectly include
eglcompiler.h keep this change separate, so that it can be easily
reverted, if needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
With the split of the gallium egl module we had previously it required
access to some of the internal functions. As the only build (automake)
that did this no longer builds it we can now appropriately hide those
functions.
Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The latter is a C99 standard, and our current wrapper c99_compat.h
should handle non-compliant compilers.
Drop the c99_compat.h inclusion from eglcompiler.h altogether, as it's
no longer required.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Drop the custom keyword in favour of the C99 one. All the places using
it now directly include c99_compat.h which should handle things on
platforms which lack it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
THREADS was defined if HAVE_PTHREADS or _WIN32 was defined. That's
always the case. The build would die in c11/threads.h otherwise.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Remove u_thread_self() since u_thread.h is going away soon.
Create a simple thread ID abstraction which wraps WIN32 or c11 threads.
This also gets rid of the questionable casting of thrd_t to an unsigned
long.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
So it matches the preprocessor check around the u_current_init_tsd() code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of relying on glapi.h or some other header to provide it.
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Let's directly include c11/threads.h instead of relying on glapi.h
to provide it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The yoffset needs to be interpreted as a slice offset for 1D array
textures. This patch implements that by moving the yoffset into
zoffset similar to how it moves the height into depth.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Now that a layered source PBO is interpreted as a single tall 2D image
it's quite easy to accept the image height packing option by just
creating an image that is tall enough to include the image padding.
I'm not sure whether the image height property should affect 1D_ARRAY
textures. My intuition and interpretation of the GL spec (which is a
bit vague) would be that it shouldn't. However the software fallback
path in Mesa uses the property for packing but not for unpacking. The
binary NVidia driver uses it for both. This patch doesn't use it for
either case so it is different from the software fallback. There is
some discussion about this here:
http://lists.freedesktop.org/archives/mesa-dev/2015-February/077925.html
This is tested by the texsubimage Piglit test with the array and pbo
arguments. Previously this test was skipping this code path because it
always sets the image height.
I've also tested it by modifying the getteximage-targets test. It
wasn't using this code path before because it was using the default
texture object so this code couldn't successfully create a frame
buffer. I also modified it to add some image padding with the image
height in the PBO.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
This reverts commit 546aba143d.
I think the changes to the calls to glBlitFramebuffer from this patch
are no different to what it was doing previously because it used to
set height to 1 before doing the blits. However it was introducing
some problems with the blit for layer 0 because this was no longer
special cased. It didn't fix problems with the yoffset which needs to
be interpreted as a slice offset. I think a better solution would be
to modify the original if statement to cope with the yoffset.
Conflicts:
src/mesa/drivers/common/meta_tex_subimage.c
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Add wrappers for 3DPRIMITIVE to make sure we clear current_pipe_control_dw1
and deferred_pipe_control_dw1 after it. Add missing
gen7_wa_post_ps_and_later().
These WAs
gen7_wa_post_3dstate_push_constant_alloc_ps()
gen7_wa_pre_vs()
gen7_wa_pre_3dstate_sf_depth_bias()
first half of gen7_wa_pre_depth()
gen7_wa_post_ps_and_later()
are Gen7-specific. Update copy-and-pasted gen8_wa_pre_depth() also.
Add intel_winsys_get_reset_stats(), intel_winsys_import_userptr(), and
intel_bo_map_async(). The latter two are stubs, but we are not going to use
them immediately either.
DRM_IOCTL_I915_GEM_WAIT takes an int64_t for the timeout value but
GL_ARB_sync takes an uint64_t. Further, the ioctl used to wait
indefinitely when passed a negative timeout, but it's been broken and
now returns immediately in that case. Thus, if an application passes
UINT64_MAX to wait forever, we overflow to -1LL and return immediately.
Work around this mess by clamping the wait timeout to INT64_MAX.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Define the macro in src/util/macros.h rather than in two different
places. Note that USED isn't actually used anywhere at this time.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
OpenVG API seems to have dwindled away. The code
would still be interesting if we wanted to implement NV_path_rendering
but given the trend of the next gen graphics APIs, it seems
unlikely that this becomes ARB or core.
v2: Remove a few "openvg" references left, per Emil Velikov.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Update release notes.
Largely superseeded by src/egl, and
WGL/GLX_EXT_create_context_es_profile extensions.
Note this will break Android.mk with gallium drivers -- somebody
familiar with that build infrastructure will need to update it to use
gallium drivers through egl_dri2.
v2: Remove the _EGL_BUILT_IN_DRIVER_GALLIUM define from
src/egl/main/Android.mk; and update the src/egl/main/Sconscript to
create a SharedLibrary, add versioning, create symlink - copy the bits
from egl-static, per Emil Velikov.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Disallow undefined symbols in libEGL.so. Update release notes
This classic driver is so far behind Gallium softpipe/llvmpipe based
one, that's hard to imagine ever being useful.
v2: Drop drivers/windows from src/mesa/Makefile.am:EXTRA_DIST per Emil
Velikov.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Update release notes.
v2:
- Single statement, by using memset return value as suggested by Ian
Romanick.
- No internal declaration, as suggested by Jason Ekstrand.
- Move macros to a header.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Since commit 28f3f8d, indices generator take a start parameter. However, some
index values have been left to start at 0.
This fixes the glean/fbo test with the virgl driver, and copytexsubimage
with freedreno.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Fix GCC cpp warnings with glibc >= 2.19.
/usr/include/features.h:148:3: warning: #warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Wcpp]
# warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Correctly set _BaseFormat field when creating a gl_renderbuffer
with EGLImage storage.
Change-Id: I8c9f7302d18b617f54fa68304d8ffee087ed8a77
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Re-enable integer, now that we can handle flat varyings. Still, ofc,
conditional on FD_MESA_DEBUG=glsl130, until we can deprecate _old
compiler..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We may not need this for later a4xx patchlevels, but we do at least need
this for patchlevel 0. Bypass bary.f for fetching varyings when flat
shading is needed (rather than configure via cmdstream). This requires
a special dummy bary.f w/ (ei) flag to signal to scheduler when all
varyings are consumed. And requires shader variants based on rasterizer
flatshade state to handle TGSI_INTERPOLATE_COLOR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
I think there is at least one more sub-encoding, but these two should be
enough to cover the common load/store instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
To lower two sided color, tgsi_lowering creates additional BCOLOR inputs
(matching up to the BCOLOR outputs on the vert shader). These inputs
should copy the interpolation state of their matching COLOR input.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
fd3_emit.c: In function ‘fd3_emit_vertex_bufs’:
fd3_emit.c:377:11: warning: unused variable ‘semantic’ [-Wunused-variable]
uint8_t semantic = sem2name(vp->inputs[i].semantic);
and
fd4_emit.c: In function ‘fd4_emit_vertex_bufs’:
fd4_emit.c:304:11: warning: unused variable ‘semantic’ [-Wunused-variable]
uint8_t semantic = sem2name(vp->inputs[i].semantic);
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The main objective of this change is to enable Linux developers to use
more of C99 throughout Mesa, with confidence that the portions that need
to be built with MSVC -- and only those portions --, stay portable.
This is achieved by using the appropriate -Werror= options only on the
places they need to be used.
Unfortunately we still need MSVC 2008 on a few portions of the code
(namely llvmpipe and its dependencies). I hope to eventually eliminate
this so that we can use C99 everywhere, but there are technical/logistic
challenges (specifically, newer Windows SDKs no longer bundle MSVC,
instead require a full installation of Visual Studio, and that has
hindered adoption of newer MSVC versions on our build processes.)
Thankfully we have more directy control over our OpenGL driver, which is
why we're now able to migrate to MSVC 2013 for most of the tree.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While using various debugging features (optimization debug, instruction dumping,
etc) this function is called in order to get a readable letter for the type of
unit.
On GEN8, two new units were added, the Qword and the Unsigned Qword (Q, and UQ
respectively). The existing assertion tries to determine that the argument
passed in is within the correct boundary, however, it was using UQ as the upper
limit instead of Q.
To my knowledge you can only hit this case with the branch I am currently
working on, so it doesn't fix any known issues.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I'm not really sure of the origins of the existing flag names. Modern docs have
some slightly different names. Having the correct names makes it easier to
determine if existing PIPE_CONTROL flag settings are correct, as well as making
adding new PIPE_CONTROLs easier.
This originally came up while I was trying to implement workarounds and spotted
some things called, "flush" which should have been called "invalidate."
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a fix for a regression introduced in commit a9f8296d ("i965/fs:
Preserve the CFG in a few more places.").
The errata this code works around is described in a comment before the function:
"[DevBW, DevCL] Errata: A destination register from a send can not be
used as a destination register until after it has been sourced by an
instruction with a different destination register.
The framebuffer write's sources must be in message registers, which SEND
instructions cannot have as a destination. There's no way for this
errata to affect anything at the end of the program. Just remove the
code.
Cc: 10.4, 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84613
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, there were bugs where if the app set a scissor it could affect
the area of the texture that was downloaded. There was also potential that
the framebuffer SRGB state could affect downloads. This ensures that those
will get saved/restored and can't affect the texture download.
Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89292
Reviewed-by: Neil Roberts <neil@linux.intel.com>
We've been using a mix of these two macros for a while now. Let's
just use the later everywhere. It seems to be the convention used
by other open-source projects.
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
In some cases, glheader.h is the right #include.
Also remove some instances of struct _glapi_table declarations.
Acked-by: Matt Turner <mattst88@gmail.com>
It's unmaintained, and most likely broken: I use trace driver every now
and then, and everytime I do I need to fix it up.
It's also unused: identity_screen_create is never called.
Above all, it's dead weight: if identity driver had the infrastructure
for other pass-through drivers (like trace and rbug), then it would make
sense on its own right. But as it is implemmented, it's just another
driver to (forget) to update whenever there is a gallium interface
change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It's a wrapper around emit_buffer_surface_state with format=RAW, pitch=1,
rw=true and the remaining arguments ordered differently. There's no point in
having a separate vtbl pointer for that.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
According to the bspec for some reason the format of the maximum
number of threads field has changed from U8-2 to U8-1 for the PS.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, we compared our new GS-out VUE map to the existing *VS*-out
VUE map, which is bogus.
This would mostly manifest as redundant dirty flagging where the GS is
in use but the VS and GS output layouts differ; but there is a scary
case where we would fail to flag a GS-out layout change if it happened
to match the VS-out layout.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.5, 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88885
The use of the uninitialized_var() macro was to silence an uninitialized
variable warning that I assumed stemmed from gcc being unable to see
inside __get_cpuid() or understand its inline assembly.
In fact, it was because the __get_cpuid() function can fail, and not
initialize its arguments. Instead, check for failure and return early.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 79daa510c7.
I apparently hadn't done a clean build when testing this; it broke the
build for Tom, Ben, and myself. We like the idea; let's try a v2.
The length argument passed to sysctl was the size of the pointer
not the type. The result of this is sysctl calls would fail on
32 bit BSD/Mac OS X.
Additionally the wrong pointer was passed as an argument to store
the result of the sysctl call.
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
_mesa_choose_tex_format (texformat.c) tries I8_SNORM, L8_SNORM, and
either L8A8_SNORM or A8L8_SNORM, none of which are supported by our
driver. Failing that, it falls back to RGBX for luminance, and RGBA
intensity and luminance alpha. So, we need to use swizzle overrrides
to obtain the correct values.
Fixes Piglit's EXT_texture_snorm/fbo-blending-formats and
fbo-clear-formats.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
CMP.Z doesn't work on Gen4-5 because the boolean isn't guaranteed to be
0 or 0xFFFFFFFF - only the low bit is defined.
We can call emit_bool_to_cond_code to generate the condition in f0.0;
the last instruction will generate the flag value. We can patch it to
use f0.1, and negate the condition.
Fixes discard tests on Gen4-5.
Haswell shader-db stats:
total instructions in shared programs: 5770279 -> 5769112 (-0.02%)
instructions in affected programs: 64342 -> 63175 (-1.81%)
helped: 1069
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The main objective of this change is to enable Linux developers to use
more of C99 throughout Mesa, with confidence that the portions that need
to be built with MSVC -- and only those portions --, stay portable.
This is achieved by using the appropriate -Werror= options only on the
places they need to be used.
Unfortunately we still need MSVC 2008 on a few portions of the code
(namely llvmpipe and its dependencies). I hope to eventually eliminate
this so that we can use C99 everywhere, but there are technical/logistic
challenges (specifically, newer Windows SDKs no longer bundle MSVC,
instead require a full installation of Visual Studio, and that has
hindered adoption of newer MSVC versions on our build processes.)
Thankfully we have more directy control over our OpenGL driver, which is
why we're now able to migrate to MSVC 2013 for most of the tree.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is to enable the code to build with -Werror=vla in the short term,
and enable the code to build with MSVC2013 soon after.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix build error.
CC compiler/tests/r300_compiler_tests-radeon_compiler_regalloc_tests.o
compiler/tests/radeon_compiler_regalloc_tests.c: In function ‘test_runner_rc_regalloc’:
compiler/tests/radeon_compiler_regalloc_tests.c:57:3: error: implicit declaration of function ‘fprintf’ [-Werror=implicit-function-declaration]
fprintf(stderr, "Failed to load program\n");
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This fixes a dEQP test failure. In the test,
glCompressedTexSubImage2D was called with target = 0 and failed to throw
INVALID ENUM. This failure was caused by _mesa_get_current_tex_object(ctx,
target) being called before the target checking. To remedy this, target
checking was made into its own function and called prior to
_mesa_get_current_tex_object.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89311
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This fixes a dEQP test failure. In the test,
glCopyTexSubImage2D was called with target = 0 and failed to throw
INVALID ENUM. This failure was caused by _mesa_get_current_tex_object(ctx,
target) being called before the target checking. To remedy this, target
checking was separated from the main error-checking function and
called prior to _mesa_get_current_tex_object.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89312
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
MSVC 2008 (shipped with Windows SDK 7.0.7600) is the oldest we
need to support. At least on llvmpipe, gallium/auxiliary, and util
modules. For the remaining modules (particular all OpenGL specific
code) can be built with MSVC 2013.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Not needed by anything in that header. Include math.h or c99_math.h
where needed instead.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Don't include stuff we don't need. Fix a few #includes elsewhere to
keep thing building.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Should be defined in math.h. If not, we can add them to c99_math.h
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
A layered PBO image is now interpreted as a single tall 2D image so
the z argument in _mesa_meta_bind_fbo_image is ignored. Therefore this
was just redundantly rebinding the same image repeatedly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For 32-bit builds, floating point operations use x86 FPU registers,
not SSE registers. If we're actually storing an integer in a float
variable, the value might get modified when written to memory. This
patch changes the VBO code to use the fi_type (float/int union) to
store/copy vertex attributes.
Also, this can improve performance on x86 because moving floats with
integer registers instead of FP registers is faster.
Neil Roberts review:
- include changes on all places that are storing attribute values.
- check with and without -O3 compiler flag.
Brian Paul review:
- use fi_type type instead gl_constant_value type
- fix a bunch of nit-picks.
- fix compiler warnings
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82668
Signed-off-by: Marius Predut <marius.predut@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
before calling _mesa_meta_pbo_TexSubImage(). This will be used in
later patches and will be required in Skylake to get the tile
resource mode of miptree before calling _mesa_meta_pbo_TexSubImage().
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
No functional changes in the patch. Just makes the code look cleaner.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
create_texture_for_pbo() is shared by _mesa_meta_pbo_GetTexSubImage()
and _mesa_meta_pbo_TexSubImage() functions. So, we need to account
for both pack and unpack buffer objects.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
create_texture_for_pbo() is used by both _mesa_meta_pbo_GetTexSubImage()
and _mesa_meta_pbo_TexSubImage() functions with different PBO targets.
Use GL_STREAM_READ with GL_PIXEL_PACK_BUFFER and GL_STREAM_DRAW with
GL_PIXEL_UNPACK_BUFFER.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
There were some bugs, and the code was really difficult to follow. We
would optimize
min(max(x, b), 1.0) into max(sat(x), b)
but not pay attention to the order of min/max and also do
max(min(x, b), 1.0) into max(sat(x), b)
Corrects four shaders from Champions of Regnum that do
min(max(x, 1), 10)
and corrects rendering of Mass Effect under VMware Workstation.
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89180
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since now ARRAY_SIZE has been added to util/macros.h. Fixes a bunch of:
freedreno_util.h:79:0: warning: "ARRAY_SIZE" redefined
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
^
In file included from ../../../../src/gallium/include/pipe/p_compiler.h:36:0,
from ../../../../src/gallium/include/pipe/p_context.h:31,
from freedreno_context.h:32,
from freedreno_context.c:29:
../../../../src/util/macros.h:29:0: note: this is the location of the previous definition
# define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
^
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Sandybridge doesn't support y-tiling for surface formats with 16 or
more bpp. There was previously an override to explicitly allow this
for Gen7. However, this restriction is also removed in Gen8+ so we
should use y-tiling there too.
This is important to do for Skylake which doesn't support x-tiling for
3D surfaces.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
If the renderer supports the core profile the query returned incorrectly
0x8 as value, because it was using (1U << __DRI_API_OPENGL_CORE) for the
returned value.
The same happened with the compatibility profile. It returned 0x1
(1U << __DRI_API_OPENGL) instead of 0x2.
Internal DRI defines:
dri_interface.h: #define __DRI_API_OPENGL 0
dri_interface.h: #define __DRI_API_OPENGL_CORE 3
Those two bits are supposed for internal usage only and should be
translated to GLX_CONTEXT_CORE_PROFILE_BIT_ARB (0x1) for a preferred
core context profile and GLX_CONTEXT_COMPATIBILITY_PROFILE_BIT_ARB (0x2)
for a preferred compatibility context profile.
This patch implements the above translation in the glx module.
v2: Fix the incorrect behavior in the glx module
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Doubles are always packed, but a single double will never cross a slot
boundary -- single slots can still be wasted in some situations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Corrects the way that _mesa_meta_pbo_TexSubImage and
_mesa_meta_pbo_GetTexSubImage handle 1D_ARRAY textures. Fixes a failure in
the Piglit arb_direct_state_access/gettextureimage-targets test.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Laura Ekstrand <laura@jlekstrand.net>
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Changes PBO uploads and downloads to use a tall (height * depth) 2D texture
for blitting. This fixes the bug where 2D_ARRAY, 3D, and CUBE_MAP_ARRAY
textures are not properly uploaded and downloaded.
Removes the option to use a 2D ARRAY texture for the PBO during upload and
download. This option didn't work because the miptree couldn't be set up
reliably.
v2: Review from Jason Ekstrand and Neil Roberts:
-Delete the depth parameter from create_texture_for_pbo
-Abandon the option to create a 2D ARRAY texture in create_texture_for_pbo
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
This moves the line setting immutability for the texture to after
_mesa_initialize_texture_object so that the initializer function will not
cancel it out. Moreover, because of the ARB_texture_view extension, immutable
textures must have NumLayers > 0, or depth will equal (0-1)=0xFFFFFFFF during
SURFACE_STATE setup, which triggers assertions.
v2: Review from Kenneth Graunke:
- Include more explanation in the commit message.
- Make texture setup bug fixes into a separate patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
With the previous optimization in place, some shaders wind up with
multiple discard jumps in a row, or jumps directly to the next
instruction. We can remove those.
Without NIR on Haswell:
total instructions in shared programs: 5777258 -> 5775872 (-0.02%)
instructions in affected programs: 20312 -> 18926 (-6.82%)
helped: 716
With NIR on Haswell:
total instructions in shared programs: 5773163 -> 5771785 (-0.02%)
instructions in affected programs: 21040 -> 19662 (-6.55%)
helped: 717
v2: Use the CFG rather than the old instructions list. Presumably
the placeholder halt will be in the last basic block.
v3: Make sure placeholder_halt->prev isn't the head sentinel (caught
twice by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
st_glsl_to_tgsi and ir_to_mesa have handled conditional discards for a
long time; the previous patch added that capability to i965.
i965 (Haswell) shader-db stats:
Without NIR:
total instructions in shared programs: 5792133 -> 5776360 (-0.27%)
instructions in affected programs: 737585 -> 721812 (-2.14%)
helped: 6300
HURT: 68
GAINED: 2
With NIR:
total instructions in shared programs: 5787538 -> 5769569 (-0.31%)
instructions in affected programs: 767843 -> 749874 (-2.34%)
helped: 6522
HURT: 35
GAINED: 6
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The discard condition tells us which channels we want killed. We want
to invert that condition to get the channels that should survive (remain
live) in f0.1. Emit a CMP to negate it.
Nothing generates these today, but that will change shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a conditional discard, which takes a boolean source.
Note that we don't generate ir_discard::condition today, so this
shouldn't break drivers (since none implement this intrinsic yet).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
opt_constant_folding() already detects conditional assignments where the
condition is constant, and either deletes the assignment or the
condition.
Make it handle discards in the same fashion.
Spotted happening in the wild in Tropico 5 shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This pass wasn't prepared to handle conditional discards.
Instead of initializing the "discarded" temporary to "true", set it to
the condition. Then, refer to the variable for the condition, to avoid
duplicating the expression tree.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen8+ support was just broken, since MUL now consumes 32-bits from both
sources. Fixes 986 piglit tests on my BDW.
total instructions in shared programs: 7753873 -> 7753522 (-0.00%)
instructions in affected programs: 28164 -> 27813 (-1.25%)
helped: 77
GAINED: 47
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds a parent_instr field similar to the one for ssa_def. The
difference here is that the parent_instr field on a nir_register can be
NULL if the register does not have a unique definition or if that
definition does not dominate all its uses. We set this field in the
out-of-SSA pass so that backends can get SSA-like information even after
they have gone out of SSA.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
EVENT_TYPE_PIPELINESTAT_STOP disables streamout queries too.
Luckily, pipeline stats are enabled by default, so we don't even have to
emit EVENT_TYPE_PIPELINESTAT_START.
Tested on Hawaii, Bonaire, Redwood, RV730.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- remove the last parameter of si_emit_rasterizer_prim_state
- remove the last unused parameter of si_emit_draw_registers
- use current_rast_prim in si_emit_draw_registers
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Mostly dead code or code that didn't do anything.
Computing gs_num_outputs at the end was also useless. It's already set
correctly.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Requires Evergreen/Cayman and radeon kernel module
2.41.0 or newer.
Expected piglit fails due to hardware limitations:
* arb_draw_indirect-draw-arrays-prim-restart
Restarts not applied for DrawArrays commands
* arb_draw_indirect-vertexid
Base vertex offset is not included in vertex id
Marek: bump vgt_state num_dw by 3 (= space needed for one register write)
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
poc counter should be reset with IDR frame,
otherwise there would be a re-order issue with
frames before and after IDR
v2: add commit message
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
With earlier commit (install-lib-links: don't depend on .libs directory)
we moved the location of the file from .libs/ to the current dir.
Although we did not attribute that in the former case autotools was
doing us a favour and removing the file. Explicitly remove the file at
clean-local time, otherwise we'll end up with dangling files.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
According to the spec when no device access mode is specified
clCreateBuffer and clCreateImage* should default to read/write, and
clCreateSubBuffer should default to the parent's device access flags.
clCreateSubBuffer is also required to inherit the host access and
host pointer flags from the parent.
Reviewed-and-tested-by: EdB <edb+mesa@sigluy.net>
Those flags have been introduced in OpenCL 1.2.
[ Francisco Jerez: Rebase. Throw CL_INVALID_VALUE from
clCreateSubBuffer if the subbuffer drops access flags from its
parent. Use single function taking the set of allowed host access
flags to validate memory transfer operands. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
New BO create and mmap ioctls are added. The submit ABI gains a flags
argument, and the pointers are fixed at 64-bit. Shaders are now fixed at
the start of their BOs.
The most common macros are defined there, no use to duplicate these
Clean up the already redefinded macros
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
"(...)Let w be the width rounded to the nearest integer (...). If the
line segment has endpoints given by (x0,y0) and (x1,y1) in window
coordinates, the segment with endpoints (x0,y0-(w-1)/2) and
(x1,y1-(w-1/2)) is rasterized, (...)"
The hardware it not rounding the line width, so we should do it.
Also, we should be careful not to go beyond the hardware limits
for the line width after it gets rounded. Gen6-7 define a maximum line
width slightly below 8.0, so we should advertise a maximum line
width lower than 7.5 to make sure that 7.0 is the maximum integer
line width that we can select. Since the line width granularity in these
platforms is 0.125, we choose 7.375. Other platforms advertise rounded
maximum line widths, so those are fine.
Fixes the following 3 dEQP tests:
dEQP-GLES3.functional.rasterization.primitives.lines_wide
dEQP-GLES3.functional.rasterization.fbo.texture_2d.primitives.lines_wide
dEQP-GLES3.functional.rasterization.fbo.rbo_singlesample.primitives.lines_wide
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The intel driver code, and apparently all other Mesa drivers, call
_mesa_initialize_context early in the CreateContext hook. That
function will end up calling _mesa_init_texture which will do:
ctx->Texture.CubeMapSeamless = _mesa_is_gles3(ctx);
But this won't work at this point, since _mesa_is_gles3 requires
ctx->Version to be set and that will not happen until late
in the CreateContext hook, when _mesa_compute_version is called.
We can't just move the call to _mesa_compute_version before
_mesa_initialize_context since it needs that available extensions
have been computed, which again requires other things to be
initialized, etc. Instead, we enable seamless cube maps since
GLES2, which should work for most implementations, and expect
drivers that don't support this to disable it manually as part
of their context initialization setup.
Fixes the following 192 dEQP tests:
dEQP-GLES3.functional.texture.filtering.cube.formats.*
dEQP-GLES3.functional.texture.filtering.cube.sizes.*
dEQP-GLES3.functional.texture.filtering.cube.combinations.*
dEQP-GLES3.functional.texture.mipmap.cube.*
dEQP-GLES3.functional.texture.vertex.cube.filtering.*
dEQP-GLES3.functional.texture.vertex.cube.wrap.*
dEQP-GLES3.functional.shaders.texture_functions.texturelod.samplercube_fixed_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 2.14 Asynchronous Queries, page 84 of the OpenGL ES 3.0.4
spec states:
"BeginQuery generates an INVALID_OPERATION error if any of the
following conditions hold: [...] id is the name of an
existing query object whose type does not match target; [...]
Similar wording exists in the OpenGL 4.5 spec, section 4.2. QUERY
OBJECTS AND ASYNCHRONOUS QUERIES, page 43.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.fragment.begin_query
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 2.14 Asynchronous Queries, page 84 of the OpenGL ES 3.0.4 states:
"The command void GenQueries( sizei n, uint *ids ); returns n previously unused
query object names in ids. These names are marked as used, for the purposes of
GenQueries only, but no object is associated with them until the first time they
are used by BeginQuery."
This means that any attempt to use or query a Query object id before it has ever
been bound by calling glBeginQuery, should be assume to be an invalid object.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.state.get_query_objectuiv
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The zoffset and depth values were not being considered when calling
error_check_subtexture_dimensions().
Fixes 2 dEQP tests:
* dEQP-GLES3.functional.negative_api.texture.texsubimage3d_neg_offset
* dEQP-GLES3.functional.negative_api.texture.texsubimage3d_invalid_offset
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedestkop.org>
Across the board of the various generations, the intial few atoms in
all of the atom lists are basically the same, (performing uploads for
the various programs). The only difference is that prior to gen6
there's an ff_gs upload in place of the later gs upload.
In this commit, instead of using the atom lists for this program state
upload, we add a new function brw_upload_programs that calls into the
per-stage upload functions which in turn check dirty bits and return
immediately if nothing needs to be done.
This commit is intended to have no functional change. The motivation
is that future code, (such as the shader cache), wants to have a
single function within which to perform various operations before and
after program upload, (with some local variables holding state across
the upload).
It may be worth looking at whether some of the other functionality
currently handled via atoms might also be more cleanly handled in a
similar fashion.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In current code, color format is always hardcoded to
__DRI_IMAGE_FORMAT_ARGB8888 when buffer or DRI image is
allocated in function calls, get_back_bo and dri2_get_buffers,
regardless of current target's color format. This problem
may leads to incorrect render pitch calculation, which
eventually ends up with wrong offset of pixels in
the frame buffer when the image is in different color format
from dri surf's, especially with different bpp. (e.g. RGB565-16bpp)
Attached code patch simply adds RGB565 and XRGB8888 cases to two
functions noted above to resolve the issue.
v2: added a case of XRGB8888, format and bpp selection is done
via switch-case (not "if-else" anymore)
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
The alternative would be to include math.h in c99_compat.h but that
seems heavy-handed.
This patch also replaces INLINE with inline in the c99 math function
wrappers.
Fixes MSVC build.
Acked-by: Matt Turner <mattst88@gmail.com>
We were already do this for ALU operations but we haven't for non-ALU
operations. This changes that.
total NIR instructions in shared programs: 2039883 -> 2022338 (-0.86%)
NIR instructions in affected programs: 1768850 -> 1751305 (-0.99%)
helped: 14244
HURT: 124
total FS instructions in shared programs: 4083960 -> 4084036 (0.00%)
FS instructions in affected programs: 7302 -> 7378 (1.04%)
helped: 12
HURT: 51
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The round-robin allocation strategy is expected to decrease the amount
of false dependencies created by the register allocator and give the
post-RA scheduling pass more freedom to move instructions around. On
the other hand it has the disadvantage of increasing fragmentation and
decreasing the number of equally-colored nearby nodes, what increases
the likelihood of failure in presence of optimistically colorable
nodes.
This patch disables the round-robin strategy for optimistically
colorable nodes. These typically arise in situations of high register
pressure or for registers with large live intervals, in both cases the
task of the instruction scheduler shouldn't be constrained excessively
by the dense packing of those nodes, and a spill (or on Intel hardware
a fall-back to SIMD8 mode) is invariably worse than a slightly less
optimal scheduling.
Shader-db results on the i965 driver:
total instructions in shared programs: 5488539 -> 5488489 (-0.00%)
instructions in affected programs: 1121 -> 1071 (-4.46%)
helped: 1
HURT: 0
GAINED: 49
LOST: 5
v2: Re-enable round-robin already for the lowest one of the nodes
pushed optimistically onto the sack (Connor).
v3: Use UINT_MAX instead of ~0, open-code MIN2 (Jason, Connor).
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Fixes metadata guess when instructions in the program specify a
destination register with non-zero reg_offset and when the payload of
a LOAD_PAYLOAD spans several registers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It's perfectly fine to read the second half of a register written with
force_writemask_all from a first half MOV instruction or vice versa, and
lower_load_payload shouldn't mark the whole MOV as belonging to the second
half in that case. Replicate the same metadata to both halves of the
destination when writemasking is disabled.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
There's some commentary about how it's defined by other "modules", and
maybe that was true in 2000 when the code was added.
Reviewed-by: Eric Anholt <eric@anholt.net>
... and util_le{16,32}_to_cpu. I think I've used the right ones for
describing the actual operation performed (even though they're both just
"byte-swap this if I'm on big-endian").
The Linux Kernel has typedefs __le32/__be32 and friends that static
analysis tools can use to check that byte-orderings are correct. It
might be interesting to apply that here as well.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This corrects a trivial error introduced in commit
19252fee46. That patch was merged recently
and omits one condition (that 'samples' is greater than zero) in one of
the error checks. That error will definitely cause regressions.
Also corrects the reference to the specification above the error check,
which was wrongly quoting OpenGL instead of OpenGL-ES.
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above,
but we can generalize it and make it more potentially useful. In the
specific original case of a 0 for our new 'a' argument, it'll get further
algebraic optimization once the 0 is an argument to the new add.
No shader-db effects.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We have some useful optimizations to drop things like 'ine a, 0' on a
boolean argument, but if 'a' came from logical operations on bools, it
couldn't tell. These kinds of constructs appear as a result of TGSI->NIR
quite frequently (at least with if flattening), so being a little more
aggressive in detecting booleans can pay off.
v2: Add ixor as a booleanness-preserving op (Suggestion by Connor).
vc4 results:
total instructions in shared programs: 40207 -> 39881 (-0.81%)
instructions in affected programs: 6677 -> 6351 (-4.88%)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
vc4 was already cleaning these up, but it does shave 4 NIR instructions in
shader-db.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
If there is no pci-id, which is valid for vc4 and freedreno, just emit
an info msg. Keep malformed but existing pci-id's as a warning.
Mostly just to clean up a warning that confuses users for the non-pci
devices.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
I never actually implemented the stubbed out fence stuff back in the
early days. Fix that.
We'll need a few libdrm_freedreno changes to handle timeout properly,
so ignore that for now to avoid a libdrm_freedreno dependency bump.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The brw_imm_ud will yield a HW_REG which then will introduce a barrier
for certain optimization opportunities.
No piglit regressions seen with gen8 (simd8vs).
Suggested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For fragment programs, we pull this mask from the payload header. The same
mask doesn't exist for compute shaders, so we set all bits to enabled.
Previously we were setting 0xff to support SIMD8 VS, but with CS we
support SIMD16, and therefore we change this to 0xffff.
Related commits for SIMD8 VS:
commit d9cd982d55
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Sun Feb 15 20:06:59 2015 -0800
i965/simd8vs: Fix SIMD8 atomics
commit 4a95be9772
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Tue Feb 17 09:57:35 2015 -0800
i965/simd8vs: Fix SIMD8 atomics (read-only)
Note: this mask is ANDed with the execution mask, so some channels may not end
up issuing the atomic operation.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Because the TGSI interface creates merges for each instruction source
and then splits them back out, there are a lot of unnecessary
merge/split pairs which do essentially nothing. The various modifier/etc
propagation doesn't know how to walk though those, so just remove them
when they're unnecessary.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since dropping some NV_fragment_program opcodes (commits
868f95f1da, a3688d686f)
we can no longer parse all opcodes necessary for this extension, leading
to bugs (https://bugs.freedesktop.org/show_bug.cgi?id=86980).
Hence don't announce support for it in swrast (no other driver enabled it).
(Note that remnants of some NV_fp/vp extensions remain, they could be
dropped but are required as hacks for getting viewperf11 catia to run.)
mtypes.h had been defining NDEBUG (used by assert) if DEBUG was not
defined. Confusing and bizarre that you don't get NDEBUG if you don't
include mtypes.h.
... which is just what happened in commit bef38f62e.
Let's let configure define this for us if not using --enable-debug.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With -DDEBUG -UNDEBUG, this assert uses reg_state::stack_size, which
doesn't exist, breaking the build:
assert(state->states[index].index < state->states[index].stack_size);
Switch it to ifndef NDEBUG, so the field will exist if the assertion
actually generates code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
One less new directory necessary for gallium code that wants to interact
with NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Note that we can't use u_math.h's align() because it's a function instead
of a macro, while BITSET_DECLARE needs a constant expression for nouveau's
usage in global declarations.
v2: Stick some parens around the bits macro argument usage (review by Jose).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This avoids duplication of some macros and other definitions across the
tree.
Note that COPY_4FV switches from a memcpy-based implementation to an
assignment of 4 floats.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It introduces references to gallium util/ symbols which means we don't get
to include it from outside-of-gallium code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Missed a few drivers in the earlier changes, this should fix up all the
ones that print unknown caps or don't have a default statement.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In gen6 we need to compute the primitive count in the generated GS program.
The current implementation only counts full primitives, that is, if the
output primitive type is a triangle strip, it won't count individual
triangles in the strip, only complete strips.
If we want to count basic primitives instead we have two options: rework
the assembly code we generate for strip primitives or simply use
CL_INVOCATION_COUNT to resolve the query and let the hardware do that work
for us. This patch implements the latter approach.
Fixes the following piglit test:
bin/arb_pipeline_statistics_query-geom -auto
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89210
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Section 4.2 (Whole Framebuffer Operations) of the OpenGL 3.0 specification
says:
"Each buffer listed in bufs must be BACK, NONE, or one of the values from
table 4.3 (NONE, COLOR_ATTACHMENTi)".
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.buffer.draw_buffers
Reviewed-by: Matt Turner <mattst88@gmail.com>
GLSL 1.50 and GLSL 4.40 specs, they both say the same in
"Interface Blocks" section:
"If optional qualifiers are used, they can include interpolation qualifiers,
auxiliary storage qualifiers, and storage qualifiers and they must declare
an input, output, or uniform member consistent with the interface qualifier
of the block"
From GLSL ES 3.0, chapter 4.3.7 "Interface Blocks", page 38:
"GLSL ES 3.0 does not support interface blocks for shader inputs or outputs."
and from GLSL ES 3.0, chapter 4.6.1 "The invariant qualifier", page 52.
"Only variables output from a shader can be candidates for invariance."
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.invariant_uniform_block_2_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.invariant_uniform_block_2_fragment
No piglit regressions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
v2:
- Enable this check for GLSL.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This lets us more intelligently decide which uniform values should be put
into temporaries, by choosing the most reused values to push to temps
first.
total uniforms in shared programs: 13457 -> 13433 (-0.18%)
uniforms in affected programs: 1524 -> 1500 (-1.57%)
total instructions in shared programs: 40198 -> 40019 (-0.45%)
instructions in affected programs: 6027 -> 5848 (-2.97%)
I noticed this opportunity because with the NIR work, some programs were
happening to make different uniform copy propagation choices that
significantly increased instruction counts.
Each emit_cond_mov() emits a CMP of its first to arguments using the
specified conditional mod, followed by a predicated MOV of the fifth
argument into the fourth. In all four cases here, it was just
implementing MIN/MAX which we can do in a single SEL instruction.
Also reorder the instructions for a slightly better schedule.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things, SEL
with these conditional mods are commutative.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were special casing OPCODE_END but no other instructions that have no
destination, like OPCODE_KIL, leading us to emitting MOVs with null
destinations.
total instructions in shared programs: 5705243 -> 5701539 (-0.06%)
instructions in affected programs: 124104 -> 120400 (-2.98%)
helped: 904
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The saturate propagation pass recognizes that the second instruction
below does not interfere with an attempt to propagate the saturate
modifier from instruction 3 to 1.
1: add(8) dst0 src0 src1
2: mov.sat(8) dst1 dst0
3: mov.sat(8) dst2 dst0
Unfortunately, we did not consider the case of instruction 2 having a
source modifier on dst0. Take for instance:
1: add(8) dst0 src0 src1
2: mov.sat(8) dst1 -dst0
3: mov.sat(8) dst2 dst0
Consider such an instruction to interfere. Increase instruction counts
in Anomaly 2, which could be a bug fix depending on the values the first
instruction produces.
instructions in affected programs: 53228 -> 53934 (1.33%)
HURT: 360
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This lets us be slightly more efficient by not walking the CFG extra times.
Also, it may make it easier to ensure that GVN happens on only unpinned
instructions.
Reviewed-by: Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use nir_dominance_lca for computing least common anscestors
- Use the block index for comparing dominance tree depths
- Pin things that do partial derivatives
Reviewed-by: Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Right now, the nir_instr_prev function function blindly looks up the
previous element in the exec list and casts it to an instruction even if
it's the tail sentinel. The next commit will change this to return null if
it's the first instruction. Making this change first avoids getting a
segfault between commits. The only reason we never noticed is that, thanks
to the way things are laid out in nir_block, the casted instruction's type
was never parallal_copy.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, if you remved a CF node that still had instructions in it, none
of the use/def information from those instructions would get cleaned up.
Also, we weren't removing if statements from the if_uses of the
corresponding register or SSA def. This commit fixes both of these
problems
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is mostly thanks to Connor. The idea is to do a depth-first search
that computes pre and post indices for all the blocks. We can then figure
out if one block dominates another in constant time by two simple
comparison operations.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Being able to find the least common anscestor in the dominance tree is a
useful thing that we may want to do in other passes. In particular, we
need it for GCM.
v2: Handle NULL inputs by returning the other block
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This adds support to the state tracker for
ARB_gpu_shader_fp64.
The details are explained in comments
within the code.
v2 : add double to int/unsigned conversion
v3: handle fp64 consts better
v4: use DRSQ
v4.1: add d2b
v4.2: drop DDIV
v5: split out some prep patches.
v5.1: add some comments.
v5.2: more comments
v6: simplify down the double instruction
generation loop.
v7: Merge Ilia's two cleanup patches.
v7.1: minor fixups for Ilia patch + cleanups
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves stuff around a little to make the next patch
cleaner.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just prep work for fp64 support where we need
an array of 2 dst values.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just fills in some blanks to avoid warnings in the i965 driver.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is basically Ian's review feedback for my patch that added
_mesa_shader_stage_to_abbrev() - it just makes both consistent again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the vec4 backend labeled shaders as "vec4" - now it uses the
specific names "VS" and "GS".
The FS backend now correctly prints "VS" for vertex shaders (rather than
"fs"). It also prints "FS" instead of "fs" for fragment shaders;
preserving that behavior didn't seem essential.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We introduce three new fields in backend_visitor:
- debug_enabled: whether or not INTEL_DEBUG & DEBUG_<stage flag>
- stage_name: "vertex", "fragment", etc. for use in messages
- stage_abbrev: "VS", "FS", etc. for use in messages
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When compiling, we have a gl_shader_stage (MESA_SHADER_*) enum, and want
to know whether debugging is enabled for that stage. This allows us to
easily translate it into the corresponding debug flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Comparing the location field is equivalent and more efficient.
We'll also need this when we start using NIR for ARB programs, as our
NIR converter will set the location field correctly, but probably won't
use the GLSL names for these concepts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
It looks like no hw does div anyways, so we should just
lower at the GLSL level.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These act like flt32 except they take up two slots, and you
can only add 2 x flt64 constants in one slot.
The main reason they are different is we don't want to match half a flt64
constants against a flt32 constant in the matching code, we need to make
sure we treat both parts of the flt64 as an single structure.
Cleaned up printing/parsing by Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch adds support for a set of double opcodes
to TGSI. It is an update of work done originally
by Michal Krol on the gallium-double-opcodes branch.
The opcodes have a hint where they came from in the
header file.
v2: add unsigned/int <-> double
v2.1: update docs.
v3: add DRSQ (Glenn), fix review comments (Glenn).
v4: drop DDIV
v4.1: cleanups, fix some docs bugs, (Ilia)
rework store_dest and fetch_source fns. (Ilia)
4.2: fixup float comparisons (Ilia)
This is based on code by Michael Krol <michal@vmware.com>
Roland and Glenn also reviewed earlier versions.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
To silence compiler warnings about unhandled switch cases.
v2: move GSL_TYPE_DOUBLE case to the "Invalid type in type_size" section,
per Ilia.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
To silence compiler warning about unhandled switch case.
v2: move GLSL_TYPE_DOUBLE to the "not reached" section, per Ilia.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use pipe_sampler_view_reference() instead of ordinary assignment.
Also add a new sanity check assertion.
Fixes piglit gl-1.0-drawpixels-color-index test crash. But note
that the test still fails.
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This snippet can be included in Makefiles that may, depending on the
project configuration, not actually build any installable libraries.
In that case we don't have anything to depend on and this part of
the makefile may be executed before the .libs directory is created,
so do not depend on it being there.
Cc: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This fixes a regression in the running time of Piglit introduced by
commit 78e9043475, which increased the
number of register allocation classes set up by the VEC4 back-end
from 2 to 16. The algorithm used by ra_set_finalize() to calculate
them is unnecessarily expensive, do it manually like the FS back-end
does.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some instruction bits don't have a mapping defined to any compacted
instruction field. If they're ever set and we end up compacting the
instruction they will be forced to zero. Avoid using compaction in such
cases.
v2: Align multiple lines of an expression to the same column. Change
conditional compaction of 3-source instructions to an
assertion. (Matt)
v3: The 3-source instruction bit 105 is part of SourceIndex on CHV.
Add assertion that reserved bit 7 is not set. (Matt)
Document overlap with UIP and 64-bit immediate fields.
v4: Make some more unmapped bit checks assertions. (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Due to the way it's implemented in hardware, the F16TO32/F32TO16
instructions require the source/destination register to be of some
16-bit type in Align1 mode, while they require it to be some 32-bit
type in Align16 mode (and as an undocumented feature the high 16 bits
of the destination register are zeroed out in the case of the F32TO16
instruction on Gen7). Make their behaviour consistent so you can
specify a 32 bit register type as source or destination and get
predictable results in the most significant bits no matter what access
mode is being used.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We cannot zero out the destination register if it overlaps with the
source. Use an Align1 instruction instead to zero out the high 16
bits after the conversion to half float.
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the source type differs from the original type of the constant we
need to bit-cast it before propagating, otherwise the original type
information will be lost. If the constant was a vector float there
isn't much we can do, because the result of bit-casting the component
values of a vector float cannot itself be represented as an immediate.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Create a new search function to look for matching built-in functions by name
and use it for built-in function redefinition or overload in GLSL ES 3.00.
GLSL ES 3.0 spec, chapter 6.1 "Function Definitions", page 71
"A shader cannot redefine or overload built-in functions."
While in GLSL ES 1.0 specification, chapter 8 "Built-in Functions"
"User code can overload the built-in functions but cannot redefine them."
So this check is specific to GLSL ES 3.00.
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.functions.invalid.overload_builtin_function_vertex
dEQP-GLES3.functional.shaders.functions.invalid.overload_builtin_function_fragment
dEQP-GLES3.functional.shaders.functions.invalid.redefine_builtin_function_vertex
dEQP-GLES3.functional.shaders.functions.invalid.redefine_builtin_function_fragment
No piglit regressions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Per GLES3 specification, section 4.4 Framebuffer objects page 198, "If
internalformat is a signed or unsigned integer format and samples is greater
than zero, then the error INVALID_OPERATION is generated.".
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.buffer.renderbuffer_storage_multisample
Reviewed-by: Matt Turner <mattst88@gmail.com>
'3.8.2 Sampler Objects' section of the GL-ES 3.0 specification states:
"An INVALID_OPERATION error is generated if sampler is not the name
of a sampler object previously returned from a call to GenSamplers."
In desktop GL, an GL_INVALID_VALUE is returned instead.
Fixes 6 dEQP failing tests:
* dEQP-GLES3.functional.negative_api.shader.get_sampler_parameteriv
* dEQP-GLES3.functional.negative_api.shader.get_sampler_parameterfv
* dEQP-GLES3.functional.negative_api.shader.sampler_parameteri
* dEQP-GLES3.functional.negative_api.shader.sampler_parameteriv
* dEQP-GLES3.functional.negative_api.shader.sampler_parameterf
* dEQP-GLES3.functional.negative_api.shader.sampler_parameterfv
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Rebase on the nir_opcodes.h python code generation support.
v3: Use SSA values, and set an appropriate writemask on dot products.
v4: Make the arguments be SSA references as well. This lets you stack up
expressions in the arguments of other expressions, at the cost of
having to insert a fmov/imov if you want to swizzle. Also, add
the generated file to NIR_GENERATED_FILES.
v5: Use more pythonish style for iterating the list.
v6: Infer the size of the dest from the size of the srcs, and auto-swizzle
a single small src out to the appropriate size.
v7: Add little helpers for initializing the struct, add a typedef for the
struct like other nir types have.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v6)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v7)
These lowering passes are optional for the backend to request, currently
the TGSI softpipe backend most likely the r600g backend would want to use
these passes as is. They aim to hit the gallium opcodes from the standard
rounding/truncation functions.
v2: also lower floor in mod_to_floor
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This implements the bulk of the builtin functions for fp64 support.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We want to restrict some lowering passes to floats only,
and enable other for doubles.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Patch fixes Piglit test:
arb_gpu_shader_fp64/preprocessor/fs-output-double.frag
and adds additional validation for shader outputs.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add a enter/leave record callback so that the offset may be aligned to
the proper value. Otherwise only leaf fields are called, and the first
field needs to be aligned to the outer struct's base alignment while the
last field needs to be aligned to the inner struct's base alignment.
This removes most usage of the last field/record type values passed into
visit_field.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
These functions are about to be used more aggressively for determining
uniform layout. Samplers may be inside of structs, and it's easier to
reuse the existing base alignment logic.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
v2: add define bit (Tapani Pälli)
Patch makes following Piglit tests pass:
arb_gpu_shader_fp64/preprocessor/define.vert
arb_gpu_shader_fp64/preprocessor/define.frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This adds support for the new uniform interfaces
from ARB_gpu_shader_fp64.
v2:
support ARB_separate_shader_objects ProgramUniform*d* (Ian)
don't allow boolean uniforms to be updated (issue 15) (Ian)
v3: fix size_mul
v4: Teach uniform update to take into account double precision (Topi)
v5: add transpose for double case (Ilia)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Just add the xml file covering this extension,
and dummy interface files in mesa, and fix up
sanity tests.
v2:
Enable ProgramUniform*d* from ARB_separate_shader_objects (Ian)
use 40 instead of 43 for dispatch_sanity.cpp (Chris)
uncomment PU sanity tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the driver actually supports ETC2, don't decode it in software.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
No actual decoding is added, similar faking mechanism to bptc.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This could be done in a separate pass like we do in GLSL IR, but it seems
to me like having the definitions of the transformations in the two
directions next to each other makes a lot of sense.
v2: Reorder the comment about the transformation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa has a shader compiler struct flagging whether GLSL IR's opt_algebraic
and other passes should try and generate certain types of opcodes or
patterns. Extend that to NIR by defining our own struct, which is
automatically generated from the Mesa struct in glsl_to_nir and provided
directly by the driver in TGSI-to-NIR.
v2: Split out the previous two prep patches.
v3: Rebase to master (no TGSI->NIR present)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
This will be used so that we can customize the transforms for the target
GPU, so we don't un-lower expressions that had already been lowered (or
introduce new lowering transformations that not all GPUs want)
v2: Drop the complication of having the condition->index dictionary, since
we don't actually expect there to be many different conditions (change
by Kenneth).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to give the optimization passes a chance to customize
behavior for the particular target device.
v2: Rebase to master (no TGSI->NIR present)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
An update for d9cd982d55.
A similar change was needed for CS to allow the piglit test
tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
to pass.
The previous change (d9cd982d) should fix cases that write atomics,
such as atomicCounterIncrement, and this change will fix cases than
only read atomics, such as atomicCounter.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With commit c39dbfdd0f7(auxiliary/vl: bring back the VL code for the dri
targets) we did not fully consider users of dri-swrast alone. Thus we
ended up trying to compile the dri2 specific code on platform which lack
it - Cygwin for example.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reported-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Currently we use DISTCHECK_CONFIGURE_FLAGS, which is reserved for
the user. As with other variables, one should use the AM_ variable
within the makefile.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The XExtensionInfo is allocated dynamically (if the pointer is NULL)
in the XEXT_GENERATE_FIND_DISPLAY macro. On the other hand the
macro XEXT_GENERATE_CLOSE_DISPLAY does not check/free the memory.
Follow the example set by dri1 and appledri, and use a static variable.
Spotted while hunting "still reachable" leaks in Waffle.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
NOTE: The implementation was initially one patch, this. All the history is kept
here, even though all the core mesa changes were moved to the parent of this
patch.
This patch implements ARB_pipeline_statistics_query. This addition to GL does
not add a new API. Instead, it adds new tokens to the existing query APIs. The
work to hook up the new tokens is trivial due to it's similarity to the previous
work done for the query APIs. I've implemented all the new tokens to some
degree, but have stubbed out the untested ones at the entry point for Begin().
Doing this should allow the remainder of the code to be left in.
The new tokens give GL clients a way to obtain stats about the GL pipeline.
Generally, you get the number of things going in, invocations, and number of
things coming out, primitives, of the various stages. There are two immediate
uses for this, performance information, and debugging various types of
misrendering. I doubt one can use these for debugging very complex applications,
but for piglit tests, it should be quite useful.
Tessellation shaders, and compute shaders are not addressed in this patch
because there is no upstream implementation. I've implemented how I believe
tessellation shader stats will work for Intel hardware (though there is a bit of
ambiguity). Compute shaders are a bit more interesting though, and I don't yet
know what we'll do there.
For the lazy, here is a link to the relevant part of the spec:
https://www.opengl.org/registry/specs/ARB/pipeline_statistics_query.txt
Running the piglit tests
http://lists.freedesktop.org/archives/piglit/2014-November/013321.html
(http://cgit.freedesktop.org/~bwidawsk/piglit/log/?h=pipe_stats)
yield the following results:
> piglit-run.py -t stats tests/all.py output/pipeline_stats
> [5/5] pass: 5 Running Test(s): 5
v2:
- Don't allow pipeline_stats to be per stream (Ilia). This may (not sure) be
needed for AMD_transform_feedback4, which we do not support.
> If AMD_transform_feedback4 is supported then GEOMETRY_SHADER_PRIMITIVES_-
> EMITTED_ARB counts primitives emitted to any of the vertex streams for
> which STREAM_RASTERIZATION_AMD is enabled.
- Remove comment from GL3.txt because it is only used for extensions that are
part of required versions (Ilia)
- Move the new tokens to a new XML doc instead of using the main GL4x.xml (Ilia)
- Add a fallthrough comment (Ilia)
- Only divide PS invocations by 4 on HSW+ (Ben)
v3:
- Add ARB_pipeline_statistics_query to relnotes.html
- Add ARB_pipeline_statistics_query.xml to the Makefile.am, and master XML (Ilia)
- Correct extension number (Ilia)
- Add link to xml in the main GL API xml (Ilia)
- remove special GS case from gen6_end_query (Ian)
- Make lookup table static so gcc doesn't initialized it on every call (Ian)
- Use if (_mesa_has_geometry_shaders(ctx)) instead of explicit checks (Ian)
- Core mesa parts moved into a prep patch (Ilia)
v4:
- Change to 10.6 relnotes since we missed 10.5 window
- Moved compute shader stuff into the switch statement (Jordan)
- Jordan: Add compute shader support
v5:
- Fixed relnote style (Ilia)
v6:
- Rebased on master which beat me to adding the first relnotes - essentially
this undoes v5 (which had a typo anyway)
- Some code style fixes (Ken)
- Remove some excess comments (Ken)
- Unify tessellation failure style - unreachable (Ken)
- Fix workaround comment for PS invocations (Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was originally part of a single patch which added the extension, and
implemented it for i965 classic. For information about the evolution of the
patch, please see the subsequent commit.
One difference here as compared to the original mega patch is this does build
support for the compute shader query. Since it cannot be tested on any platform,
it will always return NULL for now. Jordan has already written a patch to
address this, and when that patch lands, this logic can be modified.
v2: Fix typo in subject (Brian Paul)
Add checks for desktop gl (Ilia)
Fail for any callers for now (Ilia)
Update QueryCounterBits for new tokens (Ilia)
Jordan: Use _mesa_has_compute_shaders
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v3: Rebased on patch which adds the proper information to unstub tessellation
shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's some debate about whether we should use Meta or BLORP,
but either should run circles around the BLT engine.
In particular, this means that Gen8+ will use the 3D engine for blits,
like we do on Gen6-7.
Improves performance in "copypixrate -blit -back" (from Mesa demos)
by 232.037% +/- 3.15795% (n=10) on Broadwell GT3e.
v2: Rebase on Laura's changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Previously we didn't emit MAD instructions since they cannot take
immediate arguments, but with the opt_combine_constants() pass we can
handle this properly.
total instructions in shared programs: 5920017 -> 5733278 (-3.15%)
instructions in affected programs: 3625153 -> 3438414 (-5.15%)
helped: 22017
HURT: 870
GAINED: 91
LOST: 49
Without constant pooling, this patch is a complete loss:
total instructions in shared programs: 5912589 -> 5987888 (1.27%)
instructions in affected programs: 3190050 -> 3265349 (2.36%)
helped: 1564
HURT: 17827
GAINED: 27
LOST: 101
And since the constant pooling patch by itself hurt a bunch of things,
from before constant pooling to this patch the results are:
total instructions in shared programs: 5895414 -> 5747946 (-2.50%)
instructions in affected programs: 3617993 -> 3470525 (-4.08%)
helped: 20478
HURT: 4469
GAINED: 54
LOST: 146
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And then the opt_combine_constants() pass will pull them out into
registers. This will allow us to do some algebraic optimizations on MAD
and LRP.
total instructions in shared programs: 5946656 -> 5931320 (-0.26%)
instructions in affected programs: 778247 -> 762911 (-1.97%)
helped: 3780
HURT: 6
GAINED: 12
LOST: 12
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The fs_visitor's dump_instruction() implementation calls cfg_t()
indirectly through calculate_live_intervals, so if you have an infinite
loop in the CFG code, you can't call cfg::dump(fs_visitor *) to debug
it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To insert an instruction at the end of a basic block, we typically do
something like
inst = block->last_non_control_flow_inst();
inst->insert_after(block, new_inst);
But blocks can consist of a single control flow instruction, so inst
will actually be the exec_list's head sentinel. We shouldn't use it as
if it were a regular instruction, but it is safe to insert something after
it.
This patch avoids assert-failing because an exec_list sentinel wasn't in
the basic block's instruction list.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The macro is defined to provide a trailing ; so this caused the expansion
to end in ";;" which made the Solaris Studio compilers issue warnings for
every line of:
"builtin_type_macros.h", line 113: Warning: extra ";" ignored.
for every file that included the header, filling build logs with thousands
of useless warnings.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
opt_copy_propagation and opt_copy_propagation_elements create new ACP
and Kill sets each time they enter a new control flow block. For if
blocks, they also copy the entire existing ACP set contents into the
new set.
When we exit the control flow block, we discard the new sets. However,
we weren't freeing them - so they lived on until the pass finished.
This can waste a lot of memory (57MB on one pessimal shader).
This patch makes the pass allocate ACP entries using this->acp as the
memory context, and Kill entries out of this->kill. It also steals
kill entries when moving them from the inner kill list to the parent.
It then frees the lists, including their contents.
v2: Move ralloc_free(this->acp) just before this->acp = orig_acp
(suggested by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.5 10.4" <mesa-stable@lists.freedesktop.org>
When we schedule an instructions with undefined value, we
eventually will use 0, which is a constant, however sb wasn't
taking this into account and creating ops with illegal scalar
swizzles.
this replaces my fix for op3 in t slots.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Matt Turner noticed that the hardware has always had a MIN
instruction, but the driver always used MAX+MOV for no
apparent reason.
This should cut an instruction, and a temporary, allowing
more programs to run in hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt Turner noticed that the hardware has always had a MIN
instruction, but the driver always used MAX+MOV for no
apparent reason.
This should cut an instruction, and a temporary, allowing
more programs to run in hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I used this a while back when debugging GPU hangs, and it seems like it
could be useful, so I figured I'd add it so people can use it in the
debugger.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Sandybridge requires the post-sync non-zero workaround in a ton of
places, and if you ever miss one, the GPU usually hangs.
Currently, we try to track exactly when a workaround flush is
necessary (via the brw->batch.need_workaround_flush flag). This is
tricky to get right, and we've botched it several times in the past.
This patch unconditionally performs the post-sync non-zero flush at the
start of each primitive's state upload (including BLORP). We drop the
needs_workaround_flush flag, and drop all the other callers, as the
flush has already been performed.
We have no data to indicate that simply flushing all the time will
hurt performance, and it has the potential to help stability.
v2: Add post-sync workaround to initial GPU state upload to be extra
cautious (suggested by Chad Versace).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously array textures were not working with GetCompressedTextureImage,
leading to failures in the test
arb_direct_state_access/getcompressedtextureimage.c.
Tested-by: Laura Ekstrand <laura@jlekstrand.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.4, 10.5" <mesa-stable@lists.freedesktop.org>
Just remove the _mesa_free_lighting_data function. The body has been
empty since the shine table was moved into the tnl module (commit
ba1d921).
main/light.c:1216:46: warning: unused parameter 'ctx' [-Wunused-parameter]
_mesa_free_lighting_data( struct gl_context *ctx )
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
delete_management.c:56:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size; i++) {
^
delete_management.c:69:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = size - 100; i < size; i++) {
^
delete_management.c:79:31: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
assert(key_value(entry->key) >= size - 100 &&
^
delete_management.c:79:70: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
assert(key_value(entry->key) >= size - 100 &&
^
insert_many.c:56:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size; i++) {
^
insert_many.c:62:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size; i++) {
^
insert_many.c:67:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
assert(ht->entries == size);
^
random_entry.c:62:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size; i++) {
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
glcpp/glcpp.c:124:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
const static struct option
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
+ minor indentation fixes
Discovered by Axel Davy.
This can't be reproduced with any app, because all state trackers set a DSA
state first.
Cc: 10.5 10.4 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
It's not possible to query the current buffer binding, because the extension
doesn't define GL_..._BUFFER__BINDING_AMD.
Drivers should check the target parameter of Drivers.BufferData. If it's
equal to GL_EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD, the memory should be pinned.
That's all there is to it.
A piglit test is on the piglit mailing list.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
everytime I open this file in emacs with show trailing whitespace
or git add from it my screen flares with red.
Just do a general cleanup, makes working on fp64 support not as
jarring.
I'm not saying this is perfect, its just better than before.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The short version: we need to set bits in R0.7 which provide a mask to be used
for PS kill samples/pixels. Since the VS has no such concept, we just need to
set all 1.
The longer version...
Execution for SIMD8 atomics is defined as follows:
SIMD8: The low 8 bits of the execution mask are ANDed with 8 bits of the
Pixel/Sample Mask from the message header. For the typed messages, the Slot
Group in the message descriptor selects either the low or high 8 bits. For the
untyped messages, the low 8 bits are always selected. The resulting mask is used
to determine which slots are read into the destination GRF register (for read),
or which slots are written to the surface (for write). If the header is not
present, only the low 8 bits of the execution mask are used.
The message header for untyped messages is defined in R0.7 "This field contains
the 16-bit pixel/sample mask to be used for SIMD16 and SIMD8 messages. All 16
bits are used for SIMD16 messages. For typed SIMD8 messages, Slot Group selects
which 8 bits of this field are used. For untyped SIMD8 messages, the low 8 bits
of this field are used." Furthermore, "The message header for the untyped
messages only needs to be delivered for pixel shader threads, where the
execution mask may indicate pixels/samples that are enabled only due to
derivative (LOD) calculations, but the corresponding slot on the surface must
not be accessed." We're not using a pixel shader here, but AFAICT, this mask is
used for all stages.
This leaves two options, Remove the header, or make the VS code emit the correct
thing for the header. I believe one of the goals of using SIMD8 VS was to get as
much code reuse as possible, and so I chose the latter. Since the VS has no such
thing as kill instructions, the mask is derived simple as all 1's.
v2:
Add a comment to the code (stolen from Curro on the mailing list)
Change the control flow style (Curro + Jason)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87258
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When restoring the current state in _mesa_meta_end it was previously trying to
copy the on-going sample count of the current occlusion query into the new
query after restarting it so that the driver will continue adding to the
previous value. This wouldn't work for two reasons. Firstly, the query might
not be ready yet so the Result member will usually be zero. Secondly the saved
query is stored as a pointer to the query object, not a copy of the struct, so
it is actually restarting the exact same object. Copying the result value is
just copying between identical addresses with no effect. The call to
_mesa_BeginQuery will have always reset it back to zero.
This patch fixes it by making it actually wait for the query object to be
ready before grabbing the previous result. The downside of doing this is that
it could introduce a stall but I think this situation is unlikely so it might
not matter too much. A better solution might be to introduce a real
suspend/resume mechanism to the driver interface. This could be implemented in
the i965 driver by saving the depth count multiple times like it does in the
i945 driver.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88248
Reviewed-by: Carl Worth <cworth@cworth.org>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
This line was removed by accident in commit
16b9112574 causing a regression in the
ES3-CTS.gtf.GL3Tests.shadow.shadow_execution_vert Khronos conformance
test. It's necessary because the swizzle_result() code below expects
all four components of the vector to be valid.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89094
Tested-by: Lu Hua <huax.lu@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We need to swizzle the rhs to match the number of components in the writemask,
otherwise we'll hit an assertion in ir_assignment.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some old format conversion code in pack.c implemented byte-swapping like this:
GLint comps = _mesa_components_in_format(dstFormat);
GLint swapSize = _mesa_sizeof_packed_type(dstType);
if (swapSize == 2)
_mesa_swap2((GLushort *) dstAddr, n * comps);
else if (swapSize == 4)
_mesa_swap4((GLuint *) dstAddr, n * comps);
where n is the pixel count. But this is incorrect for packed formats,
where _mesa_sizeof_packed_type is already returning the size of a pixel
instead of the size of a single component, so multiplying this by the
number of components in the format results in a larger element count
for _mesa_swap than we want.
Unfortunately, we followed the same implementation for byte-swapping
in the rewrite of the format conversion code for texstore, readpixels
and texgetimage.
This patch computes the correct element counts for _mesa_swap calls
by computing the bytes per pixel in the image and dividing that by the
swap size to obtain the number of swaps required per pixel. Then multiplies
that by the number of pixels in the image to obtain the swap count that
we need to use.
Also, when handling byte-swapping in texstore_rgba, we were ignoring
the image's depth. This patch fixes this too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
... instead of emit(BRW_OPCODE_CMP, ...). In commit 6b3a301f I changed
vec4_visitor::CMP to set the destination's type to that of src0. In the
following commit (2335153f) I removed an apparently now unnecessary work
around for Gen8 that did the same thing.
But there was a single place that emitted a CMP instruction without
using the vec4_visitor::CMP function. Use it there.
And change dst_null_d to dst_null_f for good measure, since ARB vp
doesn't have integers.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89032
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
3DSTATE_CC_STATE_POINTERS seems to be ignored when bit 0 of DW1 is not set.
Follow i965 and set the bit for 3DSTATE_CC_STATE_POINTERS and
3DSTATE_BLEND_STATE_POINTERS. Add gen checks for all state pointer commands.
If a transform feedback buffer's size is 0, st_bufferobj_data doesn't
end up creating a buffer for it. There's no point in trying to write to
such a buffer, so just pretend as if it's not really there.
This fixes arb_gpu_shader5-xfb-streams-without-invocations on nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
GLSL IR labels gl_FrontFacing as an input variable and not a system value.
This commit makes NIR silently translate gl_FrontFacing to a system value
so that it properly gets translated into a load_system_value intrinsic.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
To help identify llvmpipe rasterizer threads -- especially when there
can be so many.
We can eventually generalize this to other OSes, but for that we must
restrict the function to be called from the current thread. See also
http://stackoverflow.com/a/7989973
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Current implementation allowed usage of unsized type texture GL_FLOAT
and GL_HALF_FLOAT as a render target as this was 'expected behavior' by
WEBGL_oes_texture_float and is also allowed by the oes-texture-float
WebGL test. However this broke some ES3 conformance tests that do not
accept such behavior. Patch sets such an fbo incomplete as expected by
the ES3 conformance tests. Textures with sized types like RGBA32F will
still continue to work as render targets.
v2: code style cleanups (Ian Romanick, Matt Turner)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88905
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Right now the places that used to emit a mov.sf just put the SF on the
previous instruction when it generated the source of the SF value. Even
without optimization to push the sf up further (and kill thus potentially
kill more MOVs), this gets us:
total uniforms in shared programs: 13455 -> 13457 (0.01%)
uniforms in affected programs: 3 -> 5 (66.67%)
total instructions in shared programs: 40296 -> 40198 (-0.24%)
instructions in affected programs: 12595 -> 12497 (-0.78%)
The compiler can't tell that we're always going to hit the first if block
on the first time through the loop.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If execution was supposed to be supported in this case, we'd run into
trouble from completely uninitialized sat_imm values.
v2: Drop the '!' before the string.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We always pass this argument, even if it won't be used by the particular
texture op.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit f82f2fb3dc added use of the Mesa
IR optimizer for both ARB_fragment_program and ARB_vertex_program, but
only justified the vertex-program portions with measured performance
improvements.
Meanwhile, the optimizer was seen to generate hundreds of unused
immediates without discarding them, causing failures.
Discard the use of the optimizer for now to fix the regression. (In
the future, we anticpate things moving from Mesa IR to NIR for better
optimization anyway.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82477
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
CC: "10.3 10.4 10.5" <mesa-stable@lists.freedesktop.org>
We need to build certain parts of Mesa (namely gallium, llvmpipe, and
therefore util) with Windows SDK 7.0.7600, which includes MSVC 2008.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To fix build when libdrm is not found,
commit a594cec7e3 did put several
parts of egl code under #ifdef HAVE_DRM_PLATFORM.
HAVE_DRM_PLATFORM means the egl drm platform is being built.
What should have been used instead is HAVE_LIBDRM.
At a few locations, the HAVE_DRM_PLATFORM introduced
have already been replaced by HAVE_LIBDRM, this patch
replaces the remaining occurences.
This patch makes for example EGL_EXT_image_dma_buf_import
be advertised by egl under x11 when the drm egl platform
is not built, whereas previously it required the drm egl
platform to be built.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
With commit c642e87d9f4(auxiliary/vl: rework the build of the VL code)
we split out the VL code into a separate static library that was meant
to be used by the VL targets alone - va, vdpau, xvmc.
The commit failed to consider the way we handle vdpau-gl interop and
broke it. Bring back the functionality by keeping the vl <> vl_stub
separation as requrested by Christian.
v2: Update the omx target as well. Update mesa-stable email address.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86837
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Currently having the wayland-scanner is optional, which causes problems
when autotools parses through the makefiles, and tries to generate all
the BUILT_SOURCES.
As the config option --with-egl-platform=wayland is not the default, we
won't end up setting the WAYLAND_SCANNER variable, which in turn will
cause some files to not get generated.
There has been a wayland-scanner package as of wayland 1.2 which
provides a variable for the scanner binary, so let's use that one and
fall back to manually searching via AC_PATH_PROG when needed.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use nir/nir_opcodes.h as is (w/o the absolute path), as it is the target
name used to generate the actual file. Otherwise the target is missing,
the file won't get generated and the build will fail.
Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're using a SIMD4x2 sampler message, which has execsize 4, and so the
register width must be <= 4. Use <4,4,1> regioning instead of <8,8,1>
regioning to access the same data but avoid tripping the assert.
Fixes the following piglit tests:
spec/glsl-1.20/compiler/structure-and-array-operations/array-selection.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-basic.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-field-clashes-with-struct.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-field-clashes-with-function.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-array.vert
glslparsertest/glsl2/condition-07.vert
spec/glsl-es-3.00/compiler/uniform_block/interface-name-field-clashes-with-variable.vert
v2: Better commit message courtesy of Ken.
I had a discussion with Ken, and we both question how we end up with a mov and
execsize 4. For now though, this fixes the piglit tests, so we can worry about
it later.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Accumulated changes for various renames and additions, including Gen8
definitions. Some of the dynamic state __SIZE no longer means the size of an
element, but the size of an array of elements. The changes can be seen in
ilo_render_dynamic.c.
LINTERP is implemented as a PLN instruction or a LINE+MAC. PLN and MAC
can do conditional mod. CINTERP is just a MOV.
total instructions in shared programs: 5952103 -> 5950284 (-0.03%)
instructions in affected programs: 324573 -> 322754 (-0.56%)
helped: 1819
We lose the SIMD16 in one Unigine Heaven shader which appears six times
in shader-db.
Dead since
commit 284ce20901
Author: Eric Anholt <eric@anholt.net>
Date: Fri Aug 20 10:52:14 2010 -0700
Remove remnants of the old glsl compiler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And unfortunately other shaders do the same thing but with >=/<= which
we can't apply this optimization to because of NaNs.
instructions in affected programs: 23309 -> 22938 (-1.59%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No change on shader-db on i965.
v2: Reword the comment due to feedback from Erik Faye-Lund
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> (v1)
We want the size of a float per component, not the size of a whole vec4.
NIR instructions on i965:
total instructions in shared programs: 1261937 -> 1261929 (-0.00%)
instructions in affected programs: 114 -> 106 (-7.02%)
Looking at one of these examples (tesseract), it's from vec4 load_consts
for a MRT solid fill, which do get CSEed now that we don't memcmp off the
end of the const value and into the SSA def. For the 1-component loads
that are common in i965, we were only memcmping off into the rest of the
usually zero-filled const_value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
xfont.c:237:14: error: implicit declaration of function 'GetGLXDRIDrawable' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
glxdraw = GetGLXDRIDrawable(CC->currentDpy, CC->currentDrawable);
^
Fixes regression from 291be28476
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Lots of shaders divide by exp2(...) which we turn into a multiplication
by the reciprocal. We can avoid the reciprocal by simply negating exp2's
argument.
total instructions in shared programs: 5947154 -> 5946695 (-0.01%)
instructions in affected programs: 118661 -> 118202 (-0.39%)
helped: 380
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Same as commit 3654b6d4 to the fs backend.
total instructions in shared programs: 5945788 -> 5945787 (-0.00%)
instructions in affected programs: 36 -> 35 (-2.78%)
helped: 1
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Same as commit c4fab711 to the fs backend.
total instructions in shared programs: 5945998 -> 5945788 (-0.00%)
instructions in affected programs: 74665 -> 74455 (-0.28%)
helped: 399
HURT: 180
It hurts some programs because we make no attempts in the vec4 backend
to avoid MADs if they have constant (or vector uniform) arguments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Skylake+ doesn't support setting a depth buffer to a 1D surface but it
does allow pretending it's a 2D texture with a height of 1 instead.
This fixes the GL_DEPTH_COMPONENT_* tests of the copyteximage piglit
test (and also seems to avoid a subsequent GPU hang).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89037
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Null surfaces are going to be useful to have something to point
unbound image units to, as the ARB_shader_image_load_store extension
requires us to behave deterministically in cases where some shader
tries to access an unbound image unit: Invalid stores and atomics are
supposed to be discarded and invalid loads are supposed to return
zero, which is precisely what the null surface does.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It doesn't really improve locality of texture fetches, quite the
opposite it's a waste of memory bandwidth and space due to tile
alignment.
v2: Check mt->logical_height0 instead of mt->target (Ken). Add short
comment explaining why they shouldn't be tiled.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Negation of UD/UW sources behaves the same as for D/W sources, taking
the two's complement of the source, except for bitwise logical
operations on Gen8 and up which take the one's complement. Fixes
crash in a GLSL shader with subtraction of two unsigned values.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Scalar registers are required to have zero stride, fix the
regs_written calculation not to assume that the instruction writes
zero registers in that case.
v2: Rename CEILING() to DIV_ROUND_UP(). (Matt, Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using 'ralloc*(this, ...)' is wrong if the object has automatic
storage or was allocated through any other means. Use normal dynamic
memory instead.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The only reason why you need a vec4_visitor to construct a
vec4_instruction is to initialize vec4_instruction::ir and
::annotation. Instead set them from vec4_visitor::emit() just like
fs_visitor does.
Reviewed-by: Matt Turner <mattst88@gmail.com>
One should be able to manipulate i965 IR without pulling the whole
FS/VEC4 visitor classes -- Optimization passes and other
transformations would ideally be visitor-agnostic. Among other issues
this avoids a circular dependency between the header file where such
visitor-agnostic code will be defined and the main FS/VEC4 header
where both IR (layer below) and visitor (layer above) happen to be
defined.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Right now virtual GRF book-keeping and allocation is performed in each
visitor class separately (among other hundred different things),
leading to duplicated logic in each visitor and preventing layering as
it forces any code that manipulates i965 IR and needs to allocate
virtual registers to depend on the specific visitor that happens to be
used to translate from GLSL IR.
v2: Use realloc()/free() to allocate VGRF book-keeping arrays (Connor).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cubemap array images are unlike cubemap array samplers in that they don't need
an additional coordinate to index individual cubemaps in the array, instead
they behave like a 2D array of 6n layers, with n the number of cubemaps in the
array. Take this exception into account.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some people have complained that code using the CEILING() macro is
difficult to understand because it's not immediately obvious what it
is supposed to do until you go and look up its definition. Use a more
descriptive name that matches the similar utility macro in the Linux
kernel.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Without this when an application issues that query, it would try to
wait the result from the gpu, and since no query has been actually
issued, it will wait forever.
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add a specific optimisation pass for NV50 to check whether SRC0 or SRC1 is
a MOV dst, IMM. If so: fold the IMM in and try to drop the MOV. Must be
done post-RA because it requires that SDST == SSRC2.
V2: improve readability and add comments to clarify decisions
V3: Remove redundant code... compiler already attempts to put the IMM in
SSRC1
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
But don't enable generation of it in the opProperties, because we can't
guarantee the SDST==SRC2 constraint until after register assignment. We'll
add a post-RA folding pass to utilise this.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add emission rules for negative and saturate flags for MAD 4-byte opcodes,
and get rid of some of the constraints. Obviously tested with a wide variety
of shaders.
V2: Document MAD as supported short form
V3: Split up IMM from short-form modifiers
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The old way made it impossible for the optimizer to reason about what
was going on. The new way is the same number of instructions (the neg
gets folded into the cvt) but enables the optimizer to be cleverer if
comparing to a constant (most common case). [The optimizer is presently
not sufficiently clever to work this out, but it could relatively easily
be made to be. The old way would have required significant complexity to
work out.]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Printing instructions doesn't modify them, so we can mark the parameter
const.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The hardware's integer luminance formats are completely unusable;
currently we fall back to RGBA. This means we need to override
the texture swizzle to obtain the XXX1 values expected for luminance
formats.
Fixes spec/EXT_texture_integer/texwrap formats bordercolor [swizzled]
on Broadwell - 100% of border color tests now pass on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
This provides for atomic addition, which will be used by an upcoming
shader-cache patch. A simple test is added to "make check" as well.
Note: The various O/S functions differ on whether they return the
original value or the value after the addition, so I did not provide
an add_return() macro which would be sensitive to that difference.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, the set_insert function would bail early if it found a deleted
slot that it could re-use. However, this is a problem if the key being
inserted is already in the set but further down the list. If this happens,
the element ends up getting inserted in the set twice. This commit makes
it so that we walk over all of the possible entries for the given key and
then, if we don't find the key, place it in the available free entry we
found.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the hash_table_insert function would bail early if it found a
deleted slot that it could re-use. However, this is a problem if the key
being inserted is already in the hash table but further down the list. If
this happens, the element ends up getting inserted in the hash table twice.
This commit makes it so that we walk over all of the possible entries for
the given key and then, if we don't find the key, place it in the available
free entry we found.
Reviewed-by: Eric Anholt <eric@anholt.net>
For framebuffer completeness checks, consider renderbuffers as not
layered. Previously, they would have counted as layered if a layered
textured had previously been bound to the same attachment point. This
could cause framebuffer completeness checks to incorrectly fail with
GL_FRAMEBUFFER_INCOMPLETE_LAYER_TARGETS, even if no layered attachments
were present.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89026
An upcoming patch is going to introduce some code here, and having this code
organized as the patch does makes it a bit easier to read later.
There should be no functional change here.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
As it turns out, we were over-thinking the cause of the hang on
Cherryview. It's simply errata for Cherryview.
commit 88fea85f09
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Fri Nov 21 10:47:41 2014 -0800
i965/vec4/gen8: Handle the MUL dest hazard exception
This is an explanation to why we never saw the hang on BDW.
NOTE: The problem the original patch was trying to fix does still exist. It will
have to be fixed at some point.
v2: Modify commit message, s/CHV/BDW
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've probably never seen this ridiculous pattern in the wild, so it
didn't matter.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This expects (0,0,0,0), though it can be changed to something else or allow
more than one set of values to be considered correct.
This is currently the radeonsi behavior.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Since alu does not support abs() modifier on source operands, spill
and apply the modifiers to a temp register when needed.
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
v2 (Ian Romanick)
- Move the check to the lexer before rallocing a copy of the large string.
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_vertex
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were incorrectly attributing VS time to FS8 on Gen8+, which now use
fs_visitor for vertex shaders.
We don't hit this for geometry shaders yet, but we may as well add
support now - the fix is obvious, and we'll just forget later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, we special cased FB writes and URB writes in the register
allocation code. What we really wanted was to handle any message with
EOT set.
This saves us from extending the list with new opcodes in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This helper function basically just checks inst->eot, but also asserts
that only opcodes we expect to terminate threads have EOT set. As far
as I'm aware, we've never had such a bug.
Removing it means that we don't have to extend the list for new opcodes.
Cherryview and Skylake introduce an optimization where sampler messages
can have EOT set; scalar GS/HS/DS will likely introduce new opcodes as
well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The latter currently implies CPU read access, so only PIPE_USAGE_STAGING
can be expected to be fast.
Mesa demos src/tests/streaming_rect on Kaveri (radeonsi):
Unpatched: 42 frames in 1.023 seconds = 41.056 FPS
Patched: 615 frames in 1.000 seconds = 615.000 FPS
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88658
Cc: "10.3 10.4" <mesa-stable@lists.freedestkop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use a dummy vertex buffer object when vs inputs have no corresponding
entries in the vertex declaration. This dummy buffer will give to the
shader float4(0,0,0,0).
This fixes several artifacts on some games.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
The drm fd wasn't released, causing a crash
for wine tests on nouveau, which seems to have
a bug when a lot of device descriptors are open.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
We weren't releasing hal and ref, causing some issues (threads not released, etc)
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When on a render node the unique ioctl doesn't work.
This patch drops the code to detect the device, which relied
on an ioctl, and replaces it by the mesa loader function.
The mesa loader function is more complete and won't fail for render-nodes.
Alternatively we could also have used the pipe cap to
determine the vendor and device id from the driver.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This seems to be the behaviour on Win. Previous behaviour led
to different issues depending on the driver.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
If has_present_buffers was false at first,
but after a device reset, it turns true (for
example if we begin to render to a multisampled
back buffer), there was a crash due to present_buffers
being uninitialised.
This patch fixes it.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previous code wasn't checking against the correct limit: 224
for sm3 hardware, but 256.
Fixes wine test test_pixel_shader_constant()
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Wine tests, and probably some apps, check for errors by checking for NULL
instead of error codes.
Fixes wine test test_surface_blocks()
Reviewed-by: Axel davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Add returncode E_FAIL.
Return E_FAIL for any vertexdeclaration element with type unused.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
ATOC is an hack for Alpha to coverage
that is supported by NV and Intel.
You need to check the support for it
with CheckDeviceFormat.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This D3D hack is supposed to be supported
by all AMD SM2+ cards. Apps use it without
checking if they are on AMD.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This depth buffer format, like D3DFMT_INTZ, can be used to read
the depth buffer values when bound to a shader.
Some apps may use this format to get better performance when
they don't need the precision of INTZ (24 bits for depth, 8 for
stencil, whereas DF16 is just 16 bits for depth)
We don't add support for DF24 yet, because it implies support
for FETCH4, which we don't support for now.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Some drivers support PIPE_FORMAT_S8_UINT_Z24_UNORM,
some others PIPE_FORMAT_Z24_UNORM_S8_UINT, some both.
It doesn't matter which one we use, since the d3d formats
they map to aren't lockable (app can read it directly).
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Move the checks of whether the format is supported
into a common place.
The advantage is that allows to handle when a d3d9
format can be mapped to several formats, and that
cards don't support all of them.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The order of the format is changed to have
an increasing ordering of the d3d9 format values.
Some missing formats are added and matched to PIPE_FORMAT_NONE
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Because the debug output of this function was cut in two parts,
sometimes the second part wasn't print when we would return earlier,
whereas we would like to get it.
The reason of the separation was that it's only at the end of the function
we can print what we map to the d3d9 arguments, but we can always retrieve
that info by hand.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This D3D hack allows to resolve a multisampled
depth buffer into a single sampled one.
Note that the implementation is slightly incorrect.
When querying the content of D3DRS_POINTSIZE,
it should return the resz code if it has been set.
This behaviour will be implemented when state changes
will be reworked. For now the current behaviour is ok,
since apps use the D3DCREATE_PUREDEVICE flag when creating
the device, which means they won't read states and in exchange
get better performance.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This fixes a wine test and some minor visual issues on some games.
The patch is not optimal, there is probably a more efficient way to
fix this issue, but the code there already has some innefficiencies.
There is plans to rewrite that part of the code to make it more
efficient.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
D3DSP_NOSWIZZLE already contains the shift.
Detected with Clang.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This removes unneeded hack for Anno 1404.
This app is not checking the number of supporting
constants, and rely on the shader compilation to fail
if it puts too many constants.
This patch also checks for the correct number of constants for ps.
Note that we don't check the official limitations for old vs and ps
versions. The restrictions were fixed, unlike for the number of vertex
shader constants for later versions. Likely apps use the correct number,
and it's not a problem for us if it wants use more.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Instead of crashing on buggy shaders, we should return an error.
This patch introduces this behaviour in the case of invalid constant
access
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
r500 hasn't enough float constants for vs to fill all needs.
Overlapping issues can happen with complex shaders.
The fix would be to recompile shaders to include the integer
and boolean constants, instead of reserving slots for them.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previously 276 constants were declared everytime.
This patch makes shaders declare constants up to the maximum
constant needed and moves the moment we print the TGSI
shader after the moment we declare the constants.
This is needed for r500, since when indirect addressing is used,
it cannot reduce the amount of constants needed, and that it is
restricted to 256 constant slots.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Count explicitly the slots for float, int and bool constants,
and deduce the constbuf size in nine_shader.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This patch raises nine requirements and disables nine for old
hw that don't match them.
Currently for these cards only games that don't have tight requirements
would work well with nine. However nine is missing several checks
regarding these limitations.
To make code and future patches less heavy, dropping support for these old
card seems a good solution.
That makes r500 the only dx9 generation cards supported by nine. It seems the one
with the less limitations for nine. Still not everything is ok, and we'll have
for example to implement shader recompilation for these cards to include
integer and boolean constants in the shader.
Eventually when this is done, we can reintroduce support for older cards.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Resolving a multisampled depth texture into
a single sampled texture is supported on >= SM4.1
hw. It is possible some previous hw support it.
The ability was tested on radeonsi and nvc0.
Apparently is is also supported for radeon >= r700.
This patch adds the MULTISAMPLE_Z_RESOLVE cap and
add it to the drivers. It is advertised for drivers
for which it is sure the ability is supported.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Khronos modified glext.h to get rid of GL_TEXTURE_BINDING, a special enum
added for ARB_direct_state_access. This enum was ruled unimplementable.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Laura Ekstrand <laura@jlekstrand.net>
Nothing special needs to be done.
Even though llvmpipe copies constant (ie uniform) buffers internally, the
application is supposed to flush and sync, so all should work.
All bufferstorage piglit tests pass.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
As ffsll doesn't exist in MSVC yet, and u_bit_scan64 is only used by
radeonsi which is never built with MSVC.
This is just a stop-gap fix to unbreak MSVC build until we refactor these
mathematical portability wrappers into src/util.
Trivial.
We need a slot for the stipple texture and the pixel shader already uses
32 textures (16 API slots + 16 FMASK slots).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The hardware obeys swizzles even if the resource is NULL.
This will be used by set_polygon_stipple.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The FILLED_SIZE counter is uninitialized at the beginning, so we can't use it.
Instead, use offset = 0, which is what we always do when not appending.
This unexpectedly fixes spec/ARB_texture_multisample/sample-position/*.
Yes, the test does use transform feedback.
Cc: 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For drivers that use higher slots not to crash in tgsi_shader_info.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When a rebase swizzle is provided and we call _mesa_swizzle_and_convert
after unpacking the source format we were always passing normalized=false.
We should pass true or false depending on the formats involved in the
conversion for the byte and float paths (the integer path cannot ever be
normalized).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Originally, get_alu_src was supposed to handle resolving swizzles and
things like that. However, now that basically every instruction we have
only takes scalar sources, we don't really need it anymore. The only case
where it's still marginally useful is for the mov and vecN operations that
are left over from SSA form. We can handle those cases as a special case
easily enough. As a side-effect, we don't need the vec_to_movs pass
anymore.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we detect if we need an extra copy for swizzling. The
old code involved a pile of confusing switch fall-throughs; we now use a
loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we can scalarize with NIR, there's no need for all this code
anymore. Let's get rid of it and just do scalar operations.
v2: run copy prop before lowering phi nodes
v3: Get rid of the "emit(...)->saturate = foo" pattern
v4: Run alu_to_scalar as an optimization pass
total instructions in shared programs: 5998321 -> 5974070 (-0.40%)
instructions in affected programs: 732075 -> 707824 (-3.31%)
helped: 3137
HURT: 191
GAINED: 18
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add better comments
- Use nir_ssa_dest_init and nir_src_for_ssa more places
- Fix some void * casts
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we determine whether or not to sccalarize a phi node to
make the recursion non-bogus
- Treat load_const instructions as scalarizable
v4 Jason Ekstrand <jason.ekstrand@intel.com>:
- Allow uniform and input loads to be scalarizable
v5 Jason Ekstrand <jason.ekstrand@intel.com>:
- Also consider loads of inputs (varying, uniform, or ubo) to be
scalarizable. We were already doing this for load_var on uniforms and
inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All but 16 of the programs helped were ARB fp programs.
total instructions in shared programs: 5949286 -> 5945470 (-0.06%)
instructions in affected programs: 275162 -> 271346 (-1.39%)
helped: 1197
GAINED: 1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Section 2.3.1 (Errors) of the OpenGL 4.5 spec says:
"If a negative number is provided where an argument of type sizei or
sizeiptr is specified, an INVALID_VALUE error is generated.
This patch adds checks for negative buffer size values passed to different APIs.
It also moves up the check on other APIs that already had it, making it the first
error check performed in the function, for consistency.
While there may be other APIs throughtout the code lacking this check (or at least
not at the beginning of the function), this patch focuses on the cases that break
the dEQP tests listed below. It could be a good excersize for the future to check
all other cases, and improve consistency in the order of the checks throughout the
whole Mesa code base.
This fixes 5 dEQP test:
* dEQP-GLES3.functional.negative_api.state.get_attached_shaders
* dEQP-GLES3.functional.negative_api.state.get_shader_source
* dEQP-GLES3.functional.negative_api.state.get_active_uniform
* dEQP-GLES3.functional.negative_api.state.get_active_attrib
* dEQP-GLES3.functional.negative_api.shader.program_binary
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 6.1.13 "Framebuffer Object Queries" of OpenGL ES 3.0 spec:
"If the default framebuffer is bound to target, then attachment must be
BACK, identifying the color buffer; DEPTH, identifying the depth buffer; or
STENCIL, identifying the stencil buffer."
OpenGL ES 3.0, section 2.5 (GL Errors):
"If a command that requires an enumerated value is passed a
symbolic constant that is not one of those specified as allowable
for that command, an INVALID_ENUM error is generated."
Then change the returned error to INVALID_ENUM.
Fixes:
dEQP-GLES3.functional.fbo.api.attachment_query_default_fbo
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
mod(x,y) as y * fract(x/y). This implementation has a down side though:
it introduces precision errors due to the fract() operation. Even worse,
since the result of fract() is multiplied by y, the larger y gets the
larger the precision error we produce, so for large enough numbers the
precision loss is significant. Some examples on i965:
Operation Precision error
-----------------------------------------------------
mod(-1.951171875, 1.9980468750) 0.0000000447
mod(121.57, 13.29) 0.0000023842
mod(3769.12, 321.99) 0.0000762939
mod(3769.12, 1321.99) 0.0001220703
mod(-987654.125, 123456.984375) 0.0160663128
mod( 987654.125, 123456.984375) 0.0312500000
This patch replaces the current lowering pass with a different one
(MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
man pages:
mod(x,y) = x - y * floor(x/y)
This implementation eliminates the precision errors at the expense of
an additional add instruction on some systems. On systems that can do
negate with multiply-add in a single operation this new implementation
would come at no additional cost.
v2 (Ian Romanick)
- Do not clone operands because when they are expressions we would be
duplicating them and that can lead to suboptimal code.
Fixes the following 16 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLES 3.0.0 spec introduces context state PRIMITIVE_RESTART_FIXED_INDEX
(2.8.1 Transferring Array Elements, page 26) which is not currently
possible to query using glGet*() funcs.
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getboolean
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger64
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For code such as:
uint tmp1 = uint(in0);
uint tmp2 = -tmp1;
float out0 = float(tmp2);
We produce code like:
mov(8) g5<1>.xF -g9<4,4,1>.xUD
which does not produce correct results. This code produces the
results we would expect if tmp1 and tmp2 were signed integers
instead.
It seems that a similar problem was detected and addressed when
using negations with unsigned integers as part of condionals, but
it looks like the problem has a wider impact than that.
This patch fixes the problem by preventing copy-propagation of
negated UD registers in all scenarios, not only in conditionals.
Fixes the following 24 dEQP tests:
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_*
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Nothing enables the extension yet, but the values are now available.
The spec calls for it to only be exposed for GL 3.3+, which is core-only
in mesa. Instead we allow any driver to enable it, including in a compat
context for any GL version.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
If the ?: operator's condition is a constant value, and both branches
were pure expressions, we can just make the resulting value one or the
other.
Previously, we only did this if op[1] and op[2] were also constant
values - but there's no actual reason for that restriction.
No changes in shader-db, probably because we usually optimize this later
anyway. But it does make us generate less stupid code up front.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Paul originally had to reverse engineer these formulas based on the
description about how the sampler works. The description here is not
the easiest to follow - especially given that it's from the Sandybridge
era, when the hardware only did 4x multisampling.
Jordan and I recently found another part of the documentation where they
simply state that IMS dimensions must be adjusted by a set of formulas.
Quoting this section provides an easy to follow explanation for the
code, including 2x/4x/8x/16x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
In preparation for glBlitNamedFramebuffer, the DD table function
BlitFramebuffer needs to accept two arbitrary framebuffer objects rather
than assuming ctx->ReadBuffer and ctx->DrawBuffer.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Khronos Revision 29537 fixes ARB_direct_state_access function prototypes that
had GLsizei where they should have had GLsizeiptr. The mainly affects
functions related to buffer objects.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.
v2: Apply to all the .dir-locals.el files, not just the one in the
root directory.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
For GL_TEXTURE_1D_ARRAY targets we store the depth of the array
in the Height field and leave Depth=1 in the underlying texture
object. When we call intel_miptree_copy_teximage in the process
of re-creating a miptree (possibily because the number of miplevels
has changed) we didn't account for this, so we where only copying
texture images for the first slice.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I could have done this in the bit that generates the ANDs and ORs, but
it's probably generally useful. Sadly, I still need this even if I move
to NIR, because I can't yet express my read of the destination color in
NIR, which I would need to move my blend/logicop/colormask handling into
NIR.
total uniforms in shared programs: 13497 -> 13455 (-0.31%)
uniforms in affected programs: 101 -> 59 (-41.58%)
total instructions in shared programs: 40797 -> 40296 (-1.23%)
instructions in affected programs: 1639 -> 1138 (-30.57%)
The GL spec guarantees that glGetTexImage will never get a multisampled
texture, but this is not true for glReadPixels. If we get a multisampled
buffer, we have to do a multisample resolve on it before we can pull the
data down for the user. Since this isn't practical to handle in
tiled_memcpy, we just fall back to the other paths that can handle this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And remove the mocs argument of the emit_buffer_surface_state vtbl hook. Its
semantics vary greatly from one generation to another, so it kind of
encourages the caller to pass 0 which is the only valid setting across
generations. After this commit the hardware-specific code decides what the
best cacheability settings are for buffer surfaces, just like we do for
textures.
This together with some additional changes coming is expected to improve
performance of pull constants, buffer textures, atomic counters and image
objects on Gen7 and up.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes a bug on BDW when our meta-based stencil blit path assert-fails
due to an invalid internal format even though we do support the
ARB_stencil_texturing extension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The _mesa_dlist_alloc() function is only guaranteed to return a pointer
with 4-byte alignment. On 64-bit systems which don't support unaligned
loads (e.g. SPARC or MIPS) this could lead to a bus error in the VBO code.
The solution is to add a new _mesa_dlist_alloc_aligned() function which
will return a pointer to an 8-byte aligned address on 64-bit systems.
This is accomplished by inserting a 4-byte NOP instruction in the display
list when needed.
The only place this actually matters is the VBO code where we need to
allocate a 'struct vbo_save_vertex_list' which needs to be 8-byte
aligned (just as if it were malloc'd).
The gears demo and others hit this bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88662
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The intrinsics are universally available, whereas older Windows SDKs (e.g.
7.0.7600) don't have the non-intrisic entrypoint.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
According to the SKL bspec the 3DSTATE_CONSTANT_* commands only take
effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_*
command. This patch just makes it set the BRW_NEW_SURFACES state when
uploading the push constants to ensure the binding tables will be
updated.
This fixes the fbo-blending-formats Piglit test and possibly others.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When color buffers alone are concerned the depth is not needed.
No regression on BDW where meta blit is used instead of blorp. I
also disabled blorp temporarily for fbo-blits on IVB and saw no
regressions there either.
I also compared several graphics benchmarks on BDW and saw neither
regressions or improvements.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows you to match on an unknown value but only if it is of a given
type. 90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.
nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are some algebraic transformations that we want to do but only if
certain things are constants. For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.
nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
since the address reg holds integer values, ARL/ARR do an implicit float-to-int
conversion, so clarify that. Thus it is also incorrect to say that FLR really
does the same as ARL.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not. vc4
results:
total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs: 4253 -> 3960 (-6.89%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.
v2: Add a missing space.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch adds needed support for accepting HALF_FLOAT_OES as valid type
for TexImage*D and TexSubImage*D when Texture FLoat extensions are supported.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This patch series adds support for following GLES2 Texture Float extensions:
1)GL_OES_texture_float,
2)GL_OES_texture_half_float,
3)GL_OES_texture_float_linear,
4)GL_OES_texture_half_float_linear.
This patch adds basic infrastructure and needed boolean flags to advertise
support for these extensions, by default the support is disabled. Next patch
in the series introduces support for HALF_FLOAT_OES token.
v4: take assert away and make valid_filter_for_float conditional (Tapani),
fix the alphabetical order (Emil)
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends. This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.
Fixes fs-swap-problem with my scalarizing patches.
v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
The idea is that after a remove_from_list(), you might want to be able to
do a remove_from_list() on it again or an is_empty_list(). This is
apparently relied on by r300g.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2:
- Only emit write SPI_TMPRING_SIZE once per packet.
- Use context global scratch buffer.
v3:
- Patch shaders using WRITE_DATA packet instead of map/unmap.
- Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
VS_PARTIAL_FLUSH when patching shaders.
v4:
- Code cleanups.
- Remove unnecessary multiplies.
v5:
- Patch shaders in system memory and re-upload to vram.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state(). This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The problem is that the fallbacks we have at the moment don't work in C++.
While we could theoretically fix the fallbacks it would also raise the
issue of correctly detecting the fpclassify function. So, for now, we'll
just disable it until we actually have a C++ user.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: EdB <edb+mesa@sigluy.net>
I haven't actually seen this bug in the wild, but it's possible that
someone could ask to do a S3TC PBO download or something. This protects us
from accidentally creating a render target with a compressed or otherwise
non-renderable format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Patch adds 2 error messages that point user directly to fix
mispelled or impossible swizzle field for a format.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
[ Francisco Jerez: As discussed on the mailing list, this is intended
to produce more useful debug output in cases where the compilation
terminates unexpectedly. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
[ Francisco Jerez: As we're at it make debug_options[] local to its
only user and remove temporary. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.
On r600 hw there is a special GS scenario for this, you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.
This is a first pass attempt at this, and passes the piglit
tests that test for this.
v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.
v2: use new helper name
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.
v2: use new name nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.
On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
amount of time spent in glReadPixels by more than half and reduces the time
of the entire test by 10%.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- Refactor to make the functions look more like the old
intel_tex_subimage_tiled_memcpy
- Don't export the readpixels_tiled_memcpy function
- Fix some pointer arithmatic bugs in partial image downloads (using
ReadPixels with a non-zero x or y offset)
- Fix a bug when ReadPixels is performed on an FBO wrapping a texture
miplevel other than zero.
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Better documentation fot the *_tiled_memcpy functions
- Add target restrictions for renderbuffers wrapping textures
v4: Jason Ekstrand <jason.ekstrand@intel.com>
- Only check the return value of brw_bo_map for error and not bo->virtual
v5: Jason Ekstrand <jason.ekstrand@intel.com>
- Don't unnecessarily repeat a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This commit addes tiled copy functions for coping from tiled memory to
linear memory. These are very similar to the existing linear-to-tiled
paths.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- New commit message
- Various whitespace fixes
- Added ptrdiff_t casts as done in commit 225a09790
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Fixed a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively. The *_faster functions are similarly renamed.
There was also a bit of logic to select between the the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- Better commit message
- Fix up the copyright on the intel_tiled_memcpy files
- Various whitespace fixes
- Moved a bunch of stuff that did not need to be exposed from
intel_tiled_memcpy.h to intel_tiled_memcpy.c
- Added proper documentation for intel_get_memcpy
- Incorperated the ptrdiff_t tweaks from commit 225a09790
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Fixed a comment
- Move the tile size constants into the .c file
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
There's no reason why we should be doing this for 2D textures and not
rectangles. Just a matter of adding another hunk to the condition.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Previously, we called the abs() function in math.h. However, this involves
unnecessarily going through double. This commit changes it to use integers
directly with a ternary.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it use an implicit
assignment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.
Previously, we deleted MOV.nz instructions when the instruction
generating the MOV's source also wrote the flag register (as the flag
register already contains the desired value). However, we wouldn't
delete CMP.nz instructions that served the same purpose.
We also didn't attempt true cmod propagation on MOV.nz instructions,
while we would for the equivalent CMP.nz form.
This patch fixes both limitations, treating both forms equally.
CMP.nz instructions will now be deleted (helping the NIR backend),
and MOV.nz instructions will have their .nz propagated.
No changes in shader-db without NIR. With NIR,
total instructions in shared programs: 6006153 -> 5969364 (-0.61%)
instructions in affected programs: 2087139 -> 2050350 (-1.76%)
helped: 10704
HURT: 0
GAINED: 2
LOST: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.
The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.
v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
- Use mako to generate one function per opcode instead of doing piles of
string splicing
v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
- More comments and better indentation in the mako
- Add a description of the constant expression language in nir_opcodes.py
- Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.
v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3 Jason Ekstrand <jason.ekstrand@intel.com>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration
For some reason, we occasionally write the flag register with a MOV.NZ
instruction:
add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F
cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F
mov.nz.f0(8) null g26<8,8,1>D
A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:
add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F
total instructions in shared programs: 5955701 -> 5951657 (-0.07%)
instructions in affected programs: 302910 -> 298866 (-1.34%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:
add(8) temp a b
cmp.l.f0(8) null -temp 0.0
into
add.g.f0(8) temp a b
total instructions in shared programs: 5958360 -> 5955701 (-0.04%)
instructions in affected programs: 466880 -> 464221 (-0.57%)
GAINED: 0
LOST: 1
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise we'll apply the conditional mod to only one of SIMD8
instructions and trigger an assertion.
NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
3-src instructions can only have GRF/MRF destinations. It's really
difficult to deal with that restriction in dead code elimination (that
wants to give instructions null destinations to show that their result
isn't used) while allowing 3-src instructions to have conditional mod,
so don't, and just give then a destination before register allocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
the integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
GAINED: 103
LOST: 43
Causes one program to spill (643 -> 1076 instructions).
I committed this patch last year (commit 42a26cb5) but reverted it
(commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program
(bug 78648). Tapani reapplied this patch and could not reproduce the bug
with current Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.
No piglit regressions on any i965 platform in Jenkins.
total instructions in shared programs: 4382707 -> 4382707 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 2089
LOST: 0
No other platforms affected in shader-db.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.
v2: Just pass a NULL state, now that it won't crash when you do.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.
v2: Use the nir_src_for_ssa() helper, and another instance of
nir_alu_src_copy().
v3: Drop the non-SSA support. All intended callers will have SSA-only ALU
ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
about weird input_sizes[].
Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
There aren't many users yet, but I wanted to do this from my scalarizing
pass.
v2: Constify the src arguments.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Most of these exist in the GLSL IR algebraic pass already. However,
SSA allows us to find more instances of the patterns.
total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
NIR instructions in affected programs: 124189 -> 120026 (-3.35%)
helped: 604
total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
i965 instructions in affected programs: 261295 -> 254507 (-2.60%)
helped: 1295
HURT: 3
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.
The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.
total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
NIR instructions in affected programs: 411143 -> 405922 (-1.27%)
helped: 2233
HURT: 214
A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x. We can always clean that up later.
total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
i965 instructions in affected programs: 784980 -> 775093 (-1.26%)
helped: 4508
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GLSL IR optimization pass contained these; we may as well include
them too.
v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).
No change in the number of NIR instructions on a shader-db run.
total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
i965 instructions in affected programs: 542 -> 537 (-0.92%)
helped: 2 (in glamor)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that. While there,
I decided to add a bunch more of them.
v2: Delete bogus fand/for optimizations (caught by Jason).
total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
NIR instructions in affected programs: 149634 -> 146937 (-1.80%)
helped: 1032
total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
i965 instructions in affected programs: 537 -> 542 (0.93%)
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value. This
made ALU operations on those values appear distinct as well.
Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.
Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered. I didn't implement
support for variables for the time being.
v2: Assert that info->num_variables == 0 (requested by Jason).
total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
NIR instructions in affected programs: 2413496 -> 2001071 (-17.09%)
helped: 16872
total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
i965 instructions in affected programs: 640654 -> 620094 (-3.21%)
helped: 2071
HURT: 585
GAINED: 14
LOST: 25
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.
The next patch will introduce another case that requires SSA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows us to count NIR instructions via shader-db.
Use "run" as normal. The results file will contain both NIR and
assembly.
Then, to generate a NIR report:
./report.py <(grep NIR results/foo) <(grep NIR results/bar)
Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is useful for debugging and looking for optimization opportunities.
It will need to be expanded when we add support for other scalar stages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.
total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
instructions in affected programs: 22406 -> 21984 (-1.88%)
helped: 47
HURT: 0
GAINED: 0
LOST: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Apparently $(top_srcdir) is not expanded in a source list when using
subdir-objects, so remove that. It's not clear to me why we were going
to such lengths to prefix each source file anyway.
Change max_wm_threads to match the spec on CHV. The max number of
threads in 3DSTATE_PS is always programmed to 64 and the hardware
internally scales that depending on the GT SKU. So this doesn't
change the max number of threads actually used, but it does affect
the scratch space calculation.
On CHV the old value was too small, so the amount of scratch space
allocated wasn't sufficient to satisfy the actual max number of
threads used.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
When emitting texturing from indirect texture units, we need to be able to
scratch around in the header message. Since we only do this for >= HSW,
this is ok since there are no MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj phogat <anuj.phogat@gmail.com>
Prior to this commit, the adjust_sampler_state_pointer function took an
extra register that it could use as scratch space. The usual candidate was
the destination of the sampler instruction. However, if that register ever
aliased anything important such as the sampler index, this would scratch
all over important data. Fortunately, the calculation is such that we can
just do it in place and we don't need the scratch space at all.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previous code semantic was:
. if ff ps will not run a ff stage, then do not output texture coords for this stage
for vs
. if XYZRHW is used (position_t), use only the mode where input coordinates are copied
to the outputs.
Problem is when apps don't give texture inputs. When apps precise PASSTHRU, it means
copy texture coord input to texture coord output if there is such input. The case
where there is no texture coord input wasn't handled correctly.
Drivers like r300 dislike when vs has inputs that are not fed.
Moreover if the app uses ff vs with a programmable ps, we shouldn't look at
what are the parameters of the ff ps to decide to output or not texture
coordinates.
The new code semantic is:
. if XYZRHW is used, restrict to PASSTHRU
. if PASSTHRU is used and no texture input is declared, then do not output
texture coords for this stage
The case where ff ps needs a texture coord input and ff vs doesn't output
it is not handled, and should probably be a runtime error.
This fixes 3Dmark05, which uses ff vs with programmable ps.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When the shader does indirect addressing on the constants,
we allocate a temporary constant buffer to which we copy
the constants from the app given user constants and
the constants filled in the shader.
This patch makes this buffer be allocated once.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Relative addressing needs the constant buffer to get all
the correct constants, even those defined by the shader.
The code to copy the shader constants to the constant buffer
was enabled only for debug build. Enable it always.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The fix is that this line:
"src[s] = tx->regs.vT[s];" is wrong if s doesn't start from 0.
Instead access tx->regs.vT directly when needed.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
texcoord for ps < 1_4 should clamp between 0 and 1 the values.
texcrd (texcoord ps 1_4) does not clamp and can be used with
two modifiers _dw and _dz that means the channels are divided
by w or z.
Implement those in shared code, since the same modifiers can be used
for texld ps 1_4.
v2: replace DIV by RCP + MUL
v3: Remove an useless MOV
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Nothing seems to indicates the negation modifier would be stored in the
instruction flags instead of the source modifier. tx_src_param has
already handled it if it is in the source modifier.
In addition,
when the card supports native integers, the boolean
are stored in 32 bits int and are equal to
0 or 0xFFFFFFFF.
Given 0xFFFFFFFF is NaN if it was a float, better use
UIF than IF.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previous implementation was behaving fine, but improve it by:
. Improved documentation
. Decreasing counter (comparing to 0 is likely to be faster than to constant)
. Move the counter update at the end for better performance for shaders that
break the loop earlier than when the count is done.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previous implementation didn't work well with nested loops.
Instead of using several address registers, put a0 and aL
into normal registers, and copy them to one address register when
we need to use them.
Wine tests loop_index_test() and nested_loop_test() now pass correctly.
Fixes r600g crash while loading Bioshock -
bug https://bugs.freedesktop.org/show_bug.cgi?id=85696
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
When the input's xyz are 0.0, the output
should be 0.0. This is due to the fact that
Inf * 0 = 0 for dx9. To handle this case,
cap the result of RSQ to FLT_MAX. We have
FLT_MAX * 0 = 0.
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
We should use the absolute value of the input as input to ureg_RSQ.
Moreover, an input of 0.0 should return FLT_MAX.
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Let's say we have c1 and c2 declared in the shader and c0 given by the app
Then here we would have read c0, c1 and c2 given by the app, instead
of the correct c0, c1, c2.
This correction fixes several issues in some games.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Convert them to shader booleans at earlier stage.
Previous code is fine, but later patch will make
integers being converted at earlier stage, so do
the same for booleans
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Buffers in the MANAGED pool are supposed to have the content in a ram buffer,
a copy in VRAM if there is enough memory (driver manages memory and decide when
to delete the buffer in VRAM).
This is not implemented properly in nine, and a VRAM copy is going to be created
when the RAM memory is filled, and the VRAM copy will get synced with the RAM
memory updates.
Due to some issues (in the implementation or in app logic), it can happen
we try to create a sampler view of the resource while we haven't created the
VRAM resource. This hack creates the resource when we hit this case, which prevents
crashing, but doesn't help with the resource content.
This fixes several games crashing at launch.
Acked-by: Axel Davy <axel.davy@ens.fr>
Acked-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Stanislaw Halik <sthalik@misaki.pl>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
While previous code was having the correct behaviour in general,
this new code is more readable (without checking all gallium formats
manually) and has a more defined behaviour for depth stencil resources.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The implicit swapchains are destroyed when the device instance is
destroyed. However for non-implicit swapchains, it is not the case,
and the application can have kept an reference on the swapchain
buffers to reuse them.
Fixes problems with battle.net launcher.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This->surfaces contains the surfaces associated to the levels
and faces. This->surfaces[6*Level] is what we want here,
since it gives us a face descriptor for the level 'Level'.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The cap means D3DFVF_XYZRHW vertices will see clipping.
This is not the case when
PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION is supported, since
it'll disable clipping.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The clip state was reset everytime, incurring an overhead.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
brw_fs_nir has only seen scalar bools so far, thanks to vector splitting,
and the ralloc of in glsl_to_nir.cpp will *usually* get you a 0-filled
chunk of memory, so reading too large of a value will usually get you the
right bool value. But once we start doing vector bools in a few commits,
we end up getting bad values.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Almost all instructions we nir_ssa_def_init() for are nir_dests, and you
have to keep from forgetting to set is_ssa when you do. Just provide the
simpler helper, instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Mac OS X XQuartz places X11 headers at /opt/X11/include.
This patch fixes this Mac OS X SCons build error.
Compiling src/gallium/state_trackers/glx/xlib/glx_api.c ...
In file included from src/gallium/state_trackers/glx/xlib/glx_api.c:34:
include/GL/glx.h:30:10: fatal error: 'X11/Xlib.h' file not found
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Since the meta path can do strictly more than the blitter path, we just
remove the blitter path entirely.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
This meta path, designed for use with PBO's, creates a temporary texture
out of the PBO and uses BlitFramebuffers to do the actual texture upload.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add support for handling simple packing options
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Refactor to split out the texture-from-pbo code
- Rename to _mesa_meta_pbo_TexSubImage
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Going through the for loop every time has noticable overhead. This fixes
things up so we only do that once ever and then just do a hash table lookup
which should be much cheaper.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use once_flag and call_once from c11/threads.h instead of pthreads
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Previously, we were completely ignoring the mt->offset field for
renderbuffers. While it does have some alignment constraints, it is valid
to use it. This patch adds the code to each of the 4 surface state setup
functions to handle it.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Fixes currently failing Piglit case
interface-blocks-name-reused-globally.vert
v2: combine var declaration with assignment (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Designated initializers with anonymous unions don't work in MSVC or
GCC < 4.6. With a couple of constructor methods, we don't need them any
more and the code is actually cleaner.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88467
Reviewed-by: Connor Abbot <cwabbott0@gmail.com>
This fixes two problems reported by osc:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/format_utils.c:180
E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/glformats.c:2714
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
The below code crashes when vector_elements <= 0
Fixes Warray-bounds warnings
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are currently 2 users of this functionality. I have 2 more users coming
up, and having a simple function makes the results much cleaner. The existing
interface semantics was proposed by Matt.
v2 (Ken): Rename to region_matches()/has_scalar_region().
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GEN8 added the QWORD as a valid type for certain operations on the EU.
In order to calculate the number of registers used one must have the type
size as part of the equation. Quoting the formula in the code:
regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
Adding this separately for bisection since there is no simple way to add
an assert in the type_sz function.
NOTE: As a side note, I was confused for a while because it's impossible
to calculate the region, ie. registers needed, without vstride. However,
at this point these are all part of the IR, and so no vstride must exist.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of passing a pointer to the scratch buffer via user sgprs, we
now patch the shader with the buffer address using reloc information
from the LLVM generated ELF.
v2:
- Make sure not to break older LLVM.
Gen4 hardware appears to GPU hang frequently when using Chromium, and
also when running 'glmark2 -b ideas'. Most of the error states contain
3DPRIMITIVE commands in quick succession, with very few state packets
between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.
I trimmed an apitrace of the glmark2 hang down to two draw calls with a
glUniformMatrix4fv call between the two. Either draw by itself works
fine, but together, they hang the GPU. Removing the glUniform call
makes the hangs disappear. In the hardware state, this translates to
removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.
Flushing before emitting CONSTANT_BUFFER packets also appears to make
the hangs disappear. I observed a slowdown in glxgears by doing it all
the time, so I've chosen to only do it when BRW_NEW_BATCH and
BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
already flushed the whole pipeline).
I'd much rather understand the problem, but at this point, I don't see
how we'd ever be able to track it down further. We have no real tools,
and the hardware people moved on years ago. I've analyzed 20+ error
states and read every scrap of documentation I could find.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
offset() properly handles reg_width, so it'll work for SIMD16.
While we're in the area, simplify a few cases, and use retype() to cut a
few more lines of code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_fs_nir.cpp creates almost all of its registers via:
fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components));
When we add SIMD16 support, we'll need to set reg->width = 16 and
double the VGRF size...on pretty much every VGRF it allocates.
This patch replaces that pattern with a new "vgrf" helper method:
fs_reg reg = vgrf(num_components);
The new function correctly takes reg_width into account. For now,
reg_width is always 1, so this should have no functional change.
v2: Just make vgrf() account for reg_width right away, rather than
changing the behavior in the next patch.
v3: Replace one last virtual_grf_alloc I missed. It's used in code
that only runs for dispatch_width == 8, so it doesn't matter,
but consistency is nice.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I dislike how fs_reg has a constructor that knows about fs_visitor.
Apart from that, it stands alone, with no need to interact with the
rest of the compiler. Which is sensible - a class that represents
a register should do just that. Allocating virtual register numbers
should be left up to the compiler (fs_visitor).
This patch replaces the constructor with a new fs_visitor::vgrf method,
eliminating fs_reg's dependency on fs_visitor. It ends up being no
more code.
v2: Rebase from May 2014 -> January 2015.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The filename of sha1.h was conflicting with the system-provided
sha1.h, (and in some confiurations, our sha1.c was unsuccessfully
attemping to include "sha1.h" and <sha1.h> as two different files).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88523
Commit 8ec6534 changed texture upload path and the way how texture
format is being checked, this commit adds support for GL_RGB with
GL_UNSIGNED_INT_2_10_10_10_REV as specified by the extension
EXT_texture_type_2_10_10_10_REV specification.
This fixes regression in ES3 conformance test
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
v2: add MESA_FORMAT_R10G10B10X2_UNORM format (Iago Toral)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88385
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We hit an assertion that the destination of the FB write should not be
an immediate. (I don't know what we were thinking.) Use ARF null.
Trying to substitute real shaders with the dummy shader would crash
when trying to upload non-existent uniforms. Say there are none.
It also wouldn't generate any code because we didn't compute the CFG,
and code generation now requires it. Compute it.
Gen4-5 also require a message header to be present.
On Gen6+, there were assertion failures in SF/SBE state because
urb_setup was memset to 0 instad of -1, causing it to think there were
attributes when nothing was set up right. Set to no attributes.
Finally, you have to ensure "Setup URB Entry Read Length" is non-zero
or you get GPU hangs, at least on Crestline.
It now works on at least Crestline and Haswell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
glibc 2.19 introduced _DEFUAULT_SOURCE as a replacement for _BSD_SOURCE,
and deprecates _BSD_SOURCE with an annoying warning. Defining both is
how you're supposed to transition so let's do that. It gets rid of the
warning and we can figure out when/if we can drop _BSD_SOURCE later.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Fix build error.
CC libmesautil_la-sha1.lo
sha1.c: In function '_mesa_sha1_final':
sha1.c:210:22: error: 'grcy_md_hd_t' undeclared (first use in this function)
gcry_md_hd_t h = (grcy_md_hd_t) ctx;
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88519
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
The driver name is no longer const, it's always allocated dynamically
one way or another. Drop const from dri_screen_create_dri2
driver_name argument to avoid warning.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We don't actually have the code for the shader cache just yet, but
this configure machinery puts everything in place so that the shader
cache can be optionally compiled in.
Specifically, if the user passes no option (neither
--disable-shader-cache, nor --enable-shader-cache), then this feature
will be automatically detected based on the presence of a usable SHA-1
library. If no suitable library can be found, then the shader cache
will be automatically disabled, (and reported in the final output from
configure).
The user can force the shader-cache feature to not be compiled, (even
if a SHA-1 library is detected), by passing
--disable-shader-cache. This will prevent the compiled Mesa libraries
from depending on any library for SHA-1 implementation.
Finally, the user can also force the shader cache on with
--enable-shader-cache. This will cause configure to trigger a fatal
error if no sutiable SHA-1 implementation can be found for the
shader-cache feature.
Bug fix by José Fonseca <jfonseca@vmware.com>: Fix to put conditional
assignment in Makefile.am, not Makefile.sources to avoid breaking
scons build.
Note: As recommended by José, with this commit the scons build will
not compile any of the SHA-1-using code. This is waiting for someone
to write SConstruct detection of the available SHA-1 libraries, (and
set the appropriate HAVE_SHA1_* variables).
Reviewed-by: Matt Turner <mattst88@gmail.com>
The upcoming shader cache uses the SHA-1 algorithm for cryptographic
naming. These new mesa_sha1 functions are implemented with any one of
several differeny cryptographics libraries.
This code was copied from the xserver repository, (where it has
apparently been functioning well on a variety of operating systems),
and comes licensed with a license identical to that of Mesa.
Bug fixes by José Fonseca <jfonseca@vmware.com>: Fix to put
conditional assignment in Makefile.am, not Makefile.sources to avoid
breaking scons build. Fix include file for CryptoAPI section. Fix
missing cast in openssl section.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Prior to copying in code from the xserver configure.ac file, it makes
sense to have the license of this file clearly marked, (to show that
it's licensed identically to the configure.ac file from the xserver
repository).
And since the text of the license refers to "the above copyright
notice" it also makes sense to have an actual copyright attribution in
place.
I generated this list of names by looking at the output of:
git shortlog -n --format=%aD -- configure.ac
(and arbitrarily stopping for contributors with fewer than 15
commits). Then for each name, I looked for existing Copyright
attributions in the mesa source tree with the same name, (and using
"Intel Corporation" as the copyright holder where I knew that was
appropriate).
In addition to exercising all of the functions in blob.h, this
includes a stress test that forces some reallocing, and also tests to
verify the alignment and overrun-detection code in blob.c.
These functions are useful when serializing an unknown number of items
to a blob. The caller can first save the current offset, write a
placeholder uint32, write out (and count) the items, then use
blob_overwrite_uint32 with the saved offset to replace the placeholder
value.
Then, when deserializing, the reader will first read the count and
know how many subsequent items to expect.
(I wrote this code after reading a very similar patch written by
Tapani when he wrote serialization code for IR. Since I re-used the
idea of his code so directly, I've credited him as the author of this
code. --Carl)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This new interface allows for writing a series of objects to a chunk
of memory (a "blob").. The allocated memory is maintained within the
blob itself, (and re-allocated by doubling when necessary).
There are also functions for reading objects from a blob as well. If
code attempts to read beyond the available memory, the read functions
return 0 values (or its moral equivalent) without reading past the
allocated memory. Once the caller is done with the reads, it can check
blob->overrun to ensure whether any invalid values were previously
returned due to attempts to read too far.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The upcoming shader cache needs this to be able to cache hash data
from the gl_shader_program structure.
Edited-by: Carl Worth <cworth@cworth.org>:
There is an internal implementation detail that the hash table
underlying the struct string_to_uint_map stores each value internally
as (value+1). The user needn't be very concerned with this (other than
knowing that a value of UINT_MAX cannot be stored) since put() adds 1
and get() subtracts 1.
So in this commit, rather than call the user's function directly with
hash_table_call_foreach, we call through a wrapper that fixes up the
off-by-one values before the caller's callback sees them.
And with this wrapper in place, we also give a better signature to the
callback function being passed to iterate(), so that this callback
function can actually expect a char* and an unsigned argument, (rather
than a couple of void* ).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously, if __builtin_unreachable() was unavailable, the
unreachable macro was defined to do nothing. We do better here, by at
least still making it an assert.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is similar to the existing functions get_instance,
get_array_instance, etc. for getting a type singleton. The new
get_sampler_instance() function will be used by the upcoming shader
cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we generated this for FB writes in SIMD16 mode:
load_payload(16) vgrf5@8+0.0:F, vgrf1:F, vgrf2:F, vgrf3:F, vgrf4:F
fb_write(8) (null):UD, vgrf5@8+0.0:F 1sthalf
The LOAD_PAYLOAD's destination had its register width set to 8, and the
FB_WRITE had its execution size set to 8. This seems wrong, and while
it probably doesn't affect anything, we should fix it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When converting to a format that has fewer bits the previous code was just
shifting off the bits. This doesn't provide very accurate results. For example
when converting from 8 bits to 5 bits it is equivalent to doing this:
x * 32 / 256
This works as if it's taking a value from a range where 256 represents 1.0 and
scaling it down to a range where 32 represents 1.0. However this is not
correct because it is actually 255 and 31 that represent 1.0.
We can do better with a formula like this:
(x * 31 + 127) / 255
The +127 is to make it round correctly.
The new code has a special case to use uint64_t when the result of the
multiplication would overflow an unsigned int. This function is inline and
only ever called with constant values so hopefully the if statements will be
folded.
The main incentive to do this is to make the CPU conversion path pick the same
values as the hardware would if it did the conversion. This fixes failures
with the ‘texsubimage pbo’ test when using the patches from here:
http://lists.freedesktop.org/archives/mesa-dev/2015-January/074312.html
v2: Use 64-bit arithmetic when src_bits+dst_bits > 32
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Rendering with a GS and then using transform feedback with a program that does
not have a GS can crash in gen6. The reason for this is that
brw_begin_transform_feedback checks brw->geometry_program to decide if there
is a GS program, but this is not correct: brw->geometry_program is updated when
issuing drawing commands, so after rendering with a GS it will be non-NULL
until we draw again with a program that does not have a GS. If the next
program uses TF, we will call glBegintransformFeedback before issuing
the drawing command and hence brw->geometry_program will be non-NULL if
the previous rendering used a GS. The right thing to do here is to check
ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] instead. This is what the
gen7 code path does too.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87694
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is a rework of the liveness algorithm using a worklist as suggested by
Connor. Doing so reduces the number of times we walk over the instructions
because we don't have to do an entire pointless walk over the instructions
just to figure out it's time to stop. Also, the stuff after the last loop
in the funciton will only ever get visited once.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
A worklist is a common concept in optimizations. This adds a structure
that we can reuse for many different types of optimizations.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Changes the initial internal format of a render buffer
to GL_RGBA4 in GLES 3. This fixes a failure in the following
DrawElements test:
dEQP-GLES3.functional.state_query.rbo.renderbuffer_internal_format
Reviewed-by: Chad Versace <chad.versace@intel.com>
Previously, the set API required the user to do all of the hashing of keys
as it passed them in. Since the hashing function is intrinsically tied to
the comparison function, it makes sense for the hash set to know about
it. Also, it makes for a somewhat clumsy API as the user is constantly
calling hashing functions many of which have long names. This is
especially bad when the standard call looks something like
_mesa_set_add(ht, _mesa_pointer_hash(key), key);
In the above case, there is no reason why the hash set shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash set will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
This is analygous to 94303a0750 where we did this for hash_table
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We already have search_pre_hashed. This makes the APIs match better.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When performing common subexpression elimination on instructions with
non-null destinations we emit a MOV to copy the result to a new
register that must have no other uses. In the case of:
cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f
...
cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f
we put the first instruction in the AEB and decided that we could reuse
its result when we found the second. Unfortunately, that meant that we'd
emit a MOV from the first's destination, which is null.
Don't do anything if the entry's destination is null and the
instruction's destination is non-null.
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Just use the abs source modifier on both of the multiplicand
arguments.
instructions in affected programs: 300 -> 296 (-1.33%)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Just use the negation source modifier on one of the multiplicand
arguments.
total instructions in shared programs: 5889529 -> 5880016 (-0.16%)
instructions in affected programs: 600846 -> 591333 (-1.58%)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Without the break, it was possible that an instruction would match multiple
expressions. If this happened, you could end up trying to replace it
multiple times and get a segfault. This makes it so that, after a
successful replacement, it moves on to the next instruction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This refactor allows you to more easily get the deref node associated with
a given variable. We then use that new functionality in the
deref_may_be_aliased function instead of creating a 1-element deref chain.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The original name wasn't particularly descriptive. This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This solves a number of problems. First is the ability to change the
number of sources that a texture instruction has. Second, it solves the
delema that may occur if a texture instruction has more than 4 sources.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Additional description was added to a variety of places. Also, we no
longer use the term "leaf" to describe fully-qualified direct derefs.
Instead, we simply use the term "direct" or spell it out completely.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This should be much better for debugging as GDB will pick up on the fact
that it's an enum and actually tell you what you're looking at instead of
giving you some arbitrary hex value you have to go look up.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This should make debugging a lot easier as GDB handles static inlines much
better than macros. Also, static inlines are typesafe.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This was a left-over relic of GLSL IR that we aren't using for anything.
If we ever want that value again, we can add it back, but NIR constant
folding should be just as good as GLSL IR's if not better pretty soon, so
I'm not worried about it.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, our variable renaming algorithm, while similar to the one in
the Cytron paper, was not the same. While I'm pretty sure it was correct,
it will be easier for readers of the code in the variable renaming pass if
it follows more closely. This commit removes the automatic stack popping
we were doing and replaces it with explicit popping like Cytron does.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit seeks to make the lower_variables pass much more clear by
adding a pile of comments and re-arranging a few things. There are no
functional or algorithmic changes.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
parallel_copy_copy was a silly name. Also, things were getting long and
annoying, so I added a foreach macro. For historical reasons, several of
the original iterations over parallel copy entries in from_ssa used the
_safe variants of the loop. However, all of these no longer ever remove an
entry so it's ok to make them all use the normal iterator.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we were doing a lazy creation of the parallel copy
instructions. This is confusing, hard to get right, and involves some
extra state tracking of the copies. This commit adds an extra walk over
the basic blocks to add the block-end parallel copies up front. This
should be much less confusing and, consequently, easier to get right. This
commit also adds more comments about parallel copies to help explain what
all is going on.
As a consequence of these changes, we can now remove the at_end parameter
from nir_parallel_copy_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we were emitting the full pile of setup instructions for sample_id
and sample_pos every time they were used. With this commit, we emit them
in their own pass once at the beginning of the shader and simply emit uses
later on. When it comes time for setting up VS, we can put setup for its
special values in the same pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Originally, this field was intended for determining if the given
instruction acted per-component or if it had mismatching source and
destination sizes that would have to be interpreted specially. However, we
can easily derive this from output_size == 0, so it's not really that
useful. Also, the values we were setting in nir_opcodes.h for this field
were completely bogus and it was never used.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Prior to this commit, we had a big switch statement for this. Now it's
baked into the opcode metadata so we can just use that.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit adds some algebraic properties to the metadata of each opcode
in NIR. In particular, you now know, just from the metadata, if a given
opcode is commutative or associative. This will be useful for algebraic
transformation passes that want to be able to match a + b as well as b + a
in one go.
v2: Make algebraic properties all caps. This was more consistent with the
intrinsics flags and seems better for flags in general.
Also, the enums are now declared with (1 << n) rather then hex values.
v3: fmin and fmax technically aren't commutative or associative. Things
get funny when one of the arguments is a NaN.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
As it was, we weren't ever using load_const in a non-SSA way. This allows
us to substantially simplify the load_const instruction. If we ever need a
non-SSA constant load, we can do a load_const and an imov.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, lower_atomics was non-SSA only. We assert-failed if the
destination of an atomic operation intrinsic was an SSA def and we used
temporary registers for computing offsets. This commit changes both of
these behaviors. We now use SSA values for computing offsets (so we can
optimize them) and we handle SSA destinations. We also move the pass to
run before we go out of SSA on i965 as it now generates SSA values.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we were using foreach_dest and switching on whether the destination
was an SSA value. This works, except not all destinations are SSA values
so we have to special-case ssa_undef instructions. Now that we have a
foreach_ssa_def function, we can iterate over all of the register
destinations in one pass and iterate over the SSA destinations in a second.
This way, if we add other ssa-only instructions, we won't have to worry
about adding them to the special case we have for ssa_undef.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
There are some functions whose destinations are SSA-only and so aren't a
nir_dest. This provides a function that is capable of iterating over the
SSA definitions defined by those functions. If you want registers, you
should use the old iterator.
v2: Kenneth Graunke <kenneth@whitecape.org>:
- Fix nir_foreach_ssa_def's return value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we were just iterating over the program "in order" which
kind-of approximates a DFS, but not really. In particular, we got the
following case wrong:
loop {
a = 3;
if (foo) {
a = 5;
} else {
break;
}
use(a);
}
where use(a) would get 3 instead of 5 because of premature popping of the
SSA def stack. Now, since we do an actaul DFS, we should evaluate use(a)
immediately after a = 5 and we should be ok.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We stopped generating predicates in glsl_to_nir some time ago. Right now,
it's all dead untested code that I'm not convinced always worked in the
first place. If we decide we want them back, we can revert this patch.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, the condition was a scalar that applied to all components
simultaneously. As of this commit, the condition is a vector and each
component is switched seperately.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
nir_metadata_dirty was a terrible name because the parameter it takes is
the metadata to be preserved. This is really confusing because it looks
like it's doing the opposite of what it is actually doing. Now it's named
sensibly.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
In particular, we rename nir_tex_src_sampler_index to _sampler_offset and
add a sampler_array_size field to nir_tex_instr. This way we can pass the
size of sampler arrays through to backends even after removing the variable
information and, with it, the type.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In GLSL-to-NIR we were just setting the base index to 0 whenever there was
an indirect so having it expressed as a sum makes no sense. Also, while a
base offset may make sense for the memory location (first element in the
array, etc.) it makes less sense for the actual uniform buffer index. This
may change later, but it seems to make more sense for now.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit renames nir_instr_as_texture to nir_instr_as_tex and renames
nir_instr_type_texture to nir_instr_type_tex to be consistent with
nir_tex_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass uses the previously built algebraic transformations framework and
should act as an example for anyone else wanting to make an algebraic
transformation pass for NIR.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit builds on the nir_search.h infastructure by adding a bit of
python code that makes it stupid easy to write an algebraic transformation
pass. The nir_algebraic.py file contains four python classes that
correspond directly to the datastructures in nir_search.c and allow you to
easily generate the C code to represent them. Given a list of
search-and-replace operations, it can then generate a function that applies
those transformations to a shader.
The transformations can be specified manually, or they can be specified
using nested tuples. The nested tuples make a neat little language for
specifying expression trees and search-and-replace operations in a very
readable and easy-to-edit fasion.
The generated code is also fairly efficient. Insteady of blindly calling
nir_replace_instr with every single transformation and on every single
instruction, it uses a switch statement on the instruction opcode to do a
first-order culling and only calls nir_replace_instr if the opcode is known
to match the first opcode in the search expression.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This framework provides a simple way to do simple search-and-replace
operations on NIR code. The nir_search.h header provides four simple data
structures for representing expressions: nir_value and four subtypes:
nir_variable, nir_constant, and nir_expression. An expression tree can
then be represented by nesting these data structures as needed. The
nir_replace_instr function takes an instruction, an expression, and a
value; if the instruction matches the expression, it is replaced with a new
chain of instructions to generate the given replacement value. The
framework keeps track of swizzles on sources and automatically generates
the currect swizzles for the replacement value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, the casting operations were macros. While this is usually
fine, the casting macro used the input parameter twice leading to strange
behavior when you passed the result of another function into it. Since we
know the source and destination types explicitly, we don't loose anything
by making it a function.
Also, this gives us a nice little macro for creating cast function that
will hopefully prevent mistyping.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We used to have the number of components built into the intrinsic. This
meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4
variants. This lead to piles of switch statements to generate the correct
intrinsic names, and introspection to figure out the number of components.
We can make things much nicer by allowing "vectorized" intrinsics.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit switches us over to the new variable lowering code which is
capable of properly handling lowering indirects as we go.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
With this commit, the GLSL IR -> NIR pass generates NIR in more-or-less SSA
form. It's SSA in the sense that it doesn't have any registers, but it
isn't really useful SSA because it still has a pile of load/store
intrinsics that we will need to get rid of.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass analizes all of the load/store operations and, when a variable is
never aliased (potentially used by an indirect operation), it is lowered
directly to an SSA value. This pass translates to SSA directly and does
not require any fixup by the original to-SSA pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead, we give SSA definitions a temporary index of 0xFFFFFFFF if the
instruction does not have a block and a proper index when it actually gets
added to the list.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we used a string name. It was nice for translating out of GLSL
IR (which also does that) but cumbersome the rest of the time.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Backends want to be able to do special things with constant values such as
put them into immediates or make decisions based on whether or not a value
is constant. Before, constants always got lowered to a load_const into a
register and then a register use. Now we leave constants as SSA values so
backends can special-case them if they want. Since handling constant SSA
values is trivial, this shouldn't be a problem for backends.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass is still fairly basic. It only handles ALU operations, constant
loads, and phi nodes. No texture ops or intrinsics yet.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The unlink_blocks function moves successors around to make sure that, if
there is a remaining successor, it is in the first successors slot and not
the second. To fix this, we simply get both successors up front.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We also make the return types match GLSL. The GLSL spec specifies that
findMSB and findLSB return a signed integer. Previously, nir had them
return unsigned. This updates nir's behavior to match what GLSL expects.
We also update the nir-to-fs generator to take the new instructions. While
we're at it, we fix the case where the input to findMSB is zero.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
These indices should now be reasonably stable/consistent. Redoing the
indices in the print functions makes it harder to debug problems.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some time while refactoring things to make it look nicer before pushing to
master, I completely broke the function. This fixes it to be correct.
Just goes to show you why you souldn't push code that has no users yet...
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit rewrites the out-of-SSA pass to not be nearly as naieve. It's
based on "Revisiting Out-of-SSA Translation for Correctness, Code Quality,
and Efficiency" by Boissinot et. al. It should be fairly close to
state-of-the art.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since we don't actually have an "if" instruction, this is a very common
pattern when iterating over instructions. This adds a helper function for
it to make things a little less painful.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass is kind of stupidly implemented but it should be enough to get us
up and going. We probably want something better that doesn't generate all
of the redundant moves eventually. However, the i965 backend should be
able to handle the movs, so I'm not too worried about it in the short term.
Previously, emit_general_interpolation took an ir_variable and pulled the
information it needed from that. This meant that in fs_fp, we were
constructing a dummy ir_variable just to pass into it. This commit makes
emit_general_interpolation take only the information it needs and gets rid
of the fs_fp cruft.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is similar to the GLSL IR frontend, except consuming NIR. This lets
us test NIR as part of an actual compiler.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Make brw_fs_nir build again
Only use NIR of INTEL_USE_NIR is set
whitespace fixes
These include functions for adding and removing various bits of IR and
helpers for iterating over all the sources and destinations of an
instruction. This is similar to ir.cpp.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
whitespace and automake fixes
This includes all the instructions, ifs, loops, functions, etc. This is
similar to the information in ir.h.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Include ralloc and hash_table from the util directory
whitespace fixes
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-By glenn.kennard <glenn.kennard@gmail.com>
It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z. The result was
that you'd get big chunks of your rendering missing.
This reverts commit 0543630d0b.
It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.
We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.
Sorry I didn't remember this when reviewing the reverted change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
You would not believe the mess GCC 4.8.3 generated for the old
switch-statement.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40)
64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40)
The regression on 32-bit is odd. Callgrind says the caller,
_mesa_is_valid_prim_mode is faster. Before it says 2,293,760
cycles, and after it says 917,504.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Multithread:
32-bit: Difference at 95.0% confidence 0.416027% +/- 0.163529% (n=40)
64-bit: Difference at 95.0% confidence 0.494771% +/- 0.259985% (n=40)
Gl32Batch7 had no difference proven at 95.0% confidence (n=120) on
32-bit or 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous check was insufficient (as it did not take 'indices' into
consideration), and DX10 hardware does not need this check anyway.
Since index_bytes is no longer used, remove it.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 1.66929% +/- 0.230107% (n=40)
64-bit: Difference at 95.0% confidence -1.40848% +/- 0.288038% (n=40)
The regression on 64-bit is odd. Callgrind says the caller,
validate_DrawElements_common is faster. Before it says 10,321,920
cycles, and after it says 8,945,664.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This doesn't affect performance, but it feels more correct.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: No difference proven at 95.0% confidence (n=120)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of having an extra pointer indirection in one of the hottest
loops in the driver.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
v2 (Ken): Cut size of array from 64 to 57 to save memory.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the switch-statement, GCC 4.8.3 produces a small pile of code with
a branch.
00000000 <brw_get_index_type>:
000000: 8b 54 24 04 mov 0x4(%esp),%edx
000004: b8 01 00 00 00 mov $0x1,%eax
000009: 81 fa 03 14 00 00 cmp $0x1403,%edx
00000f: 74 0d je 00001e <brw_get_index_type+0x1e>
000011: 31 c0 xor %eax,%eax
000013: 81 fa 05 14 00 00 cmp $0x1405,%edx
000019: 0f 94 c0 sete %al
00001c: 01 c0 add %eax,%eax
00001e: c3 ret
However, this could be two instructions.
00000000 <brw_get_index_type>:
000000: 2d 01 14 00 00 sub $0x1401,%eax
000005: d1 e8 shr %eax
000007: 90 nop
000008: 90 nop
000009: 90 nop
00000a: 90 nop
00000b: c3 ret
The function was also moved to the header so that it could be inlined at
the two call sites. Without this, 32-bit also needs to pull the
parameter from the stack. This means there is a push, a call, a move,
and a ret added to a two instruction function. The above code shows the
function with __attribute__((regparm=1)), but even this adds several
extra instructions. There is also an extra instruction on 64-bit to
move the parameter to %eax for the subtract.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40)
64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
...so that it can be inlined in the two places that call it.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were happily printing "Native code for unnamed vertex shader" and
"VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output,
as well as the KHR_debug output used by shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A lot of messages hardcoded the string "FS", which is confusing on
Broadwell, where we use this code for VS support as well.
shader-db particularly got confused, as it reported two "FS SIMD8"
shaders, and no vertex shaders at all. Craziness ensued.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only GNU indent is supported when indenting autogenerated format_pack.c
and format_unpack.c files. Some non-GNU indent (Mac OS X and FreeBSD)
add extra whitespaces than break the build of those files.
Fallback to 'cat' if a non-GNU indent is found.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=88335
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The 8888 suggests 8-bit components which is not correct, so
replace that with the actual size of the components in each
format.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Before we were always coping from the buffer being mapped into the
temporary buffer. However, if INVALIDATE_RANGE is set, then we know that
the data is going to be junk after we unmap so there's no point in doing
the blit. This is important because doing the blit will cause a stall 3
lines later when we map the buffer.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Patch enables ES2 extension that utilizes existing ES3 functionality.
Changes make all the subtests to run and pass in WebGL conformance
test 'webgl-draw-buffers' when running Chrome on OpenGL ES, also
Piglit test 'draw_buffers_gles2' passes.
v2: remove unused boolean (Ilia Mirkin)
v3: proper error checking for invalid values (Chad Versace)
v4: run error check explicitly for ES2 and ES3 (Kenneth Graunke)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This will be needed for NIR because it is typeless and treats all constants
as uint32 values and reinterprets them when they are used later. This
commit allows those values to be properly propagated.
Also, this helps some synmark shaders because it allows us to copy
propagate a 0x00000000UD into a 0.0F in a load_payload, which then lets us
combine 4 load_payloads.
instructions in affected programs: 2288 -> 2144 (-6.29%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Removes commit 7894278 changes and moves fix to _mesa_GetInternalformativ().
The original commit enabled the GL_RGB and GL_RGBA unsized internal formats
as valid for render buffers in GLES3, but this is incorrect. They should
have only been enabled for GetInternalformativ()
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88079
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If, for example, only the x/y/w components of in.xyzw are actually used,
we still need to have a group of four registers and assign all four
components. The hardware can't write in.xy and in.w to discontiguous
registers. To handle this, pad with a dummy NOP instruction, to keep
the neighbor chain contiguous.
This fixes a problem noticed with firefox OMTC.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
According to the OpenGL and OpenGL ES specs (sections
"FRAMEBUFFER COMPLETENESS" and "Whole Framebuffer Completeness"),
the image for color, depth or stencil attachments must be renderable,
otherwise the attachment is considered incomplete and we should report
GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT. Currently, we detect this
situation properly but report a different error.
This fixes the following 3 piglit tests:
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb_unsigned_int_2_10_10_10_rev
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgba_unsigned_int_2_10_10_10_rev
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb16f
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From GLES3 specification (page 123), "The currently bound sampler may be
queried by calling GetIntegerv with pname set to
SAMPLER_BINDINGGL_SAMPLER_BINDING".
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getboolean
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getinteger
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getinteger64
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, a cast was done to convert from float to int but there
were rounding errors.
The spec specificies in Data Conversion chapter that Floating-point values are
rounded to the nearest integer.
This patch fixes the following 2 dEQP tests:
dEQP-GLES3.functional.state_query.sampler.sampler_texture_min_lod_getsamplerparameteri
dEQP-GLES3.functional.state_query.sampler.sampler_texture_max_lod_getsamplerparameteri
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, a cast was done to convert from float to int but there
were rounding errors.
The spec specificies in Data Conversion chapter that Floating-point values are
rounded to the nearest integer.
This patch fixes the following 8 dEQP tests:
dEQP-GLES3.functional.state_query.texture.texture_2d_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_3d_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_3d_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_array_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_array_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_cube_map_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_cube_map_texture_max_lod_gettexparameteri
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Return the proper value for two-dimensional array texture and three-dimensional
textures.
From OpenGL ES 3.0 spec, chapter 6.1.13 "Framebuffer Object Queries",
page 234:
"If pname is FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER and the texture
object named FRAMEBUFFER_ATTACHMENT_OBJECT_NAME is a layer of a
three-dimensional texture or a two-dimensional array texture, then params
will contain the number of the texture layer which contains the attached im-
age. Otherwise params will contain the value zero."
Furthermore, FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER is an alias of
FRAMEBUFFER_ATTACHMENT_TEXTURE_3D_ZOFFSET_EXT.
This patch fixes dEQP test:
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_texture_layer
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 0ae9ca12a8 put source modifiers out of the bitcast operations
by adding a MOV operation that would handle them separately. It missed
the case of ceil though: the implementation negates both its source and
destination operands. The source operand will be used for RNDD, which
we can handle normally, but we need to fix the modifier for the
negated result.
v2:
- RNDD can handle the source modifier so no need to put that one
in a separate MOV.
Fixes the following 42 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.common.ceil.*_vertex
dEQP-GLES3.functional.shaders.builtin_functions.common.ceil.*_fragment
dEQP-GLES3.functional.shaders.builtin_functions.precision.ceil._*vertex.*
dEQP-GLES3.functional.shaders.builtin_functions.precision.ceil._*fragment.*
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
"9.4. FRAMEBUFFER COMPLETENESS
...
Depth and stencil attachments, if present, are the same image."
Notice that this restriction is not included in the OpenGL ES2 spec.
Fixes 18 dEQP tests in:
dEQP-GLES3.functional.fbo.completeness.attachment_combinations.*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
'4.1.4 Stencil Test' section of the GL-ES 3.0 specification says:
"In the initial state, [...] the front and back stencil mask are both set
to the value 2^s − 1, where s is greater than or equal to the number of
bits in the deepest stencil buffer* supported by the GL implementation."
Since the maximum supported precision for stencil buffers is 8 bits, mask
values should be initialized to 2^8 - 1 = 0xFF.
Currently, these masks are initialized to max unsigned integer (~0u), because
in OpenGL 3.0 and before, the initial mask values were:
"In the initial state, stenciling is disabled, the front and back
stencil reference value are both zero, the front and back stencil
comparison functions are both ALWAYS, and the front and back
stencil mask are both all ones."
The problem is that it causes the mask values to overflow to -1 when converted
to signed integer by glGet* APIs.
Fixes 6 dEQP failing tests:
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_separate_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_separate_both_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_separate_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_separate_both_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The range's min and max, and the precision value are not set correctly for the
vertex shader constants.
Fixes 1 dEQP test: dEQP-GLES3.functional.state_query.shader.precision_vertex_highp_int
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Check returned texObj is not null. If texObj is null there is already
GL_INVALID_OPERATION error set.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
GL_UNSIGNED_SHORT_5_5_5_1, GL_UNSIGNED_SHORT_1_5_5_5_REV,
GL_UNSIGNED_INT_10_10_10_2, GL_UNSIGNED_INT_2_10_10_10_REV data types
are not explicitly allowed to work with GL_ABGR_EXT format neither
in GL nor GL_EXT_abgr specs.
Removed the corresponding mesa formats as there are no other functions
using them inside Mesa anymore.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_mesa_pack_rgba_span_float was the last of the color span functions
and we have replaced all calls to it with calls to _mesa_format_convert,
so we can remove it together with tmp_pack.h which was used to
generate the pack functions for multiple types that were used from
the various color span functions that have been removed.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Now that we have _mesa_format_convert we don't need this.
This was only used to create temporary RGBA float images in the process
of storing some compressed formats. These can call _mesa_texstore
with a RGBA/float dst to achieve the same goal.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Now that we have _mesa_format_convert we don't need this.
texstore_rgba will use the GL_COLOR_INDEX to RGBA conversion
helpers instead and compressed formats that used
_mesa_make_temp_ubyte_image to create an ubyte RGBA temporary
image can call _mesa_texstore with a RGBA/ubyte dst to
achieve the same goal.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_mesa_unpack_bitmap() was introduced by commit 02b801c to handle the case
when data is stored in PBO by display lists, in the context of this bug:
Incorrect pixels read back if draw bitmap texture through Display list
https://bugs.freedesktop.org/show_bug.cgi?id=10370
Since _mesa_unpack_image() already handles the case of GL_BITMAP, this patch
removes _mesa_unpack_bitmap() and makes affected calls go through
_mesa_unapck_image() instead.
The sample test attached to the original bug report passes with this change
and there are no piglit regressions.
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In the future we would like to have a format conversion library that is
independent of GL so we can share it with Gallium. This is a step in that
direction.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Instead of using _mesa_pack_rgba_span_float. This should allow us to remove
that function in a later patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is the only place that uses _mesa_unpack_color_span_float so after
this we should be able to remove that function.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Notice that _mesa_format_convert does not handle byte-swapping scenarios,
GL_COLOR_INDEX or MESA_FORMAT_YCBCR(_REV), so these must be handled
separately.
Also, remove all the code that goes unused after using _mesa_format_convert.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We only use _mesa_make_temp_ubyte_image in texstore.c to convert
GL_COLOR_INDEX to RGBA, but this helper does more stuff than this.
All uses of this helper can be replaced with calls to
_mesa_format_convert except for this GL_COLOR_INDEX conversion.
This patch extracts the GL_COLOR_INDEX to RGBA logic to a separate
helper so we can use that instead from texstore.c.
In future patches we will replace all remaining calls to
_mesa_make_temp_ubyte_image in the repository (related to compressed
formats) with calls to _mesa_format_convert so we can remove
_mesa_make_temp_ubyte_image and related functions.
v2:
- Remove ‘for’ loop initial declaration. They are only allowed in C99 or C11
mode.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For glReadPixels with a Luminance destination format we compute luminance
values from RGBA as L=R+G+B. This, however, requires ad-hoc implementation,
since pack/unpack functions or _mesa_swizzle_and_convert won't do this
(and thus, neither will _mesa_format_convert). This patch adds helpers
to do this computation so they can be used to support conversion to luminance
formats.
The current implementation of glReadPixels does this computation as part
of the span functions in pack.c (see _mesa_pack_rgba_span_float), that do
this together with other things like type conversion, etc. We do not want
to use these functions but use _mesa_format_convert instead (later patches
will remove the color span functions), so we need to extract this functionality
as helpers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We have _mesa_swap{2,4} but these do in-place byte-swapping only. The new
functions receive an extra parameter so we can swap bytes on a source
input array and store the results in a (possibly different) destination
array.
This is useful to implement byte-swapping in pixel uploads, since in this
case we need to swap bytes on the src data which is owned by the
application so we can't do an in-place byte swap.
v2:
- Include compiler.h in image.h, which is necessary to build in MSCV as
indicated by Brian Paul.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We had previously added the needed mesa formats, so we can simplify
the code further.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 after review by Jason Ekstrand:
- Move _mesa_format_from_format_and_type to glformats
- Return a mesa_format for GL_UNSIGNED_INT_8_8_8_8(_REV)
v3:
- Adapted to the new implementation of mesa_array_format as a plain uint32_t
bitfield.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will come in handy when callers of _mesa_format_convert need
to compute the rebase swizzle parameter to use.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The new parameter allows callers to provide a rebase swizzle that
the function needs to use to match the requirements of the base
internal format involved. This is necessary when the source or
destination internal formats (depending on whether we are doing
the conversion for a pixel download or a pixel upload respectively)
do not match the base formats of the source or destination
formats of the conversion. This can happen when the driver does not
support the internal formats and uses a different format to store
pixel data internally.
For example, a texture upload from RGB to Luminance in a driver
that does not support textures with a Luminance format may decide
to store the Luminance data as RGBA. In this case we want to store
the RGBA values as (R,R,R,1). Following the same example, when we
download from that texture to RGBA we want to read (R,0,0,1). The
rebase_swizzle parameter allows these transforms to happen.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is necessary to handle conversions between array types where
the driver does not support the dst format requested by the client and
chooses a different format instead.
We will need this in _mesa_format_convert, so move it to format_utils.c,
prefix it with '_mesa_' and make it available to other files.
v2:
- Move _mesa_compute_component_mapping to glformats
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Iago Toral <itoral@igalia.com>:
- When testing if we can directly pack we should use the src format to check
if we are packing from an RGBA format. The original code used the dst format
for the ubyte case by mistake.
- Fixed incorrect number of bits for dst, it was computed using the src format
instead of the dst format.
- If the dst format is an array format, check if it is signed. We were only
checking this for the case where it was not an array format, but we need
to know this in both scenarios.
- Fixed incorrect swizzle transform for the cases where we convert between
array formats.
- Compute is_signed and bits only once and for the dst format. We were
computing these for the src format too but they were overwritten by the
dst values immediately after.
- Be more careful when selecting the integer path. Specifically, check that
both src and dst are integer types. Checking only one of them should suffice
since OpenGL does not allow conversions between normalized and integer types,
but putting extra care here makes sense and also makes the actual requirements
for this path more clear.
- The format argument for pack functions is the destination format we are
packing to, not the source format (which has to be RGBA).
- Expose RGBA8888_* to other files. These will come in handy when in need to
test if a given array format is RGBA or in need to pass RGBA formats to
mesa_format_convert.
v3 by Samuel Iglesias <siglesias@igalia.com>:
- Add an RGBA8888_INT definition.
v4 by Iago Toral <itoral@igalia.com> after review by Jason Ekstrand:
- Added documentation for _mesa_format_convert.
- Added additional explanatory comments for integer conversions.
- Ensure that we use _messa_swizzle_and_convert for all signed source formats.
- Squashed: do not directly (un)pack to RGBA UINT if the source is not unsigned.
v5 by Iago Toral <itoral@igalia.com>:
- Adapted to the new implementation of mesa_array_format as a plain uint32_t
bitfield.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Use autogenerated format pack functions and take advantage of some
macros to reduce source code, facilitating its maintenance.
Unfortunately, dstType == GL_UNSIGNED_SHORT cannot simplified like
the others, so keep it as it is.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We will use this in a later patch to refactor _mesa_pack_rgba_span_float.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Take advantage of new mesa formats and new format_pack functions to
reduce source code in _mesa_pack_rgba_span_from_ints() and
_mesa_pack_rgba_span_from_uints().
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This commit adds a macro to facilitate the task of using
format conversions functions but keeps the same API.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will be used to refactor code in pack.c and support conversion
to/from these types in a master convert function that will be added
later.
v2:
- Fix autogeneration of MESA_FORMAT_A2R10G10B10_UNORM pack/unpack
functions
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will be used to unify code in pack.c.
v2:
- Modify pack_int_*() function generator to use c.datatype() and
f.datatype()
v3:
- Only autogenerate pack_int_*() functions for non-normalized integer
formats.
v4:
- Use _mesa_unsigned_to_unsigned() in pack_int_*() because, in order
to be able to pack both signed and unsigned formats, we need to
sign-extend.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We will use this later on to handle uint conversion scenarios in a master
convert function.
v2:
- Modify pack_uint_*() function generation to use c.datatype() and
f.datatype().
- Remove UINT_TO_FLOAT() macro usage from pack_uint*()
- Remove "if not f.is_normalized()" conditional as pack_uint*()
functions are only autogenerated for non normalized formats.
v3:
- Add clamping for non-normalized integer formats in pack_uint*()
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Add usage of INDENT_FLAGS in Makefile.am
v3 by Samuel Iglesias <siglesias@igalia.com>:
- Modify unpack_float_*() and unpack_ubyte_*() function generation
to use c.datatype() and f.datatype()
- Fix out-of-tree build
v4 by Samuel Iglesias <siglesias@igalia.com>:
- format_unpack.c.mako is now format_unpack.py, with the template code
inlined. It now auto-generates format_unpack.c
- Add format_unpack.c to gitignore.
- Simplify Makefile.am change
- Modify SConscript to build format_unpack.c with scons
v5 by Samuel Iglesias <siglesias@igalia.com>:
- Don't allow float to non-normalized integer format conversions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We were auto-generating it before. The problem was that the autogeneration
tool we were using was called "copy, paste, and edit". Let's use a more
sensible solution.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Samuel Iglesias <siglesias@igalia.com>
- Remove format_pack.c as it is now autogenerated
- Add usage of INDENT_FLAGS in Makefile.am
- Remove trailing blank line
v3 by Samuel Iglesias <siglesias@igalia.com>
- Merge format_convert.py into format_parser.py
- Adapt pack_*_* function generations
- Fix out-of-tree build
v4 by Samuel Iglesias <siglesias@igalia.com>
- _get_datatype() is now a helper function
v5 by Samuel Iglesias <siglesias@igalia.com>
- format_pack.c.mako is now format_pack.py, with the template code
inlined. It now auto-generates format_pack.c
- Simplify Makefile.am change.
- Modify SConscript to build format_pack.c with scons.
- Remove run_mako.py
- Add format_pack.c to gitignore
v6 by Samuel Iglesias <siglesias@igalia.com>:
- Don't allow float to non-normalized integer format conversions.
- Add non-normalized formats support for ubyte packing functions. Merge
the previously separated patch.
- Add clamping for non-normalized integer formats in pack_ubyte*()
v7 by Samuel Iglesias <siglesias@igalia.com>:
- Add assert to check that sRGB formats are 8-bit size.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It is now a hard dependency because of the autogeneration of
format pack and unpack functions.
Update the documentation to reflect this change.
v2:
- Inline python script in m4 file and use PYTHON2
v3:
- Remove semicolons and quotes and change coding style
- Add Ilia Mirkin suggestion to use Python's split functionality.
- Use AX_CHECK_PYTHON_MAKO_MODULE name.
- Change to MIT license
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If we need the base format for a mesa_array_format we have to find the
matching mesa_format first. This is expensive because it requires
to loop through all existing mesa formats until we find the right match.
We can resolve the base format of an array format directly by looking
at its swizzle information. Also, we can have _mesa_get_format_base_format
accept an uint32_t which can pack either a mesa_format or a mesa_array_format
and resolve the base format for either type. This way clients do not need to
check if they have a mesa_format or a mesa_array_format and call different
functions depending on the case.
Another reason to resolve the base format for array formats directly is that
we don't have matching mesa_format enums for every possible array format, so
for some GL format/type combinations we can produce array formats that don't
have a corresponding mesa format, in which case we would not be able to
find the base format. Example format=GL_RGB, type=GL_UNSIGNED_SHORT. This type
would map to something like MESA_FORMAT_RGB_UNORM16, but we don't have that.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
An array format is a 32-bit integer format identifier that can represent
any format that can be represented as an array of standard GL datatypes.
Whie the MESA_FORMAT enums provide several of these, they don't account for
all of them.
v2 by Iago Toral Quiroga <itoral@igalia.com>:
- Implement mesa_array_format as a plain bitfiled uint32_t type instead of
using a struct inside a union to access the various components packed in
it. This is necessary to support bigendian properly, as pointed out by
Ian.
- Squashed: Make float types normalized
v3 by Iago Toral Quiroga <itoral@igalia.com>:
- Include compiler.h in formats.h, which is necessary to build in MSVC as
indicated by Brian Paul.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix various conversion paths that involved integer data types of different
sizes (uint16_t to uint8_t, int16_t to uint8_t, etc) that were not
being clamped properly.
Also, one of the paths was incorrectly assigning the value 12, instead of 1,
to the constant "one".
v2:
- Create auxiliary clamping functions and use them in all paths that
required clamp because of different source and destination sizes
and signed-unsigned conversions.
v3:
- Create MIN_INT macro and use it.
v4:
- Add _mesa_float_to_[un]signed() and mesa_half_to_[un]signed() auxiliary
functions.
- Add clamp for float-to-integer conversions in _mesa_swizzle_and_convert()
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_BaseFormat is a GLenum (unsigned int) so testing if its value is
greater than 0 to detect the cases where _mesa_base_tex_format
returns -1 doesn't work.
Fixing the assertion breaks the arb_texture_view-lifetime-format
piglit test on nouveau, since that test calls
_mesa_base_tex_format with GL_R16F with a context that does not
have ARB_texture_float, so it returns -1 for the BaseFormat, which
was not being caught properly by the ASSERT in init_teximage_fields_ms
until now.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We were returning incorrect mesa formats for GL_LUMINANCE_ALPHA16I_EXT
and GL_LUMINANCE_ALPHA32I_EXT.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The PACK_565_REV macro is no longer used. It was also extremely confusing
because it's actually a byteswapped 565 not reversed 565.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Aparently, the packing/unpacking functions for these formats have differed
from the format description in formats.h. Instead of fixing this, people
simply left a comment saying it was broken. Let's actually fix it for
real.
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Fix comment in formats.h
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch fixes the return of a wrong value when x is lower than
-MAX_INT(src_bits) as the result would not be between [-1.0 1.0].
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Modify snorm_to_float() to avoid doing the division when
x == -MAX_INT(src_bits)
Cc: 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
No need to recheck the FS compile when the VS source has changed, but
there *is* a need to recheck the VS compile when the compiled VS has
changed (since the live inputs may change).
Fixes es3conform's blend test.
The util_pack_color() thing only sets up the low bits of the union, so
only return them, too. Fixes intermittent failure on
fbo-alphatest-formats and es3conform's framebuffer-objects test under
simulation.
Turns out this was harmful in code quality:
total instructions in shared programs: 39487 -> 38845 (-1.63%)
instructions in affected programs: 22522 -> 21880 (-2.85%)
This costs us yet another register, which is painful since it means more
programs might fail to compile). However, the alternative was causing us
trouble where we'd save/restore r3 while it contained a MIN-ed direct
texture offset, causing the kernel to fail to validate our shaders (such
as in GLB2.7).
This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.
total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs: 4721 -> 4455 (-5.63%)
This will give the compiler the chance to dead-code eliminate unused VPM
reads. This is particularly a big deal in the CS where a bunch of vattrs
are just not going to be used.
I'm using this in some WIP commits for doing blending in 8888 instead of
vec4. But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:
total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs: 4381 -> 4314 (-1.53%)
If you had a conditional assignment of an array or struct (say, from the
if-lowering pass), we'd try doing swizzle_for_size() on the aggregate
type, and it would assertion fail due to vector_elements==0. Instead,
extend emit_block_mov() to handle emitting the conditional operations,
which also means we'll have appropriate writemasks/swizzles on the CMPs
within a struct containing various-sized members.
Fixes 20 testcases in es3conform on vc4.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is part of a potential solution to a spec bug. Cube completeness
is a concept from glGenerateMipmap, but it seems reasonable to check for it in
TextureSubImage when target=GL_TEXTURE_CUBE_MAP.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This is part of a potential solution to a spec bug. Cube completeness
is a concept from glGenerateMipmap, but it seems reasonable to check for it in
GetTextureImage when the target is GL_TEXTURE_CUBE_MAP.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In implementing ARB_DIRECT_STATE_ACCESS functions, it is often necessary to
abstract the functionality of a traditional GL API function into a backend
that both the traditional and dsa API functions can share. For instance,
glTexParameteri and glTextureParameteri both call _mesa_texture_parameteri,
which takes a context object and a texture object as arguments.
The existance of such backend functions provides the opportunity for
driver internals (such as meta) to pass around the actual texture object
rather than its ID or target, saving on texture object storage and look-up
overhead.
This patch provides nameless texture creation and deletion for meta. This
will be used in an upcoming refactor of meta.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Beginning in the OpenGL 4.3 core specification, certain error handling has
changed. One example shown here is that INVALID_ENUM is thrown instead of
INVALID_OPERATION when a user attempts to set sampler parameters for a
multisample target.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Beginning in the OpenGL 4.3 core specification, some error handling has
changed (see OpenGL 4.5 core spec, 30.10.2014, Section 8.10 Texture
Parameters, pages 228-29). As an example, changing sampler states with a
multisample target throws INVALID_ENUM rather than INVALID_OPERATION.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The following preparations were made in texstate.c and texstate.h to
better facilitate the BindTextureUnit function:
Dylan Noblesmith:
mesa: add _mesa_get_tex_unit()
mesa: factor out _mesa_max_tex_unit()
This is about to appear in a lot more places, so
reduce boilerplate copy paste.
add _mesa_get_tex_unit_err() checking getter function
Reduce boilerplate across files.
Laura Ekstrand:
Made note of why BindTextureUnit should throw GL_INVALID_OPERATION if the unit is out of range.
Added assert(unit > 0) to _mesa_get_tex_unit.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This reflects the new naming convention for software fallbacks. To avoid
confusion with ARB_DIRECT_STATE_ACCESS backend functions, software fallbacks
now have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This reflects the new naming convention for software fallbacks. To avoid
confusion with ARB_DIRECT_STATE_ACCESS backend functions, software fallbacks
now have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In order to implement ARB_DIRECT_STATE_ACCESS, many GL API functions must now
rely on a backend that both traditional and DSA functions can use. For
instance, _mesa_TexStorage2D and _mesa_TextureStorage2D both call a backend
function _mesa_texture_storage that takes a context and a texture object as
arguments. The backend is named _mesa_texture_storage so that Meta can call
it and avoid looking up the context and the texture object. However, backend
names often look very close to the names of software fallbacks (ie.
_mesa_alloc_texture_storage). For this reason, software fallbacks have been
renamed for clarity to have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Most ARB_DIRECT_STATE_ACCESS functions take an object's ID and use it to look
up the object in its hash table. If the user passes a fake object ID (ie. a
non-generated name), the implementation should throw INVALID_OPERATION.
This is a convenience function for texture objects.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We never used ulVersion for proper version checks.
Most 3rd party drivers use version 1, but recently NVIDIA OpenGL driver
started using a different version number, so the handy trick of renaming
Mesa's ICDs as nvoglv32.dll on Windows machines with NVIDIA hardware for
quick testing of Mesa software renderers stopped working.
Reviewed-by: Brian Paul <brianp@vmware.com>
SKL+ overloads the SIMD4x2 SIMD mode to mean either SIMD8D or SIMD4x2
depending on bit 22 in the message header. If the bit is 0 or there is
no header we get SIMD8D. We always wand SIMD4x2 in vec4 and for fs pull
constants, so use a message header in those cases and set bit 22 there.
Based on an initial patch from Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We can't (or don't know how to) turn this off. But it can end up being
stored to a higher reg # than what the shader uses, leading to
corruption.
Also we currently aren't clever enough to turn off frag_coord/frag_face
if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org
this a bit so both vp and fp reg footprint fixup are called by a common
fxn used also by ir3_cmdline. Also add a few more output lines for
ir3_cmdline to make it easier to see what is going on.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Handle TEMP[ADDR[]] src registers by generating a fanin to group array
elements, similarly to how texture fetch instructions work.
NOTE:
For all the scalar instructions generated for a single tgsi vector
operation which uses an array src (or possibly even uses the same array
as multiple srcs), re-use the same fanin node. Since a vector operation
operates on all components at the same time, it should never see more
than one version of the same array.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
To use fanin's to group registers in an array, we can potentially have a
much larger array of registers. Rather than continuing to bump up the
array size, just make it dynamically allocated when the instruction is
created.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers. This lets us simplify RA by
working in terms of neighbor groups.
NOTE: has the slight problem that it can't optimize out mov's for things
like:
MOV OUT[n], IN[m]
To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them). This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't
need to support an arbitrary mask. So for this case use a simple size
field rather than a bitmask.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We probably could be more clever elsewhere and mask out components that
are not used. But either way, legalize should realize that there is
also a write-after-write hazard with texture sample instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Old compiler doesn't have ir3_block's.. so we need a special path. This
hack can be dropped when ir3_compiler_old is retired.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can
optionally have them. So we implicitly assume that ArrayID==0 always
exists for each file. This is why array_max[file] is never less than
zero.
You can tell from indirect_files(_read/written) if the legacy array-
id zero was actually used.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
At least temporarily, I need to fallback to old compiler still for
relative dest (for freedreno), but I can do relative src temp. Only
a temporary situation, but seems easy/reasonable for tgsi-scan to
track this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We were invalidating si_screen:tm by calling
r600_destroy_common_screen() which frees the si_screen object. This
caused the driver to crash in LLVMDisposeTargetMachine() since we
were passing it an invalid pointer.
https://bugs.freedesktop.org/show_bug.cgi?id=88170
It doesn't work on Windows because of STDCALL calling convention -- it's
the callee responsibility to pop the arguments, and the number of
arguments vary with the prototype --, so the stack pointer ends up getting
corrupted.
This is just a non-invasive stop-gap fix. A proper fix would be more
elaborate, and require either:
- a variation of __glapi_noop_table which sets GL_INVALID_OPERATION
error
- stop using APIENTRY on all internal _mesa_* functions.
Tested with piglit gl-1.0-beginend-coverage (it now fails instead of
crashing).
VMware PR1350505
Reviewed-by: Brian Paul <brianp@vmware.com>
To catch mismatches in cdecl vs stdcall calling convention. See code
comment for more detailed explanation.
Tested with piglit gl-1.0-beginend-coverage (it now also crashes on
debug builds.)
VMware PR1350505.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a case where a transform feedback buffer is fed back as an index
buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- we don't usually need to flush TC L2
- we should flush KCACHE
(not really an issue now since we always flush KCACHE when updating
descriptors, but it could be a problem if we used CE, which doesn't
require flushing KCACHE)
- add an explicit VS_PARTIAL_FLUSH flag
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
So that TC L2 doesn't need to be flushed.
The only problem is with index buffers, which don't use TC.
A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA
when updating the same memory.
The solution for SI is to use uncached access here, because CP DMA doesn't
support cached access.
CIK will be handled in the next patch.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
That's either framebuffer caches or caches for shader resources.
The motivation is that framebuffer caches need to be flushed very rarely
here.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It doesn't do anything useful. And colors are floating-point, so we can use
fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE
state only (in the next patch).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Only done for completeness. Not used by anything yet.
Tested by advertising PIPE_CAP_VERTEXID_NOBASE.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Ordered compares are what you have in C. Unordered compares are the result
of negating ordered compares (they return true if either argument is NaN).
That special NaN behavior is completely useless here, and unordered
compares produce horrible code with all stable LLVM versions.
(I think that has been fixed in LLVM git)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- the relocs array is unused, remove it
- ndw is at most 115 (init), set 140 as the maximum
- compute needs 4 buffers per state, graphics only needs 1; set 4 as the maximum
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
From GL 4.4 Core profile:
If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
performed for array elements transferred by any drawing command not taking a
type parameter, including all of the *Draw* commands other than *DrawEle-
ments*.
Cc: 10.2 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Instead of telling the driver that the window system ancillaries have
been invalidated (when the driver doesn't know which of its buffers
are the window system's!), introduce a method for invalidating
specific surfaces.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is part of the EGL spec, and is useful for a tiled renderer to avoid
the memory bandwidth cost of storing the depth/stencil buffers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Merge the following upstream autoconf-archive patches.
ax_prog_flex: change grep syntax to accept e.g. "flex.real" in case a wrapper or symlink is used.
AX_PROG_FLEX: avoid use of grep empty string escape extension (fix for OpenBSD)
AX_PROG_FLEX: Also accept gflex.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jonathan Gray <jsg@openbsd.org>
Rather than building a new one every compile. This should reduce some
of the overhead of compiling shaders.
One consequence of this change is that we lose the MachineInstrs dumps
when dumping the shaders via R600_DEBUG. The LLVM IR and assembly is
still dumped, and if you still want to see the MachineInstr dump, you
can run the dumped LLVM IR through llc.
The "normal" detection (querying clflush size) already made sure it is
non-zero, however another method did not. This lead to crashes if this
value happened to be zero (apparently can happen in virtualized environments
at least).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87913
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The code used PIPE_ALIGN_VAR for the variable used by fxsave, however this
does not work if the stack isn't aligned. Hence use PIPE_ALIGN_STACK function
decoration to fix the segfault which can happen if stack alignment is only
4 bytes.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87658.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The headers hadn't been regenerated in a long time and had seen a number
of manual modifications. A few changes:
- remove nvc0_2d entirely, use the nv50 header which has the nvc0
values too
- remove 3ddefs, it's identical to the nv50 file
- move macros out into a separate file
Also the upstream rnndb changed the overall chip naming convention; this
was fixed up manually in the generated files until a better solution is
determined.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The headers hadn't been regenerated in a long time, and there were a few
minor divergences. Among other things, rnndb has changed naming to
G80/etc, for now I've not tackled switching that over and manually
replaced the nvidia codenames back to the chip ids. However no other
modifications of the headergen'd headers was done.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Compression seems to be supported for only some formats. Enable it for
those. Previously this was disabled for everything despite the code
looking like it was actually enabled.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
assert is compiled out in release builds - don't put logic into it. Note
that this particular instance is only used for vp debugging and is
normally compiled out.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
brw_swizzle_to_scs has been showing up in my CPU profiling, which is
rather silly - it's a tiny amount of code. It really should be inlined,
and can easily be implemented with fewer instructions.
The enum translation is as follows:
SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_ZERO, SWIZZLE_ONE
0 1 2 3 4 5
4 5 6 7 0 1
SCS_RED, SCS_GREEN, SCS_BLUE, SCS_ALPHA, SCS_ZERO, SCS_ONE
which is simply (swizzle + 4) & 7.
Haswell needs extra textureGather workarounds to remap GREEN to BLUE,
but Broadwell and later do not.
This patch replicates swizzle_to_scs in gen7_wm_surface_state.c and
gen8_surface_state.c, since the Gen8+ code can be simplified to a mere
two instructions. Both copies can be marked static for easy inlining.
v2: Put the commit message in the code as comments (requested by
Jason Ekstrand). Also fix a typo.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Valve games use GL_SRGB8 textures. Instead of supporting that properly,
we fell back to MESA_FORMAT_R8G8B8A8_SRGB (with an alpha channel), which
meant that we had to use texture swizzling to override the alpha to 1.0
when sampling. This meant shader recompiles on Gen < 7.5 platforms.
By supporting MESA_FORMAT_R8G8B8X8_SRGB, the hardware just returns 1.0
for us, so we can just use SWIZZLE_XYZW, and avoid any recompiles. All
generations of hardware have supported the format for sampling and
filtering; we can easily support rendering by using the R8G8B8A8_SRGB
format and writing garbage to the X channel. (We do this already for
the non-SRGB version of this format.)
This removes all remaining shader recompiles in a time demo of "Counter
Strike: Global Offensive" (32 -> 0) on Sandybridge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The logic in brw_blorp_surface_info::set uses brw_format_for_mesa_format
for source surfaces, and brw->render_target_format[] for destination
surfaces. We should do the same in the sRGB MSAA overrides.
Currently, this isn't a problem, since SRGB MSAA buffers are all RGBA.
The next commit will introduce RGBX SRGB MSAA buffers, at which point
we need to get the RGBX -> RGBA format overrides for rendering right.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
ir_to_mesa does this - apparently we just forgot or something.
Without this, we'll guess the wrong texture swizzle (XYZW for color
instead of XXX1 for depth) when doing precompiles.
This cuts 26 shader recompiles in a time demo of "Counter Strike:
Global Offensive" (58 -> 32) on Sandybridge. Haswell still has 0
recompiles.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Gen7.5+ platforms that support the "Shader Channel Select" feature leave
key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE
is GL_ALPHA (which is really uncommon). So, the precompile should leave
them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well.
We didn't notice this because prog->ShadowSamplers is not set correctly.
The next patch will fix that problem.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
According to the documentation, we need to do a CS stall on every fourth
PIPE_CONTROL command to avoid GPU hangs. The kernel does a CS stall
between batches, so we only need to count the PIPE_CONTROLs in our batches.
v2: Get the generation check right (caught by Chris Wilson),
combine the ++ with the check (suggested by Daniel Vetter).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are too many state flags to fit in one terminal screen, even with
a very tall terminal. Everything is flagged once, so a value of 1 means
that it hasn't ever happened again, and thus isn't terribly interesting.
Skipping those makes it easier to see the interesting values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The mad instruction emitter already supported the saturate modifier,
but the ModifierFolding pass never tried folding cvt sat operations
in for NV50.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is a partial revert of c89306983c.
It split the {start,base}_vertex_location handling into several steps:
1. Set brw->draw.start_vertex_location = prim[i].start
and brw->draw.base_vertex_location = prim[i].basevertex.
(This happened once per _mesa_prim, in the main drawing loop.)
2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset
appropriately. (This happened in brw_prepare_shader_draw_parameters,
which was called just after brw_prepare_vertices, as part of state
upload, and only happened when BRW_NEW_VERTICES was flagged.)
3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim).
If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on
the second (or later) primitives, we would do step #1, but not #2.
The first _mesa_prim would get correct values, but subsequent ones
would only get the first half of the summation.
The reason I originally did this was because I needed the value of
gl_BaseVertexARB to exist in a buffer object prior to uploading
3DSTATE_VERTEX_BUFFERS. I believed I wanted to upload the value
of 3DPRIMITIVE's "Base Vertex Location" field, which was computed
as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) +
brw->vb.start_vertex_bias. The latter value wasn't available until
after brw_prepare_vertices, and the former weren't available in the
state upload code at all. Hence the awkward split.
However, I believe that including brw->vb.start_vertex_bias was a
mistake. It's an extra bias we apply when uploading vertex data into
VBOs, to move [min_index, max_index] to [0, max_index - min_index].
>From the GL_ARB_shader_draw_parameters specification:
"<gl_BaseVertexARB> holds the integer value passed to the <baseVertex>
parameter to the command that resulted in the current shader
invocation. In the case where the command has no <baseVertex>
parameter, the value of <gl_BaseVertexARB> is zero."
I conclude that gl_BaseVertexARB should only include the baseVertex
parameter from glDraw*Elements*, not any internal biases we add for
optimization purposes.
With that in mind, gl_BaseVertexARB only needs prim[i].start or
prim[i].basevertex. We can simply store that, and go back to computing
start_vertex_location and base_vertex_location in brw_emit_prim(), like
we used to. This is much simpler, and should actually fix two bugs.
Fixes missing geometry in Unvanquished.
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Conditionalize it on having done any uploads (Turns out
u_upload_destroy() isn't safe with a NULL arg).
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
Fixes the piglits which check that gl_VertexID includes the base vertex
offset:
arb_draw_indirect-vertexid elements
gl-3.2-basevertex-vertexid
Note that this leaves out the original G80, for which this will continue
to fail. It could be fixed by passing a driver constbuf value in, but
that's beyond the scope of this change.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
LAST_LINE_PIXEL has actually been renamed to PIXEL_CENTER_INTEGER in
rnndb; use that method to implement the rasterizer setting, used for
st/nine.
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
total instructions in shared programs: 5877012 -> 5876617 (-0.01%)
instructions in affected programs: 33140 -> 32745 (-1.19%)
From before the commit that allows VF constant propagation (which hurt
some programs) to here, the results are:
total instructions in shared programs: 5877951 -> 5876617 (-0.02%)
instructions in affected programs: 123444 -> 122110 (-1.08%)
with no programs hurt.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
total instructions in shared programs: 5877951 -> 5877012 (-0.02%)
instructions in affected programs: 155923 -> 154984 (-0.60%)
Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the
next commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
After CSEing some MOV ..., VF instructions we have code like
mov tmp, [1F, 2F, 3F, 4F]VF
mov r10, tmp
mov r11, tmp
...
use r10
use r11
We want to copy propagate tmp into the uses of r10 and r11, but *not*
constant propagate the VF immediate into the uses of tmp.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Port of commit a28ad9d4 from the fs backend.
No shader-db changes since we don't emit MOV ..., VF instructions yet.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently only handles consecutive instructions with the same
destination that collectively write all channels.
total instructions in shared programs: 5879798 -> 5869011 (-0.18%)
instructions in affected programs: 465236 -> 454449 (-2.32%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I don't feel great about assert(!"unimplemented: ...") but these
cases do only seem possible under some currently impossible circumstances.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Sometimes it's easier to generate 4x values into an array, and the
memcpy is 1 instruction, rather than 11 to piece 4 arguments together.
I'd forgotten to remove the prototype from fs_reg from a previous patch,
so it's already there for us here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
As of 229bf4475f we started getting SIBGUS
from unaligned accesses on the hardware, for reasons I haven't figured
out. However, we should be avoiding unaligned accesses anyway, and our CL
setup certainly would have produced them.
E.g. this could happen on older kernels which don't support the
RADEON_INFO_SI_BACKEND_ENABLED_MASK query yet. The code in
si_write_harvested_raster_configs() doesn't deal with this correctly and
would probably mangle the value badly.
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The optimizer obviously doesn't have the ability to rewrite these to skip
the size checks per call, so we have to do it manually.
Improves a norast benchmark on simulation by 0.779706% +/- 0.405838%
(n=6087).
Our ability to perform register writes depends on the hardware and
kernel version. It shouldn't ever change on a per-context basis,
so we only need to check once.
Checking introduces a synchronization point between the CPU and GPU:
even though we submit very few GPU commands, the GPU might be busy doing
other work, which could cause us to stall for a while.
On an idle i7 4750HQ, this improves performance in OglDrvCtx (a context
creation microbenchmark) by 6.14748% +/- 1.6837% (n=20). With Unigine
Valley running in the background (to keep the GPU busy), it improves
performance in OglDrvCtx by 2290.92% +/- 29.5274% (n=5).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* This is the cleaned up work of the Haiku GCI student
Adrián Arroyo Calle adrian.arroyocalle@gmail.com
* Several patches were consolidated to prevent
unnecessary touching of non-related code
This patch reduces the likelihood of pointer arithmetic overflow bugs in
gather_oa_results(), like the one fixed by b69c7c5dac.
I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I get nervous when I see code patterns like this:
(void*) + (int) * (int)
I smell 32-bit overflow all over this code.
This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any
potential overflow.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch reduces the likelihood of pointer arithmetic overflow bugs in
intel_texsubimage_tiled_memcpy() , like the one fixed by b69c7c5dac.
I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I recently solved, in commit b69c7c5dac, an overflow
bug in a line of code that looks very similar to pointer arithmetic in
this function.
This patch conceptually applies the same fix as in b69c7c5dac. Instead
of retyping the variables, though, this patch adds some casts. (I tried
to retype the variables as ptrdiff_t, but it quickly got very messy. The
casts are cleaner).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch should diminish the likelihood of pointer arithmetic overflow
bugs, like the one fixed by b69c7c5dac.
Change the type of parameter 'out_stride' from int to ptrdiff_t. The
logic is that if you call intel_miptree_map() and use the value of
'out_stride', then you must be doing pointer arithmetic on 'out_ptr'.
Using ptrdiff_t instead of int should make a little bit harder to hit
overflow bugs.
As a side-effect, some function-scope variables needed to be retyped to
avoid compilation errors.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If a pointer points to raw, untyped memory and is never dereferenced,
then declare it as 'void*' instead of casting it to 'void*'.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
trans_kill() only handles the single opcode. Drop the remnant of a time
when both KILL and KILL_IF were handled by the same fxn.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Standalone compiler doesn't have screen or context. We need to come up
with a better way to control the target arch (ie. something that we can
control from cmdline w/ standalone compiler) but for now this hack keeps
it from segfault'ing.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.
total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)
In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch is I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
Since our kernel BOs require CMA allocation, and the use of them requires
new mmaps, it's pretty expensive and we should avoid it if possible.
Copying my original design for Intel, make a userspace cache that reuses
BOs that haven't been shared to other processes but frees BOs that have
sat in the cache for over a second.
Improves glxgears framerate on RPi by around 30%.
This gets DRI3 working on modesetting with glamor. It's not enabled under
simulation, because it looks like handing our dumb-allocated buffers off
to the server doesn't actually work for the server's rendering.
This reverts db3dfcfe90.
The commit was correct but we've got some precision problems later in
llvmpipe (or possibly in draw clip) due to the vertices coming in in
different order, causing some internal test failures. So revert for now.
(Will only affect drivers which actually support constant-interpolated
attributes and not just flatshading.)
The blitter will start at a pixel's natural alignment. For PBOs, if the
provided offset if not aligned, bits will get dropped.
This change adds offset alignment check for src and dst, kicking back if
the requirements are not met.
The change is based on following verbiage from BSPEC:
Color pixel sizes supported are 8, 16, and 32 bits per pixel (bpp).
All pixels are naturally aligned.
Found in the following locations:
page 35 of intel-gfx-prm-osrc-hsw-blitter.pdf
page 29 of ivb_ihd_os_vol1_part4.pdf
page 29 of snb_ihd_os_vol1_part5.pdf
This behavior was observed with Steam Big Picture rendering incorrect
icon colors. The fix has been tested on Ubuntu and SteamOS on Haswell.
Signed-off-by: Cody Northrop <cody@lunarg.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83908
Reviewed-by: Neil Roberts <neil@linux.intel.com>
C linkage was removed from functions in program/sampler.cpp. However,
some cpp files include program/sampler.h within extern "C" blocks,
causing link errors for test_vec4_copy_propagation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Chris Wilson noted that repeated calls to CheckQuery() would call
drm_intel_bo_references(brw->batch.bo, query->bo) on each invocation,
which is expensive. Once we've flushed, we know that future batches
won't reference query->bo, so there's no point in asking more than once.
This patch adds a brw_query_object::flushed flag, which is a
conservative estimate of whether the batch has been flushed.
On the first call to CheckQuery() or WaitQuery(), we check if the
batch references query->bo. If not, it must have been flushed for
some reason (such as being full). We record that it was flushed.
If it does reference query->bo, we explicitly flush, and record that
we did so.
Any subsequent checks will simply see that query->flushed is set,
and skip the drm_intel_bo_references() call.
Inspired by a patch from Chris Wilson.
According to Eero, this does not affect the performance of Witcher 2
on Haswell, but approximately halves the userspace CPU usage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is less code and also measures the duration of the stall for us.
Our old code predates the existance of brw_bo_map().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
CheckQuery calls drm_intel_bo_references to see if the batch references
the query BO, and if so, flushes. It then checks if the query BO is
busy, and if not, calls gen6_queryobj_get_results().
Stupidly, gen6_queryobj_get_results() immediately did a second redundant
drm_intel_bo_references check, even though we know the buffer is not
referenced and in fact idle.
This patch moves the batch-flush check out of gen6_queryobj_get_results
and into WaitQuery() (the other caller). That way, both callers do a
single batch-flush check.
This should only be a minor improvement, since it would only affect
the first CheckQuery call where the result is actually available.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If query->bo == NULL, this is a redundant CheckQuery call, and we
should simply return. We didn't do anything anyway - we skipped the
batch flushing block, and although we called get_results(), it has an
early return and does nothing. Why bother?
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
q->Ready means that the results are in, and core Mesa is free to return
them to the application. gen6_queryobj_get_results() is a natural place
to set that flag; doing so means callers don't have to.
The older non-hardware-context aware code couldn't do this, because we
had to call brw_queryobj_get_results() to gather intermediate results
when we ran out of space for snapshots in the query buffer. We only
gather complete results in the Gen6+ code, however.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On the same go remove src/mapi/shared-glapi/tests/.gitignore
and src/mapi/glapi/tests/.gitignore as useless.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes 4 vertexid related piglit tests with llvmpipe due to switching
behavior of vertexid to the one gl expects.
(Won't fix non-llvm draw path since we don't get the basevertex currently.)
Plus a new PIPE_CAP_VERTEXID_NOBASE query. The idea is that drivers not
supporting vertex ids with base vertex offset applied (so, only support
d3d10-style vertex ids) will get such a d3d10-style vertex id instead -
with the caveat they'll also need to handle the basevertex system value
too (this follows what core mesa already does).
Additionally, this is also useful for other state trackers (for instance
llvmpipe / draw right now implement the d3d10 behavior on purpose, but
with different semantics it can just do both).
Doesn't do anything yet.
And fix up the docs wrt similar values.
v2: incorporate feedback from Brian and others, better names, better docs.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
r600, rv610 and rv630 all have a bug in their GPR indexing
and how the hw inserts access to PV.
If the base index for the src is the same as the dst gpr
in a previous group, then it will use PV instead of using
the indexed gpr correctly.
The workaround is to insert a NOP when you detect this.
v2: add second part of fix detecting DST rel writes followed
by same src base index reads.
v3: forget adding stuff to structs, just iterate over the
previous node group again, makes it more obvious.
v3.1: drop local_nop.
Fixes ~200 piglit regressions on rv635 since SB was introduced.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were assuming, when constructing a new brw_reg struct, that the
negate and abs register modifiers would not be present by default in
the new register.
Now, we force explicitly setting these values when constructing a new
register.
This will avoid problems like forgetting to properly set them when we
are using a previous register to generate this new register, as it was
happening in the dFdx and dFdy generation functions.
Fixes piglit test shaders/glsl-deriv-varyings
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, the hash_table API required the user to do all of the hashing
of keys as it passed them in. Since the hashing function is intrinsically
tied to the comparison function, it makes sense for the hash table to know
about it. Also, it makes for a somewhat clumsy API as the user is
constantly calling hashing functions many of which have long names. This
is especially bad when the standard call looks something like
_mesa_hash_table_insert(ht, _mesa_pointer_hash(key), key, data);
In the above case, there is no reason why the hash table shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash table will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
v2: change to call the old entrypoint "pre_hashed" rather than
"with_hash", like cworth's equivalent change upstream (change by
anholt, acked-in-general by Jason).
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
glXSwapBuffersMscOML() with target_msc=divisor=remainder=0 gets
translated into target_msc=divisor=0 but remainder=1 by the mesa
api. This is done for server DRI2 where there needs to be a way
to tell the server-side DRI2ScheduleSwap implementation if a call
to glXSwapBuffers() or glXSwapBuffersMscOML(dpy,window,0,0,0) was
done. remainder = 1 was (ab)used as a flag to tell the server to
select proper semantic. The DRI3/Present backend ignored this
signalling, treated any target_msc=0 as glXSwapBuffers() request,
and called xcb_present_pixmap with invalid divisor=0, remainder=1
combo. The present extension responded kindly to this with a
BadValue error and dropped the request, but mesa's DRI3/Present
backend doesn't check for error codes. From there on stuff went
downhill quickly for the calling OpenGL client...
This patch fixes the problem.
v2: Change comments to be more clear, with reference to
relevant spec, as suggested by Eric Anholt.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
Restores proper immediate tearing swap behaviour for
OpenGL bufferswap under DRI3/Present.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
v2: Add Frank Binns signed off by for his original earlier
patch from April 2014, which is identical to this one, and
Chris Wilsons reviewed tag from May 2014 for that patch, ergo
also for this one.
v3: Incorporate comment about triple buffering as suggested
by Axel Davy, and reference to relevant spec provided by
Eric Anholt.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prevent calls to glXGetSyncValuesOML() and glXWaitForMscOML()
from overwriting the (ust,msc) values of the last successfull
swapbuffers call (PresentPixmapCompleteNotify event), as
glXWaitForSbcOML() relies on those values corresponding to
the most recent completed swap, not to whatever was last
returned from the server.
Problematic call sequence without this patch would have been, e.g.,
glXSwapBuffers()
... wait ...
swap completes -> PresentPixmapComplete event -> (ust,msc)
updated to reflect swap completion time and count.
... wait for at least 1 video refresh cycle/vblank increment.
glXGetSyncValuesOML()
-> PresentNotifyMsc event overwrites (ust,msc) of swap
completion with (ust,msc) of most recent vblank
glXWaitForSbcOML()
-> Returns sbc of last completed swap but (ust,msc) of last
completed vblank, not of last completed swap.
-> Client is confused.
Do this by tracking a separate set of (ust, msc) for the
dri3_wait_for_msc() call than for the dri3_wait_for_sbc()
call.
This makes the glXWaitForSbcOML() call robust again and restores
consistent behaviour with the DRI2 implementation.
Fixes applications originally written and tested against
DRI2 which also rely on this not regressing under DRI3/Present,
e.g., Neuro-Science software like Psychtoolbox-3.
This patch fixes the problem.
v2: Rename vblank_msc/ust to notify_msc/ust as suggested by
Axel Davy for better clarity.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
targetSBC == 0 is a special case, which asks the function
to block until all pending OpenGL bufferswap requests have
completed.
Currently the function just falls through for targetSBC == 0,
returning bogus results.
This breaks applications originally written and tested against
DRI2 which also rely on this not regressing under DRI3/Present,
e.g., Neuro-Science software like Psychtoolbox-3.
This patch fixes the problem.
v2: Simplify as suggested by Axel Davy. Add comments proposed
by Eric Anholt.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
A bunch of open-coded 'gpu_id > 300's seems like it will eventually
cause problems with future generations. There were already a few minor
problems with caps for features that still need additional work on a4xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rather than duplicating this everywhere. Especially as on a4xx the
layout of layers and levels differs based on texture type.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This code is complete nonsense and has apparently existed since I first
implemented register spilling in the VS two years ago.
Scratch reads are SEND messages, which ignore the destination writemask.
The comment about "data that may not have been written to scratch" is
also confusing - we always spill whole 4x2 registers, so such data
simply does not exist. We can safely ignore the writemask.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This code has been turned off for the last
decade. Considering 3Dnow is obsolete it
seems the bug will never be fixed so just
remove it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
f0ba7d897d made debug_assert()/assert()
unsafe for expressions, but only now that u_atomic.h started to rely on
them for Windows that this became an issue.
This fixes non-debug builds with MSVC.
Reviewed-by: Brian Paul <brianp@vmware.com>
We support MOCS on both gen8 and gen9, so the message seems meaningless. Remove
it to avoid confusion.
Trivial.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The odds of having this patch make a difference on Gen8+ are probably very low.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-but-not-tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Because all topologies are reduced to basic primitives (i.e. no strips, fans)
and the vertices involved are all copied, there's no need for any elaborate
decisions where to insert the prim id. The logic employed was correct for
first provoking vertex, but didn't account at all for the last provoking
vertex case. And since we now will get the right constant value even if the
primitive type is later changed (for unfilled etc.) this is no longer
required to pass certain tests (which were checking for prim_id == some
const interpolated value so passing because both were wrong in the end).
This is a bit overkill (3x4 values assigned in total even though it's really
one scalar per prim...) but the code is now much easier and I don't need to
add more cases for last provoking vertex.
This fixes piglit primitive-id-no-gs-strip test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously the first provoking vertex convention would only be used if
flatshading were enabled. No matter how I look at it that cannot be possibly
correct. Maybe the code getting used was somewhat simpler that way at a time
where there weren't constant interpolated attributes, only flatshading...
(Note that all other places including the decomposition macros already do
the same.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This stage only worked for traditional old-school flatshading, it did ignore
constant interpolated values and only handled colors, the code probably
predates using of constant interpolated values in gallium. So fix this - the
clip stage apparently did this a long time ago already.
Unfortunately this also means the stage needs to be invoked when flatshading
isn't enabled but some other prim changing stages are - for instance with
fill mode line each of the 3 lines in a tri should get the same attribute
value from the leading vertex in the original tri if interpolation is constant,
which did not happen before
Due to that, the stage is now run in more cases, even unnecessary ones. Could
in theory skip it completely if there aren't any constant interpolated
attributes (and rast->flatshade isn't set), but not sure it's worth bothering,
as it looks kinda complicated getting this information in advance.
No piglit change (doesn't really cover this directly).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just like we do for tris (det shouldn't matter at this point, however
can have flags for things like line stipple reset).
No piglit change, it would fail line stippling tests if the flatshade
stage were run, which will happen with the next commit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The previous language was a bit misleading, since it sounded like
w was interpolated then the reciprocal calculated which isn't what
should be happening.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
With everything in place, we can now use the scalar backend compiler for
vertex shaders on BDW+. We make scalar vertex shaders the default on
BDW+ but add a new vec4vs debug option to force the vec4 backend.
No piglit regressions.
Performance impact is minimal, I see a ~1.5 improvement on the T-Rex
GLBenchmark case, but in general it's in the noise. Some of our
internal synthetic, vs bounded benchmarks show great improvement, 20%-40%
in some cases, but real-world cases are mostly unaffected.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that fs_visitor::run is back to being only fragment
shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT
conditions and rename it to run_fs.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch uses the previous refactoring to add a new run_vs() method
that generates vertex shader code using the scalar visitor and
optimizer.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These structs aren't vec4 specific, they are shared by shader stages
operating on Vertex URB Entries (VUEs). VUEs are the data structures in
the URB that hold vertex data between the pipeline geometry stages.
Using vue in the name instead of vec4 makes a lot more sense, especially
when we add scalar vertex shader support.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The scalar vertex shader will use the ATTR register file for vertex
attributes. This patch adds support for the ATTR file to fs_visitor.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This chunk of code is repeated in a few places, and we're going to add
a MESA_SHADER_VERTEX case to it soon.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This flag signals that we have a SIMD8 VS shader so we can set up the
corresponding state accordingly. This boils down to setting
the BDW+ SIMD8 enable bit in 3DSTATE_VS and making UBO and pull
constant buffers use dword pitch.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is all we need from the generator for SIMD8 vertex shaders. This
opcode is just the send instruction, all the hard work will happen
in the visitor using LOAD_PAYLOAD.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that the caller passes in the shader debug name, we don't need this
anymore.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
fs_generator no longer knows what stage it's generating code for, so
we have to set the debug name of the shader from the call site.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This removes all stage specific data from the generator, and lets us
create a generator for any stage.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't propagate the saturate bit and some instructions can't
saturate at all. If the source has saturate set, just skip propagation.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Back to the original commit (8313f444) adding the workaround, we were
enabling it on gens <= 7, even though gens <= 5 can't do multisampling.
I cannot find documentation that says that Sandybridge needs this
workaround but in practice disabling it causes these piglit tests to
fail:
EXT_framebuffer_multisample/interpolation {2,4} centroid-deriv{,-disabled}
On Ironlake:
total instructions in shared programs: 4358478 -> 4349671 (-0.20%)
instructions in affected programs: 117680 -> 108873 (-7.48%)
A bunch of shaders in TF2, Portal 2, and L4D2 are cut by 25~30%.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This way we get a warning if an enum value is not handled.
v2: codestyle
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On evergreen there are 4 regs, on r600/700 there is only one.
Don't initialise regs and trash someone elses state.
Not sure this fixes anything, but hey one less stupid.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.3 10.4" mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means another pass of reordering the uniform data store, but it lets
us pair up a lot more instructions.
total instructions in shared programs: 44639 -> 43176 (-3.28%)
instructions in affected programs: 36938 -> 35475 (-3.96%)
This is a standard scheduling heuristic, and clearly helps.
total instructions in shared programs: 46418 -> 44467 (-4.20%)
instructions in affected programs: 42531 -> 40580 (-4.59%)
Collapse things back into a setup_slices() which takes the desired
alignment as a param. This gets things ready for a4xx which has some
slightly different requirements.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits, but that is an assumption OpenGL drivers (or any dynamic library for that matter) can't afford to make as there are many closed- and open- source application binaries out there that only assume 4-byte stack alignment.
V4: fix comment and indentation
V3: move all sse4.1 build flag config to the same location
and add comment as to why we need to do the realign
V2: use $target_cpu rather than $host_cpu
and setup build flags in config rather than makefile
https://bugs.freedesktop.org/show_bug.cgi?id=86788
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Matt Turner <mattst88@gmail.com>
CC: "10.4" <mesa-stable@lists.freedesktop.org>
For OpenGL ES 3.0 spec, the minor number for SHADING_LANGUAGE_VERSION is always
two digits, matching the OpenGL ES Shading Language Specification release
number. For example, this query might return the string "3.00".
This patch fixes the following dEQP test:
dEQP-GLES3.functional.state_query.string.shading_language_version
No piglit regression observed.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL ES 3.00 spec, chapter 4.6.1 "The Invariant Qualifier",
Only variables output from a shader can be candidates for invariance. This
includes user-defined output variables and the built-in output variables.
As only outputs can be declared as invariant, an invariant output from one
shader stage will still match an input of a subsequent stage without the
input being declared as invariant.
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_interp_storage_precision
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_interp_storage
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_storage_precision
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_storage
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_interp_storage_precision_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_interp_storage_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_storage_precision_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_storage_invariant_input
No piglit regressions observed.
v2:
- Add spec content in the code
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The current code computes ctx->Array.LegalTypesMask just once,
however, computing this needs to consider ctx->API so we need
to make sure that the API for that context has not changed if
we intend to reuse the result.
The context API can change, at least, if we go through
_mesa_meta_begin, since that will always force
API_OPENGL_COMPAT until we call _mesa_meta_end. If any
operation in between these two calls triggers a call to
update_array_format, then we might be caching a value for
LegalTypesMask that will not be right once we have called
_mesa_meta_end and restored the context API.
Fixes the following 179 dEQP tests in i965:
dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.output_types.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.multiple_attributes.input_types.3_*fixed2*
dEQP-GLES3.functional.draw.random.{2,18,28,68,83,106,109,156,181,191}
Reviewed-by: Brian Paul <brianp@vmware.com>
From GL ES 3.0 specification, section 6.1.15 Internal Format Queries (page 236),
multisampling is not supported for signed and unsigned integer internal formats.
Fixes 19 dEQP tests under 'dEQP-GLES3.functional.state_query.internal_format.*'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_RGB and GL_RGBA are valid internal formats on a GLES3 profile. See
"Table 1. Unsized Internal Formats" at
https://www.khronos.org/opengles/sdk/docs/man3/html/glTexImage2D.xhtml.
Fixes 2 dEQP tests:
- dEQP-GLES3.functional.state_query.internal_format.rgb_samples
- dEQP-GLES3.functional.state_query.internal_format.rgba_samples
Reviewed-by: Brian Paul <brianp@vmware.com>
In OpenGL and OpenGL-ES 3+, GL_DEPTH_STENCIL_ATTACHMENT is a valid attachment point for the family of functions
that invalidate a framebuffer object (e.g, glInvalidateFramebuffer, glInvalidateSubFramebuffer, etc).
Currently, a GL_INVALID_ENUM error is emitted for this attachment point.
Fixes 21 dEQP test failures under 'dEQP-GLES3.functional.fbo.invalidate.*'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This increases the cost of a raddr b conflict spill (save r3 to rb31, move
src1 to r3, move rb31 back to r3 when done, instead of just move src1 to
r3), but on average thanks to instruction pairing it's more worthwhile to
have another accumulator.
total instructions in shared programs: 46428 -> 46171 (-0.55%)
instructions in affected programs: 38030 -> 37773 (-0.68%)
The register allocator walks from the end of the nodes array looking for
trivially-allocatable things to put on the stack, meaning (assuming
everything is trivially colorable and gets put on the stack in a single
pass) the low node numbers get allocated first. The things allocated
first happen to get the lower-numbered registers, which is to say the fast
accumulators that can be paired more easily.
When we previously made the nodes match the temporary register numbers,
we'd end up putting the shader inputs (VS or FS) in the accumulators,
which are often long-lived values. By prioritizing the shortest-lived
values for allocation, we can get a lot more instructions that involve
accumulators, and thus fewer conflicts for raddr and WS.
total instructions in shared programs: 52870 -> 46428 (-12.18%)
instructions in affected programs: 52260 -> 45818 (-12.33%)
Since d8da6decea where the
state tracker started using UCMP on cayman a number of tests
regressed.
this seems to be r600g is doing CNDGE_INT for UCMP which is >= 0,
we should be doing CNDE_INT with reverse arguments.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reduces .text size of mesa_dri_drivers.so (i965-only) by 62k, or 1.4%.
Note that we don't remove inline from lerp_2d(), which has a comment
above it saying it definitely should be inlined. Though, removing the
inline keyword from it doesn't actually change the compiled code for me.
Reviewed-by: Brian Paul <brianp@vmware.com>
The docs say that we shouldn't need this workaround for gen8+, but just
removing it, causes gpu hangs. We'll revisit this, but for now, just
extend the workaround to gen9.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
SKL moves the GS threadcount to dw8 from dw7, and no longer does the
divide by 2 thing.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
This patch fixes this build error with G++ <= 4.6.
CXX test_vf_float_conversions.o
test_vf_float_conversions.cpp: In function ‘unsigned int f2u(float)’:
test_vf_float_conversions.cpp:63:20: error: expected primary-expression before ‘.’ token
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86939
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The register allocator prefers low-index registers from vc4_regs[] in the
configuration we're using, which is good because it means we prioritize
allocating the accumulators (which are faster). On the other hand, it was
causing raddr conflicts because everything beyond r0-r2 ended up in
regfile A until you got massive register pressure. By interleaving, we
end up getting more instruction pairing from getting non-conflicting
raddrs and QPU_WSes.
total instructions in shared programs: 55957 -> 52719 (-5.79%)
instructions in affected programs: 46855 -> 43617 (-6.91%)
We can avoid it by carefully ordering the packing. This is important as a
step in giving r3 to the register allocator.
total instructions in shared programs: 56087 -> 55957 (-0.23%)
instructions in affected programs: 18368 -> 18238 (-0.71%)
All uses of this require that the value be at least one, so it's
easier to report at least one than having to wrap all uses
in MAX2(max_compute_units, 1).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Harvested GPUs have some of their render backends disabled, so
in order to prevent the hardware from trying to render things
with these disabled backends we need to correctly program
the PA_SC_RASTER_CONFIG register.
v2:
- Write RASTER_CONFIG for all SEs.
v3:
- Set GRBM_GFX_INDEX.INSTANCE_BROADCAST_WRITES bit.
- Set GRBM_GFX_INFEX.SH_BROADCAST_WRITES bit when done setting
PA_SC_RASTER_CONFIG.
- Get num_se and num_sh_per_se from kernel.
v4:
- Get correct value for num_se
- Remove loop for setting PA_SC_RASTER_CONFIG
- Only compute raster config when a backend has been disabled.
v5: Michel Dänzer
- Fix computation for chips with multiple SEs
https://bugs.freedesktop.org/show_bug.cgi?id=60879
CC: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
There is a bug in the current lowering pass implementation where we lower saturate
to clamp only for vertex shaders on drivers supporting SM 3.0. The correct behavior
is to actually lower to clamp only when we don't support saturate which happens
on drivers that don't support SM 3.0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Fixes an infinite loop in swrast where the lowering pass unpacks saturate into
clamp but the opt_algebraic pass tries to do the opposite.
v3 (Ian):
This is a revert of commit cfa8c1cb "ir_to_mesa: lower ir_unop_saturate" on
the ir_to_mesa.cpp portion. prog_execute.c can handle saturates in vertex
shaders, so classic swrast shouldn't need this lowering pass.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83463
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
There were previously regressions regarding border colors, which the
updated swizzle logic resolves.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This is a hack since it uses the texture information together with the
sampler, but I don't see a better way to do it. In OpenGL, there is a
1:1 correspondence.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This was an oversight in the original patch. When PolygonMode is
used, then front faces, back faces, or both may be rendered as
points and are affected by point sprite state.
Note that SNB/IVB can't actually be fully conformant here, for
a legacy context -- we don't have separate sets of pointsprite
enables for front and back faces. Haswell ignores pointsprite
state correctly in hardware for non-point rasterization, so can
do this correctly, but it doesn't seem worth it.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86764
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dead code elimination was eating the Y offset.
Fixes the piglit test:
spec/ARB_gpu_shader5/arb_gpu_shader5-interpolateAtOffset-nonconst
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The original idea was to optimize away the condition by integrating it directly
into the CMP instruction. However, with native integers this requires an extra
I2F instruction. It is also fishy because the negation used didn't really honor
ieee754 float comparison rules, not to mention the CMP instruction itself
(being pretty much a legacy instruction) doesn't really have defined special
float value behavior in any case.
So, use UCMP and adjust the code trying to optimize the condition away
accordingly (I have absolutely no idea if such conditions are actually hit
or would be translated away somewhere else already).
v2: cosmetic changes
No piglit regressions on llvmpipe.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Multiple scenes per context are meant to be used so a new scene can be built
while another one is processed in rasterization. However, quite surprisingly,
this does not actually work (and according to git log, possibly never did,
though maybe it did at some point further back (5 years+) but was buggy)
because we always wait immediately on the rasterizer to finish the scene when
contexts (and hence setup/scene) is flushed. This means when we try to get
an empty scene later, any old one is already empty again.
Thus using multiple scenes is just a waste of memory (not too bad, since the
additional scenes are guaranteed to be empty, which means their size ought to
be one data block (64kB) plus the size of some structs), without actually
really doing anything. (There is also quite some code for the whole concept of
multiple scenes which doesn't really do much in practice, but keep it hoping
the wait-on-scene-flush can be fixed some day.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The prim assembler may change the prim type when injecting prim ids now,
which isn't reflected by what's stored in emit.
This looks brittle and potentially dangerous (it is not obvious if such prim
type changes are really supported by pt emit, the prim type is actually also
set in prepare which would then be different).
This fixes piglit primitive-id-no-gs-first-vertex.shader_test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The decomposition done in the prim assembler will turn tri fans into tris,
but this wasn't reflected in the output prim type. Meaning with a tri fan
with 6 verts input, the output was a tri fan with 12 vertices instead of a
tri list with 12 vertices (not as bad as it sounds, since the additional tris
created would all be degenerate since they'd all have two times vertex zero
but still bogus).
This is because the prim assembler is used if either the input topology is
something with adjacency, or if prim id needs to be injected, and for the
latter case topologies without adjacency can be converted to basic ones.
Unfortunately decomposition here for inserting prim ids is necessary, at
least for the indexed case where we can't just insert the prim id at the
right place depending on provoking vertex.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The default macros when the adjacency macros aren't defined will already
exactly do that (that is, drop the adjacent vertices and call the non-adjacent
macro).
Reviewed-by: Jose Fonseca <jfonseca@vmwarec.com>
Safe from causing optimization loops, since we don't constant propagate
VF arguments.
(for this and the previous patch):
total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
instructions in affected programs: 1616779 -> 1599636 (-1.06%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The LINE instruction performs a multiply-add instruction (a * b + c)
where b and c are scalar arguments. It reads b and c from offsets in
src0 such that you can load them (it they're representable) as a
vector-float immediate with a single instruction.
Hurts some programs, but that'll all get better once we CSE the
vector-float MOVs in the next patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77544
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The PRMs say that
<src0> region must be a replicated scalar
(with HorzStride = VertStride = 0).
but apparently that doesn't actually apply to all generations. I did
notice when implementing the optimization later in this series that G45
and ILK needed this regioning.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GS has an interesting use for mul. Because the GS can emit multiple
vertices per input vertex, and it also has a unique count at the top of the URB
payload, the GS unit needs to be able to dynamically specify URB write offsets
(relative to the global offset). The documentation in the function has a very
good explanation from Paul on the mechanics.
This fixes around 2000 piglit tests on BSW.
v2:
Reworded commit message (Ben) no mention of CHV (Matt)
Change SHRT_MAX to USHRT_MAX (Ken, and Matt)
Update comment in code to reflect the use of UW (Ben)
Add Gen7+ assertion for the relevant GS code, since it won't work on Gen6- (Ken)
Drop the bogus hunk in emit_control_data_bits() (Ken)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84777 (with many dupes)
Cc: "10.4 10.3 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If an operation is the last one to read a register, the instruction
containing it can also include the op that has the next write to that
register.
total instructions in shared programs: 57486 -> 56995 (-0.85%)
instructions in affected programs: 43004 -> 42513 (-1.14%)
We were scheduling TLB operations as early as possible, and texture setup
as late as possible. When I introduced prioritization, I visually
inspected that an independent operation got moved above texture results
collection, which tricked me into thinking it was working (but it was just
because texture setup was being pushed late).
total instructions in shared programs: 57651 -> 57486 (-0.29%)
instructions in affected programs: 18532 -> 18367 (-0.89%)
Jason realized that we could fix the result of the CMP instruction on
Gen <= 5 by doing -(result & 1). Also do the resolves in the vec4
backend before use, rather than when the bool was created. The FS does
this and it saves some unnecessary resolves.
On Ironlake:
total instructions in shared programs: 4289762 -> 4287277 (-0.06%)
instructions in affected programs: 619430 -> 616945 (-0.40%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is a revert of commit 4656c14e ("i965/fs: Change the type of
booleans to UD and emit correct immediates") plus some small additional
fixes, like casting ctx->Const.UniformBooleanTrue to int and changing UD
to D in the ir_unop_b2f cases. Note that it's safe to leave 0x3f800000
as UD and as a literal it's more recognizable than 1065353216.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Three source instructions cannot directly source a packed vec4 (<0,4,1>
regioning) like vec4 uniforms, so we emit a MOV that expands the vec4 to
both halves of a register.
If these uniform values are used by multiple three-source instructions,
we'll emit multiple expansion moves, which we cannot combine in CSE
(because CSE emits moves itself).
So emit a virtual instruction that we can CSE.
Sometimes we demote a uniform to to a pull constant after emitting an
expansion move for it. In that case, recognize in opt_algebraic that if
the .file of the new instruction is GRF then it's just a real move that
we can copy propagate and such.
total instructions in shared programs: 5822418 -> 5812335 (-0.17%)
instructions in affected programs: 351841 -> 341758 (-2.87%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts an instruction from two shaders in Tesseract, by allowing the
(x+y) cmp 0 -> x cmp -y optimization to take place.
instructions in affected programs: 1198 -> 1194 (-0.33%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits,
but that is an assumption OpenGL drivers (or any dynamic library for
that matter) can't afford to make as there are many closed- and open-
source application binaries out there that only assume 4-byte stack
alignment.
This fix uses force_align_arg_pointer GCC attribute, and is only a
stop-gap measure.
The right fix would be to pass -mstackrealign or
-mincoming-stack-boundary=2 to all source fails that use any -msse*
option, as there is no way to guarantee if/when GCC will decide to spill
SSE registers to the stack.
https://bugs.freedesktop.org/show_bug.cgi?id=86788
Reviewed-by: Brian Paul <brianp@vmware.com>
BRW_NEW_VERTICES is flagged every time we draw a primitive. Having
the brw_vs_prog atom depend on BRW_NEW_VERTICES meant that we had to
compute the VS program key and do a program cache lookup for every
single primitive. This is painfully expensive.
The workaround bit computation is almost entirely based on the vertex
attribute arrays (brw->vb.inputs[i]), which are set by brw_merge_inputs.
The only thing it uses the VS program for is to see which VS inputs are
actually read. brw_merge_inputs() happens once per primitive, and can
safely look at the currently bound vertex program, as it doesn't change
in the middle of a draw.
This patch moves the workaround bit computation to brw_merge_inputs(),
right after assigning brw->vb.inputs[i], and stores the previous WA bit
values in the context. If they've actually changed from the last draw
(which is uncommon), we signal that we need a new vertex program,
causing brw_vs_prog to compute a new key.
Improves performance in Gl32Batch7 by 13.6123% +/- 0.739652% (n=166)
on Haswell GT3e. I'm told Baytrail shows similar gains.
v2: Introduce a new BRW_NEW_VS_ATTRIB_WORKAROUNDS dirty bit, rather
than reusing BRW_NEW_VERTEX_PROGRAM (suggested by Chris Forbes).
This prevents unnecessary re-emission of surface/sampler related
atoms (and an SOL atom on Sandybridge).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If you hit this, you didn't compile with --with-egl-platforms=...
Recompile with something like --with-egl-platforms=x11,drm and make
clean and make again.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These stopped being necessary in commit ab973403e4.
v2: Update commit message with a better explanation (thanks to Eric
Anholt for doing the git archaeology).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't access brw->vertex_program or ctx->_Shader since the previous
commit, so we don't need this dirty bit.
I think it's still necessary on Gen6 because it still conflates
constant uploading with unit state uploading. We can fix that later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We use IEEE mode for GLSL programs, but need to use ALT mode for ARB
programs so that 0^0 == 1. The choice is based entirely on the shader
source language.
Previously, our code to determine which mode we wanted was duplicated
in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8).
The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing
as well - we use CurrentProgram (non-derived state), but _Shader
(derived state). It also relies on knowing that ARB programs don't
use gl_shader_program structures today. The compiler already makes
this assumption in a few places, but I'd rather keep that assumption
out of the state upload code.
With this patch, we select the mode at compile time, and store that
choice in prog_data. The state upload code simply uses that decision.
This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit c0347705 changed the Gen6-7 code to use ctx->_Shader rather than
ctx->Shader, but neglected to change the Gen4-5 or Gen8+ code.
This might fix SSO related bugs, but ALT mode is only used for ARB
programs, so if there's an actual problem, it's likely no one would
run into it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The "Pixel Shader Computed Depth Mode" value is entirely based on the
shader program, so we can easily do it at compile time. This avoids the
if+switch on every 3DSTATE_WM (Gen7)/3DSTATE_PS_EXTRA (Gen8+) upload,
and shares a bit more code.
This also simplifies the PMA stall code, making it match the formula
more closely, and drops a BRW_NEW_FRAGMENT_PROGRAM dependency. (Note
that the previous comment was wrong - the code and the documentation
have != PSCDEPTH_OFF, not ==.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We shouldn't receive variables with invalid locations set - adding these
assertions should help catch problems before they cause crashes later.
Inspired by similar code in st_glsl_to_tgsi.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Half gives you the second half of a SIMD16 register, but if the register
is a uniform it would incorrectly give you the next register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Nine code to match vertex declaration to vs inputs was limiting
the number of possible combinations.
Some sm3 games have issues with that, because arbitrary (usage/index)
can be used.
This patch does the following changes to fix the problem:
. Change the numbers given to (usage/index) combinations to uint16
. Do not put limits on the indices when it doesn't make sense
. change the conversion rule (usage/index) -> number to fit all combinations
. Instead of having a table usage_map mapping a (usage/index) number to
an input index, usage_map maps input indices to their (usage/index)
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
With sm3, you can declare an input/output with an usage and an usage index.
Nine code hardcodes the translation usage/index to a corresponding TGSI code.
The translation was limited to a few usage/index combinations that were corresponding
to most of the needs of games, but some games did not work.
This patch rewrites that Nine code to map all possible usage/index combination
to TGSI code. The index associated to TGSI_SEMANTIC_GENERIC doesn't need to be low
for good performance, as the old code was supposing, and is not particularly bounded
(it's UINT16). Given the index is BYTE, we can map all combinations.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Nine was allowing that behaviour, but was not filling the result.
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Issuing D3DISSUE_END should:
. reset previous queries if possible
. end the query
Previous behaviour wasn't calling end_query for
queries not needing D3DISSUE_BEGIN, nor resetting
previous queries.
This fixes several applications not launching properly.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Some queries need the driver to advertise a cap to be supported.
For example r300 doesn't support them.
v2 (David): check also for PIPE_CAP_QUERY_PIPELINE_STATISTICS, fix wine
tests on r300g
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
get_query_result flushes automatically, we don't need to flush.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Applications are supposed to call CreateQuery with a NULL
ppQuery to know if the query is supported. We supported that.
However when ppQuery was not NULL, we were accepting to create the
query and were creating a dummy query even when the query is not
supported.
Wine has different behaviour. This patch drops the dummy queries
support and matches wine behaviour.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Note that some of the GLSL specifications explicitly state this as
compile error, some simply state that 'it is an error'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The state upload code was incorrectly shifting the attribute swizzles. The
effect of this is we're likely to get the default swizzle values, which disables
the component.
This doesn't technically fix any bugs since Skylake support is still disabled by
default (no PCI IDs).
While here, since VARYING_SLOT_MAX can be greater than the number of attributes
we have available, add a warning to the code to make sure we never do the wrong
thing (and hopefully prevent further static analysis from finding this).
Admittedly I am a bit confused. It seems to me like the moment a user has
greater than 8 varyings we will hit this condition. CC Ken to clarify.
v2: Forgot to git add the warning message in v1
v3: Change the > 31 varyings to an assertion (Ken)
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu> (via Coverity)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Vertex color clamping is only relevant if the shader writes to
the built-in gl_[Secondary]{Front,Back}Color varyings. Otherwise,
brw_vs_prog_key::clamp_vertex_color is never used, so we can simply
leave it set to 0.
This enables us to correctly predict the clamp_vertex_color key value
in the precompile for shaders which don't use those varyings.
Eliminates virtually all VS recompiles in Serious Sam 3's intro.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Vertex color clamping only applies to gl_[Secondary]{Front,Back}Color,
which are compatibility-only built-in varyings. We only support GS in
core profile, so they can't exist in geometry shaders.
We can drop several dirty bits from the GS program key - they're
unnecessary for a core profile implementation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Vertex color clamping only applies to a few specific built-ins: COL0/1
and BFC0/1 (aka gl_[Secondary]{Front,Back}Color). It seems weird to
handle special cases in a function called emit_generic_urb_slot().
emit_urb_slot() is all about handling special cases, so it makes more
sense to handle this there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With fs_visitor/fs_generator being reused for SIMD8 VS/GS programs,
we're running into weird #include patterns, where scalar code #includes
brw_vec4.h and such.
Program keys aren't really related to SIMD4X2/SIMD8 execution - they
mostly capture NOS for a particular shader stage. Consolidating them
all in one place that's vec4/scalar neutral should help avoid problems.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
It's been merged into brw_state_flags::brw for simplicity and
efficiency.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I put the BRW_NEW_*_PROG_DATA flags at the beginning so that
brw_state_cache.c can still continue using 1 << brw_cache_id.
I also added a comment explaining the difference between
BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time
to remember it.
Non-mechanical changes:
- brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache.
- brw_state_upload.c - INTEL_DEBUG=state changes.
- brw_context.h - bit definition merging.
v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention
state-based recompiles, and nix the "proper subset" claim,
as it's false. (Caught by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only
ones that are left are legitimately related to the program cache. Yet,
it seems a bit wasteful to have an entire bitfield for only 7 bits.
State upload is one of the hottest paths in the driver. For each atom
in the list, we call check_state() to see if it needs to be emitted.
Currently, this involves comparing three separate bitfields (mesa, brw,
and cache). Consolidating the brw and cache bitfields would save a
small amount of CPU overhead per atom. Broadwell, for example, has
57 state atoms, so this small savings can add up.
CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the
offset into the program cache BO (prog_offset). Since most uses refer
to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name.
Removing "cache" completely is a bit painful, so I decided to do it in
several patches for easier review, and to separate mechanical changes
from manual ones. This one simply renames things, and was made via:
$ for file in *.[ch]; do
sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \
-e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file
done
Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw!
The next patch will remedy this flaw. It will also fix the
alphabetization issues.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>
This was added in September 2013 when we first implemented the fast
(but lower quality) derivatives. A quick Google search didn't turn
up anyone using or recommending the option, so I suspect no one does.
Applications that want to control the quality of their derivatives can
use the new GL_ARB_derivative_control extension, or use the glHint
mechanism. The driconf option seems superfluous.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Note from Ken:
"We used to use the return value to indicate whether software
fallbacks were necessary, but we haven't in years."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the code in _mesa_validate_DrawElements,
_mesa_validate_DrawRangeElements, and
_mesa_validate_DrawElementsInstanced was the same. Refactor this out to
common code.
As a side-effect, a bug in _mesa_validate_DrawElementsInstanced was
fixed. Previously this function would not generate an error when
check_valid_to_render failed if numInstances was 0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL 3-ish versions of the spec are less clear that an error should be
generated here, so Ken (and I during review) just missed it in 1afe335.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
height=0 is legal for 1D array textures (as depth=0 is legal for
2D arrays). Fixes new piglit ext_texture_array-errors test.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We've got two mostly-independent operations in each QPU instruction, so
try to pack two operations together. This is fairly naive (doesn't track
read and write separately in instructions, doesn't convert ADD-based MOVs
into MUL-based movs, doesn't reorder across uniform loads), but does show
a decent improvement on shader-db-2.
total instructions in shared programs: 59583 -> 57651 (-3.24%)
instructions in affected programs: 47361 -> 45429 (-4.08%)
Since 73dd50acf6
glsl: implement switch flow control using a loop
The SB backend was falling over in an assert or crashing.
Tracked this down to the loops having no repeats, but requiring
a working break, initial code just called the loop handler for
all non-if statements, but this caused a regression in
tests/shaders/dead-code-break-interaction.shader_test.
So I had to add further code to detect if all the departure
nodes are empty and avoid generating an empty loop for that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86089
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Improves 359 shaders by >=10%
114 shaders by >=20%
91 shaders by >=30%
82 shaders by >=40%
22 shaders by >=50%
4 shaders by >=60%
2 shaders by >=80%
total instructions in shared programs: 5845346 -> 5822422 (-0.39%)
instructions in affected programs: 364979 -> 342055 (-6.28%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most prominently helps Natural Selection 2, which has a surprising
number shaders that do very complicated things before drawing black.
instructions in affected programs: 21052 -> 16978 (-19.35%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pre-Haswell hardware couldn't actually predicate it, but it's easier to
pretend as if it's predicated in the visitor since it will generate a
MOV from f0.1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anonymous structures are only supported with newer versions of
GCC. They will not work with GCC 4.2.1 used by OpenBSD or
GCC 4.4.7 shipped with RHEL6 going by a commit to fix a similiar
problem in radeonsi earlier in the year
(74388dd24b).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
The i965 backends pass something out of 'screen', which is allocated
per-process, making using this as a ralloc context not thread-safe.
All callers ra_alloc_interference_graph() already ralloc_free() its
return value.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It was totally broken:
- p_atomic_dec_zero() was returning the negation of the expected value
- p_atomic_inc_return()/p_atomic_dec_return() was
post-incrementing/decrementing, hence returning the old value instead
of the new
- p_atomic_cmpxchg() was returning the new value on success, instead of
the old
It is clear this never used in the past. I wonder if it wouldn't be better to
yank it altogether.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It was much easier for me to verify things build and run as expected
with this simple test, than building and testing whole Mesa.
With scons the test can be build and run merely by doing:
scons u_atomic_test
Building the test with autotools is left as a future exercise.
Reviewed-by: Matt Turner <mattst88@gmail.com>
like how C11's stdatomic.h provides generic functions. GCC's __sync_*
builtins already take a variety of types, so that's simple.
MSVC and Sun Studio don't, but we can implement it with something that
looks a little crazy but is actually quite readable.
Thanks to Jose for some MSVC fixes!
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This doesn't reschedule much currently, just tries to fit things into the
regfile A/B write-versus-read slots (the cause of the improvements in
shader-db), and hide texture fetch latency by scheduling setup early and
results collection late (haven't performance tested it). This
infrastructure will be important for doing instruction pairing, though.
shader-db2 results:
total instructions in shared programs: 61874 -> 59583 (-3.70%)
instructions in affected programs: 50677 -> 48386 (-4.52%)
We're supposed to be checking that nothing else writes r4, which is done
by the TMU result collection signal, not the coordinate setup.
Avoids a regression when QPU instruction scheduling is introduced.
Otherwise vertex shader can see stale cache data. This in particular
happens when the same vbo is updated and reused. Not sure yet if vbo's
at differing addresses but bound to same vertex buffer slot could have
issues, but seems safest to flush whenever new vertex buffers are bound.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For drivers building up to GL(ES)3, only expose the actual extension if
the API will let it be used (e.g. via overrides/debug flags that enable
higher versions).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The mesa state tracker doesn't fall back on similar integer formats, so
they must all be provided. Remove the restriction against integer color
rendering.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We need to produce a u32 destination type on integer sampling
instructions, so keep that in a shader key set based on the
currently-bound textures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Integer outputs end up getting mangled due to cov.f32f16, and float32
loses precision. Use full precision shaders in both of those cases.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Just pass the data through unmolested. This probably has no effect since
blending isn't actually enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The table contains all the relevant information about each format. The
helper functions now just do lookups in the table.
Note that this adds support for a lot of formats that were previously
unsupported. Additionally it adds disabled support for integer render
buffers, which will require more work to actually enable.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Switch both of them from independently inconsistent conventions to having
UINT/SINT/UNORM/SNORM/FLOAT/FIXED suffixes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
BRW_CACHE_VS_PROG is more easily associated with program caches than
plain BRW_VS_PROG.
While we're at it, rename BRW_WM_PROG to BRW_CACHE_FS_PROG, to move away
from the outdated Windowizer/Masker name.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This flag signifies that we've emitted a new SAMPLER_STATE table.
Given that we haven't cached those in years, CACHE_NEW_SAMPLER isn't
a great name. Putting it in the BRW_NEW_* hierarchy would make more
sense; BRW_NEW_SAMPLER_STATE_TABLE better reflects its actual purpose.
When this flag is raised, the pointer to the SAMPLER_STATE table has
changed, so we need to re-issue any packets which point to it (unit
state on Gen4-5, 3DSTATE_SAMPLER_STATE_POINTERS on Gen6, and the
per-stage variants on Gen7+).
Saves 2 * sizeof(void *) bytes per context, as we remove useless
aux_compare/aux_free function pointers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Marking brw_stage_state::sampler_count as CACHE_NEW_SAMPLER is wrong.
The number of samplers used by each program is actually computed at
draw time (brw_try_draw_prims), based purely on the currently bound
shader programs (gl_program::SamplersUsed).
CACHE_NEW_SAMPLER means that we've emitted a new SAMPLER_STATE table.
Although this could indicate that the number of samplers has changed,
it could also simply mean that the contents of the table has changed
(i.e. we've bound different textures).
The real reason these atoms depend on CACHE_NEW_SAMPLER is because they
include a pointer to the SAMPLER_STATE table. This was not commented.
So, move the comments to the appropriate place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've been streaming these out for ages, so they basically have nothing
to do with brw_state_cache.c.
Saves 6 * sizeof(void *) bytes per context, as we won't have useless
aux_compare/aux_free functions for them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These always happen together; the extra atom just means another item to
iterate through, flags to check, and a call through a function pointer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen4-5, unit state is specified as indirect state, rather than
commands. If any unit state changes, we upload it via brw_state_batch
and arrange for 3DSTATE_PIPELINED_POINTERS to be re-emitted, which
updates pointers to all unit state at once.
Since there's only one command and state atom (brw_psp_urb_cs) that
needs to know about this, there's no benefit to having six separate
flags. We can combine CACHE_NEW_*_UNIT into a single flag.
We also haven't cached these in a long time, so it doesn't make sense
to use the "CACHE_NEW_" prefix. Instead, use the "BRW_NEW_" prefix.
This also saves 12 * sizeof(void *) bytes of memory per context, as
we remove useless aux_compare/aux_free functions for each CACHE bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parenthesis. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When using MRT on Gen4-5, we have to simulate GL's alpha test feature
by emitting discards in the fragment shader. In this case, it makes
sense to set prog_data->uses_kill, which means the fragment shader may
kill pixels via the discard mechanism.
This saves us from having to look an extra key value in a couple of
places, including in the generator.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This means the generator doesn't have to look at the key, which is a
little nicer - we're pretty close to no key dependencies at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kristian noted that there's very little use of brw_wm_prog_key in the
generator, and that it basically just generates what it's told, without
caring about what stage it's handling.
One exception to this is derivative handling. When handling dFdxCoarse
and dFdxFine, we packed an enum value in a second source register,
explicitly telling the generator what to do. For dFdx, we specified an
enum value of "please use the hint", then checked the program key in the
generator level code.
A natural method is to define separate FS_OPCODE_DD[XY]_{COARSE,FINE}
opcodes, and have the front-end (which already decides what IR to
generate based on the program key) decide which dPdx/dPdy should
correspond to. This consolidates the decision making in one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
prog_data->foo is a bit more readable than brw->wm.prog_data->foo.
The local variable definition is also a great location to put the
obligatory /* CACHE_NEW_WM_PROG */ comment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
brw->wm.prog_data is covered by CACHE_NEW_WM_PROG, not
BRW_NEW_FRAGMENT_PROGRAM. So, we should listen to it.
However, I believe that BRW_NEW_FRAGMENT_PROGRAM is sufficient to cover
all the necessary cases - CACHE_NEW_WM_PROG happens in a subset of
cases. So, the code being wrong shouldn't have triggered bugs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Right now, in mesamatrix.net, the footnote is set so that it seems to be
for all the features, while actually it only applies to MSAA.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Flex and lex have a special action ‘|’ which means to use the same action as
the next rule. We can use this to reduce a bit of code duplication in the
rules for the various float literal formats.
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to the GLSL spec float literals like ‘1f’ shouldn't be allowed
without adding a decimal point or an exponent. Apparently the AMD driver also
disallows this so it seems unlikely that anything would be relying on it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We are using 1 more buffer than we have, although in the future the
driver should just end up using one buffer in total probably, this
is a good first step, it merges the txq cube array and buffer info
constants on r600 and evergreen.
This should in theory fix geom shader tests on r600.
v1.1: fix comments from Glenn.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just use the same entrypoints we use for st/wgl's opengl32.dll.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's not exported by the official opengl32.dll neither. Applications are
supposed to get it via wglGetProcAddress(), not GetProcAddress().
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes several MSVC warnings like:
warning C4273: 'glClearColorx' : inconsistent dll linkage
In fact, we should avoid using `declspec(dllexport)` altogether, and use
exclusively the .DEF instead, which gives more precise control of which
symbols must be exported, but all the public GL/GLES headers practically
force us to pick between `declspec(dllexport)` or
`declspec(dllimport)`.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Addresses MSVC warnings "result of 32-bit shift implicitly converted to
64 bits (was 64-bit shift intended?)", which can often be symptom of
bugs, but in these cases were all benign.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
- Remove no-op if-clause.
- -mstackrealign has been enabled again on MinGW for quite some time and
appears to work alright nowadays.
- Drop -mmmx option as it is implied my -msse, and we don't use MMX
intrinsics anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
PIPE_CAP_VIDEO_MEMORY returns the amount of video memory in megabytes,
so need to converted it to bytes.
Fixed Warframe memory detection.
v2: also prepare for cards with more than 4GB memory
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: David Heidelberg <david@ixit.cz>
D3DPOOL_SCRATCH is disallowed according to spec.
D3DPOOL_SYSTEMMEM should be allowed but we don't handle it right for now.
v2: Fixes segfault in SetTexture when unsetting the texture
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Fixes "Error : CONST[20]: Undeclared source register" when running
dx9_alpha_blending_material. Also artifacts on ilo.
v2: also remove unused MISC_CONST
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This patch moves the data field from Resource9 to Surface9 and cleans
D3DPOOL_SYSTEMMEM handling in Texture9. This fixes HL2 lost coast.
It also removes in Texture9 some code written to support importing
and exporting non D3DPOOL_SYSTEMMEM shared buffers. This code hadn't
the design required to support the feature and wasn't used.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Rather than shoving all the VL code for non-VL targets, increasing
their size, just split it out and use it when needed. This gives us
the side effect of building vl_winsys_dri.c once, dropping a few
automake warnings, and reducing the size of the dri modules as below
text data bss dec hex filename
5850573 187549 1977928 8016050 7a50b2 before/nouveau_dri.so
5508486 187100 391240 6086826 5ce0aa after/nouveau_dri.so
The above data is for a nouveau + swrast + kms_swrast 'megadriver'.
v2: Do not include the vl sources in the auxiliary library.
v3: Rebase. Add nine.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With follow up commit we'll split vl static lib from the auxiliary one,
and choose the appropriate vl (galliumvl or galliumvl_stub) for the
respective targets to link against.
v2: Rebase.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will be used by the non-VL targets, to stub out the functions called
by the drivers. The entry point to those are within the VL
state-trackers, yet the compiler cannot determine that at link time.
Thus we'll need to stub them out to prevent unresolved symbols in the
dri, egl, gbm and pipe-loader targets.
v2: Rebase.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Set a single VL_{CFLAG,LIBS} for xcb and friends, and let each target
check for it's relevant library alone. Required as with follow up
commits we'll build aux/vl into a separate module, which needs VL_CFLAGS
Cleanup add a couple of explicit LIBDRM_LIBS linking, as aux/vl itself
requires libdrm, despite that LIBDRM_{RADEON,NOUVEAU...} may provide it
as well.
v2: Rebase. Make sure st/xvmc programs work.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Or we might end up where automatically enable the build, only to error
out a couple of lines after that.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This will remove the need for unnecessary runtime checks for CPU features if
already supported by target CPU, resulting in smaller and less branchy code.
V2:
- Removed the SSSE3 related part for the not yet merged patch.
- Avoiding redefinition of macros.
Tested-by: David Heidelberg <david@ixit.cz>
Since pack_bytes expands to two mov(4) align1 instructions, we can't use
swizzles directly. For an instruction like
pack_bytes m4.y:UD, vgrf13.xyzw:UD
we can write into the .y component by settings the offset based on the
swizzle.
Also while we're doing this, we can set the dependency control hints
properly, so that a series of pack_bytes writing into separate
components of a register can issue without blocking.
Fixes broken rendering in Windows-based QtQuick2 apps run through Wine.
This library sets all texture units' GL_COORD_REPLACE, leaves point
sprite mode enabled, and then draws a triangle fan.
Will need a slightly different fix for Gen4-5, but I don't have my old
machines in a usable state currently.
V2: - Simplify patch -- the real changes are no longer duplicated across
the Gen6 and Gen7 atoms.
- Also don't clobber attr overrides -- which matters on Haswell too,
and fixes the other half of the problem
- Fix newly-introduced warnings
V3: - Use BRW_NEW_GEOMETRY_PROGRAM and brw->geometry_program rather than
core flag and state; keep the state flags in order.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84651
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ilia noticed that my lowering pass was converting the constant array
used by textureGatherOffsets' offsets parameter to a uniform. This
broke textureGather for Nouveau, and is generally a horrible plan,
since it violates the GLSL constraint that offsets must be an
immediate constant.
When I wrote this pass, I neglected to consider whole array assignment.
I figured opt_array_splitting would handle constant indexing, so this
pass was really about fixing variable indexing.
textureGatherOffsets is an example of whole array access that we really
don't want to touch. Whole array copies don't appear to benefit from
this either - they're most likely initializers for temporary arrays
which are going to be mutated anyway. Since you're copying, you may
as well copy from immediates, not uniforms.
This patch makes the pass look for ir_dereference_arrays of
ir_constants, rather than looking for any ir_constant directly.
This way, it ignores whole array assignment.
No shader-db changes or Piglit regressions on Haswell. Some Piglit
tests generate different code (fixing textureGatherOffsets on Nouveau).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
We already precompile GLSL programs; it seems logical to precompile ARB
programs as well. We just never hooked it up.
This also makes the programs compile even if no drawing occurs, which is
useful for shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, the prototypes for brw_vs/gs/fs_precompile were scattered
between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only). Also,
brw_fs_precompile had C++ linkage, while the others were C.
This patch moves all the prototypes to a central location (brw_shader.h)
and makes brw_fs_precompile have C linkage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We'd like to do precompiling for ARB vertex and fragment programs,
which only have gl_program structures - gl_shader_program is NULL.
This patch makes the various precompile functions take a gl_program
parameter directly, rather than accessing it via gl_shader_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_shader_precompile should just do a precompile; it makes more sense
for the caller to decide whether we should do one. Simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
llvmpipe disables denorms on purpose (on x86/sse only), because denorms are
generally neither required nor desired for graphic apis (and in case of d3d10,
they are forbidden).
However, this caused some arithmetic tests using denorms to fail on some
systems, because the reference did not generate the same results anymore.
(It did not fail on all systems - behavior of these math functions is sort
of undefined when called with non-standard floating point mode, hence the
result differing depending on implementation and in particular the sse
capabilities.)
So, for the reference, simply flush all (input/output) denorms manually
to zero in this case.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=67672.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The extension itself was deleted 2 years ago. There are still some
prog_instruction opcodes from NV_fp that exist because they're used by
ir_to_mesa.cpp, though.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Roamnick <ian.d.romanick@intel.com>
They're part of NV_vertex_program2, which I'm pretty sure we're never
going to support.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Roamnick <ian.d.romanick@intel.com>
The nice thing about the good way of initializing arrays like this is that
you don't need to initialize everything in order, or even everything at
all. Taking advantage of that only needs a tiny fixup to deal with the
default NULL value of the pointers.
I haven't dropped the initialization of opcodes that exist and are unsupported.
This was the only state tracker emitting it, and hardware was just having
to lower it anyway (or failing to lower it at all).
v2: Extracted from a larger patch by Jose (which also dropped DP2A), fixed
to actually not reference TGSI_OPCODE_CND. Change by anholt.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
The translation is lowering it to not using TGSI_OPCODE_NRM, anyway.
v2: Extracted from a larger patch by Jose that also dropped DP2A usage.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
Caught by clang.
warning: comparison of constant -1 with expression of type
'ir_texture_opcode' is always false
[-Wtautological-constant-out-of-range-compare]
if (op == -1)
~~ ^ ~~
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts a little more than 1k of .text size from i915g.
This was previously done in commit 5f66b340 and subsequently reverted in
commit 3661f757 after bug 30514 was filed. I believe the cause of bug
30514 wasn't anything related to cross compiling, but rather that the
toolchain used defaulted to -march=i386, and i386 doesn't have the
CMPXCHG or XADD instructions used to implement the intrinsics.
So we reverted a patch that improved things so that we didn't break
compilation for a platform that never could have worked anyway.
Ben was asking about the undocumented restriction that the math
instruction cannot use the dependency control hints. I went to reconfirm
and disabled the is_math() check in opt_set_dependency_control() and saw
that the disassembled math instructions with dependency hints had a
bogus math function. We were mistakenly overwriting it by setting an
empty conditional mod.
Unfortunately, this wasn't the cause of the aforementioned problem (I
reproduced it). This bug is benign, since we don't set dependeny hints
on math instructions -- but maybe some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These were added in commits a760c738 and 43757135 to be used in
implementing C-style aggregate initializers (commit 1b0d6aef). Paul
rewrote that code in commit 0da1a2cc to use GLSL types, rather than
AST types, leaving these copy constructors unused.
Tested by making them private and providing no definition.
Uniform names (even for hidden uniforms) are required to be unique; some
parts of the compiler assume they can be looked up by name.
Fixes the piglit test: tests/spec/glsl-1.20/linker/array-initializers-1
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When converting a uniform array reference to a pull constant load, the
`reladdr` expression itself may have its own `reladdr`, arbitrarily
deeply. This arises from expressions like:
a[b[x]] where a, b are uniform arrays (or lowered const arrays),
and x is not a constant.
Just iterate the lowering to pull constants until we stop seeing these
nested. For most shaders, there will be only one pass through this loop.
Fixes the piglit test:
tests/spec/glsl-1.20/linker/double-indirect-1.shader_test
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves all the CUBE section above the gradients section,
so that the gradient emission happens on one block which
is what sb/hardware expect.
v2: avoid changes to bytecode by using spare temps
v2.1: shame gcc, oh the shame. (uninit var warnings)
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The piglit tests were failing, and it appeared to be SB
optimising out things, but Glenn pointed out the gradients
are meant to be clause local, so we should emit the texture
instructions in the same clause. This moves things around
to always copy to a temp and then emit the texture clauses
for H/V.
v2: Glenn pointed out we could get another ALU fetch in
the wrong place, so load the src gpr earlier as well.
Fixes at least:
./bin/tex-miplevel-selection textureGrad 2D
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
res->bind is not an indicator of how the resource is currently bound.
buffers can be rebound across different binding points without changing
underlying storage.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
This creates a usable layout for all NPOT textures. Of course these
still have lots of limitations, but at least we can render to a
level.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
For NPOT texture layouts, we want to be able to access texture levels
other than 0 directly. Since the hw doesn't support that, We do it by
adding the offset directly.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
This happens with glsl-convolution-1, where we have 64 constants. This
doesn't make the test pass (we don't have 64 constants anyway, only
32) but this prevents it from crashing.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
This patch remove workaround related to LLVM < 3.2 bug.
Original bug has been closed as fixed in 2011.
At this moment gallium requires LLVM 3.3 (2013).
LLVM has been tested without SSE2 support in commit
ca70de9bd2 and removed after requiring
LLVM 3.3 in commit 013ff2fae1
Original LLVM bug: http://llvm.org/bugs/show_bug.cgi?id=6960
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In commit 5e37a2a4a8, I made the pull constant code stop calling
_mesa_load_state_parameters() when there were no pull parameters.
This worked fine on Gen6+ because the push constant code also called
it if there were any push constants. However, the Gen4-5 push constant
code wasn't doing this. This patch makes it do so, like the Gen6+ code.
A better long term solution would be to make core Mesa just handle this
for us when necessary.
Fixes around 8766 Piglit tests on Ironlake, and probably Gen4 as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Fix one of the few cases where we can't reliable touch the destination hazard
bits. I am explicitly doing this patch individually so it is easy to backport. I
was tempted to do this patch before the previous patch which reorganized the
code, but I believe even doing that first, this is still easy to backport.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move this to a separate function so that we can begin to add other little
caveats without making too big a mess.
NOTE: There is some desire to improve this function eventually, but we need to
fix a bug first.
v2:
Use const for the inst for the hazard check (Matt)
Invert safe logic to get rid of the double negative (Matt)
Add PRM reference for predicates (Matt)
Add note about empirical evidence for math (Matt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The visitor emits MOVs to temporary registers for immediates, so these
never trigger. For further proof, check case ir_triop_fma.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
texture_offset was only used by some texturing operations, and offset
was only used by spill/unspill and some URB operations. These fields are
never used at the same time.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Check that the target is GL_TEXTURE_CUBE_MAP before emitting
TEXCOORDTYPE_VECTOR texture coordinates.
I'm not sure if the hardware would like CARTESIAN coordinates
with cube maps, and as I'm too lazy to find out just emit the
VECTOR coordinates for cube maps always. For other targets use
CARTESIAN or HOMOGENOUS depending on the number of texture
coordinates provided.
Fixes rendering of the "electric" background texture in chromium-bsu
main menu. We appear to be provided with three texture coordinates
there (I'm guessing due to the funky texture matrix rotation it does).
So the code would decide to use TEXCOORDTYPE_VECTOR instead of
TEXCOORDTYPE_CARTESIAN even though we're dealing with a 2D texure.
The results weren't what one might expect.
demos/cubemap still works, which hopefully indicates that this doesn't
break things.
Also tested with:
bin/glean -o -v -v -v -t +texCube --quick
bin/cubemap -auto
from piglit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Add support for decoding the new branch control bit. I saw two things wrong with
the existing code.
1. It didn't bother trying to decode the bit.
- While we do not *intentionally* emit this bit today, I think it's interesting
to see if we somehow ended up with the bit set. It may also be useful in the
future.
2. It seemed to be the wrong bit.
- The docs are pretty poor wrt which bit this actually occupies. To me, it
/looks/ like it should be bit 28. I am not sure where Ken got 30 from. I
verified it should be 28 by looking at the simulator code.
I also added the most basic support for GOTO simply so we don't need to remember
to change the function in the future.
v2:
Move the branch_ctrl check out of the if gen >= 6 check to make it more
readable. (Matt)
ENDIF doesn't have branch_ctrl (Matt + Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts f4dd099171.
The src/gallium/tests/unit/translate_test.c gives the same results on
MinGW 64-bits as on Linux 64-bits. And since MinGW is often used for
development/testing due to its convenience, it's better not to have this
sort of differences relative to MSVC.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
No changes required in the driver itself, all handled by draw.
piglit results in a quick run:
skip->pass 7
skip->fail 2
(The new failures in the ARB_fragment_layer_viewport group are expected,
we fail the same if gs doesn't write these outputs regardless of the vs.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Mostly add a couple cases so we don't just check gs for this.
There's only one gotcha, the built-in vp transform in the llvm vs can't
handle it (this would be fixable though non-trivial due to vp index being
non-constant for the SoA outputs, but we don't use it if there's a gs
neither - the whole clip/vp transform integration there is suboptimal).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When calling vaCreateImage() an internal copy of VAImage is maintained
since the allocation of "image" may not be guaranteed to live long enough.
Signed-off-by: Michael Varga <Michael.Varga@amd.com>
For 1D and 2D arrays we don't want the other coordinates being
offset and affecting where we sample. I wrote this patch 6 months
ago but lost it.
Fixes:
./bin/tex-miplevel-selection textureLodOffset 1DArray
./bin/tex-miplevel-selection textureLodOffset 2DArray
./bin/tex-miplevel-selection textureOffset 1DArray
./bin/tex-miplevel-selection textureOffset 1DArrayShadow
./bin/tex-miplevel-selection textureOffset 2DArray
./bin/tex-miplevel-selection textureOffset(bias) 1DArray
./bin/tex-miplevel-selection textureOffset(bias) 2DArray
v2: rewrite to handle more cases and be consistent with code
above.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, the kernel would dispatch thread 0, wait, then dispatch thread
1. By insisting that the thread contents use semaphores in the right
place, the kernel can sleep for longer by dispatching both threads at
once.
When using the stand alone compiler, if we try to link a shader with vertex
attributes it will segfault on linking as the binding hash tables are not
included in the shader program. Obviously, we cannot make the linking stage
succeed without the bound attributes but we can prevent the crash and just
let the linker spit its own error.
Reviewed-by: Brian Paul <brianp@vmware.com>
We cannot guarantee that vertex buffers have the necessary alignment for
fetching all AoS members at once (for instance 4x32bit XYZW data). We can
however guarantee that for textures. This did not cause errors for older
llvm versions but it now matters and will cause segfaults if the data
happens to not be aligned. Thus we need to set alignment manually.
(Note that we can't actually really guarantee data to be even element aligned
due to offsets in vertex buffers being bytes and OpenGL allowing this, but
it does not matter for x86 as alignment is only required for sse vectors -
not sure what happens on other archs, however.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=85467.
Not all drivers can set gl_Layer from VS. Add a fallback that passes the
instance id from VS to GS, and then uses the GS to set the layer.
Tested by adding
quad_buffers |= clear_buffers;
clear_buffers = 0;
to the st_Clear logic, and forcing set_vertex_shader_layered in all
cases. No piglit regressions (on piglits with 'clear' in the name).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
DRI_PRIME setups have different issues due the lack of dma-buf fences
support in the drivers. For DRI3 DRI_PRIME, a race can appear, making
tearings visible, or worse showing older content than expected. Until
dma-buf fences are well supported (and by all drivers), an alternative
is to send the buffers to the server only when rendering has finished.
Since waiting the rendering has finished in the main thread has a
performance impact, this patch uses an additional thread to offload the
wait and the sending of the buffers to the server.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Implements vblank_mode and throttling, which allows us change default ratio
between framerate and input lag.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Work of Joakim Sindholt (zhasha) and Christoph Bumiller (chrisbmr).
DRI3 port done by Axel Davy (mannerov).
v2: - nine_debug.c: klass extended from 32 chars to 96 (for sure) by glennk
- Nine improvements by Axel Davy (which also fixed some wine tests)
- by Emil Velikov:
- convert to static/shared drivers
- Sort and cleanup the includes
- Use AM_CPPFLAGS for the defines
- Add the linker garbage collector
- Restrict the exported symbols (think llvm)
v3: - small nine fixes
- build system improvements by Emil Velikov
v4: [Emil Velikov]
- Do no link against libudev. No longer needed.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: David Heidelberg <david@ixit.cz>
v3: thanks to Brian, improved coding style, also glennk helped spot few
things (unsigned -> int, two constify)
v4: thanks Ilia improved function, dropped u_box_clip_3d
v5: incorporated rest of Gregor proposed changes,clean ups
v6: u_box_clip_2d simplify proposed by Ilia Mirkin
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
At this moment we use only zero or positive values.
v2: Implement it for also for Solaris, MSVC assembly
and enable for other combinations.
v3: Replace MSVC assembly by assert + warning during compilation
v4: remove inc and dec with return for MSVC assembly
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: David Heidelberg <david@ixit.cz>
Implement pipe_loader_sw_probe_wrapped which allows to use the wrapped
software renderer backend when using the pipe loader.
v2: - remove unneeded ifdef
- use GALLIUM_PIPE_LOADER_WINSYS_LIBS
- check for CALLOC_STRUCT
thanks to Emil Velikov
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Some of the geom shader tests produce an empty vertex shader,
on cayman we'd crash in the finaliser because last_cf was NULL.
cayman doesn't need the NOP workaround, so if the code arrives
here with no last_cf, just emit an END.
fixes crashes in a bunch of piglit geom shader tests.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The sampler_array_size field was added by "mesa/st: add support for
dynamic sampler offsets". But the field wasn't getting copied in
the get_pixel_transfer_visitor() or get_bitmap_visitor() functions.
The count_resources() function then didn't properly compute the
glsl_to_tgsi_visitor::samplers_used bitmask. Then, we didn't declare
all the sampler registers in st_translate_program(). Finally, we
asserted when we tried to emit a tgsi ureg src register with File =
TGSI_FILE_UNDEFINED.
Add the missing assignments and some new assertions to catch the
invalid register sooner.
Cc: "10.3, 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Using the asynchronous DMA engine for multi-dimensional operations seems
to cause random GPU lockups for various people. While the root cause for
this might need to be fixed in the kernel, let's disable it for now.
Before re-enabling this, please make sure you can hit all newly enabled
paths in your testing, preferably with both piglit and real world apps,
and get in touch with people on the bug reports below for stability
testing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85647
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83500
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Unfortunately no LLVM type was generated for pipe_viewport_state -- it
was being treated as a single floating point array --, so llvmpipe (and
any driver that relies on draw/llvm) got totally busted.
We were missing a few files
- The version scripts
- Android & scons build scripts
- A few headers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Add all headers into Makefile.sources
- Don't forget the target-helpers
- Add the python scripts & the formats table/list (csv)
- Temporary add vl/vl_winsys_dri.c to EXTRA_DIST until we rework the
way VL is build.
- Add the following to EXTRA_DIST - they are included via the
generated u_indices_gen.c thus we should not add them to *SOURCES.
indices/u_indices.c
indices/u_unfilled_indices.c
XXX: Should we nuke gallivm/f.cpp ? It seems that no-one is using it.
v2: Rebase
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The DRM_IOCTL_MODE_CREATE_DUMB (and others) IOCTL isn't very rigorously
specified, which has the effect that some kernel drivers do not consider
the .pitch and .size fields of struct drm_mode_create_dumb outputs only.
Instead they will use these as lower bounds and overwrite them only if
the values that they compute are larger than what userspace provided.
This works if and only if userspace initializes the fields explicitly to
either 0 or some meaningful value. However, if userspace just leaves the
values uninitialized and the struct drm_mode_create_dumb is allocated on
the stack for example, the driver may try to overallocate buffers.
Fortunately most userspace does zero out the structure before passing it
to the IOCTL, but there are rare exceptions. Mesa is one of them. In an
attempt to rectify this situation, kernel drivers are being updated to
not use the .pitch and .size fields as inputs. However in order to fix
the issue with older kernels, make sure that Mesa always zeros out the
structure as well.
Future IOCTLs should be more rigorously defined so that structures can
be validated and IOCTLs rejected if output fields aren't set to zero.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Addition of color fmt bitfield to this register (compared to a3xx) means
we need to re-emit if either prog or framebuffer state is dirty.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This reverts commit 8d3f739383.
In the last commit we've updated our check to determine if the actual
code is buildable, rather than if the compiler acknowledges the option.
I.e. did anyone provide -mno-sse4.1 vs is my compiler too old.
Now this code will never be attemped to be build, in both cases.
Confirmed by building mesa with
export CFLAGS='-march=native -mno-sse4.1'
./configure && make
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
So when checking/building sse code we have three possibilities:
1 Old compiler, throws an error when using -msse*
2 New compiler, user disables sse* (-mno-sse*)
3 New compiler, user doesn't disable sse
The original code, added code for #1 but not #2. Later on we patched
around the lack of handling #2 by wrapping the code in __SSE4_1__.
Yet it lead to a missing/undefined symbol in case of #1 or #2, which
might cause an issue for #2 when using the i965 driver.
A bit later we "fixed" the undefined symbol by using #1, rather than
updating it to handle #2. With this commit we set things straight :)
To top it all up, conventions state that in case of conflicting
(-enable-foo -disable-foo) options, the latter one takes precedence.
Thus we need to make sure to prepend -msse4.1 to CFLAGS in our test.
v2: Clean the #includes. Suggested by Ilia, Matt & Siavash.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Tested-by: Siavash Eliasi <siavashserver@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Very initial support. Basic stuff working (es2gears, es2tri, and maybe
about half of glmark2). Expect broken stuff. Still missing: mem->gmem
(restore), queries, mipmaps (blob segfaults!), hw binning, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will be reused for the scalar VS pass.
v2 (Ken): Rebase on master.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll reuse this toplevel optimization driver for the scalar VS.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These last few operations all only apply when we've actually generated
code, optimized and allocated registers. The dummy and the repclear
shaders don't need the gen4 send workaround, and don't spill. This
means we can move these lines into the else-branch, which will make
the following refactoring easier.
v2 (Ken): Rebase on master, which removed the uncompressed stack.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We split out SIMD8 and SIMD16 generation into seperate calls to
new method generate_code(), which returns the start offset for the
generated code. A new get_assembly() method returns the generated code.
This avoids asserting MESA_SHADER_FRAGMENT and accessing wm_prog_data
in the generator.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Derived from st/glx's GLX_EXT_create_context_es/es2_profile implementation.
Tested with an OpenGL ES 2.0 ApiTrace.
Reviewed-by: Brian Paul <brianp@vmware.com>
The latest version of the specs explicitly allow it, and given that Mesa
universally supports KHR_debug we should definitely support it.
Totally untested. (Just happened to noticed this while implementing
GLX_EXT_create_context_es2_profile for st/xlib.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
17 insertions(+), 102 deletions(-). Works just as well.
v2: Make emit_math take const references (suggested by Matt),
drop redundant WRITEMASK_XYZW setting (Matt and Curro).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Every other unit in the geometry pipeline automatically enables
statistics gathering. This part of the pipe has been controlled by the
DEBUG_STATS variable, but this is asymmetric. This dates back to the
original implementation, and I am not sure if there is a reason for it.
I need access to these stats to implement ARB_pipeline_statistics_query.
Eric wrote it, and Ken touched it last. Do you have any opposition?
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86145
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
According to gen2 BSpec the pipeline must be flushed at least up to the
windower before changing the scissor rect enable field. Emitting the
3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE is sufficient
to do that.
gen3 BSpec no longer has that piece of text, but let's make the same
change there too for symmetry. The spec does still say that the scissor
rectangle must be defined before enabling it, so the new order does seem
more in line with the spec.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Gen2 doesn't have fragment shaders so we shouldn't be calling
_mesa_meta_glsl_Clear() on gen2. Restore the appropriate
ARB_fragment_shader check to the clear path which was lost in:
commit 94f22fbe78
Author: Tapani Pälli <tapani.palli@intel.com>
Date: Wed Aug 8 20:46:45 2012 +0300
intel: use _mesa_meta_Clear with OpenGL ES 1.1 v2
v2: Fix spelling in commit message
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
TEXTURE_SET() is the only register macro that forgets to wrap the
argument evaluation in parens. Only simple integers are passed to this
macro so there's no bug but sitll it seems prudent to add the
parens.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Gen2 doesn't support depth/stencil textures, and since
commit c1d4d49993
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Thu Apr 24 14:11:43 2014 +0300
i915: Don't advertise Z formats in TextureFormatSupported on gen2
depth/stencil formats are no longer accepted as texture formats.
However we still want depth/stencil renderbuffers, so add explicit
format checks to intel_alloc_renderbuffer_storage() to allow such
things.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
gen2 doesn't supporte linear mip filter with anisotropic min/mag
filtering. The hardware would automagically downgrade the min/mag
filters to linear in such cases, which IMO looks worse than forcing
the mip filter to nearest.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
The spec says using DOT4 for alpha is undefined unless DOT4 is also used
for color. It seems to do the right thing anyway, but better safe than sorry.
Also override numAlphaArgs to 2 for DOT4 since that's what it wants.
This migth fix something in case the specified alpha mode has only one
argument. Also avoids emitting a needless 3DSTATE_MAP_BLEND_ARG if
the specified alpha mode has three arguments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
All the shaders we've received so far had this be the case, but with
nir-to-tgsi that changed.
I might decide to make nir-to-tgsi keep the outputs in the same order, for
debugging sanity, but I'm not sure.
apitrace now supports it, and it makes it much easier to test
tracing/replaying on OpenGL ES contexts since
GLX_EXT_create_context_{es2,es}_profile are widely available.
Reviewed-by: Brian Paul <brianp@vmware.com>
I used these in the SEL peephole, but they require extra tracking and
fix ups. The SEL peephole can pretty easily find the blocks it needs
without these.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Make the helpers fill out valid Gen7 3DSTATE_SF and 3STATE_SBE. This
prevents the helpers from having to do
dw[0] = GEN7_SBE_DW1_x; // setting DW1 value to dw[0]!?
and simplifies gen7_3DSTATE_{SF,SBE}().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Render stream and render enable are independent from so enable. Having a
single return point makes it easier to see that.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Started to make pipe_stream_output_info mandatory, but ended up adding support
for stream id and making a workaround Gen7-specific.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It replaces gen6_fill_3dstate_constant(). gen6_3DSTATE_CONSTANT_{VS,GS,PS}
are made wrappers of the new function.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add gen6_so_3DSTATE_GS(), gen6_disable_3DSTATE_GS(), and
gen7_disable_3DSTATE_GS() to do SO on GEN6 or to disable GS.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
On x86-64 this saves 8 bytes of padding in the structure, and this
reduces the size of the structure to 32 bytes.
v2: Fix constructor so that GCC won't warn about the order of
initialization.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Due to the total number of bits used in the bitfield, this does not
increase the size of the structure.
It does, however, reduce the number of instructions required each time
one of these fields is accessed. To access ::matrix_columns with the
bitfield, three instructions were required:
movzbl 0x9(%rdx),%eax
shr %al
and $0x7,%eax
As a uint8_t, only one instruction is required.
movzbl 0xa(%rdx),%eax
These fields are accessed *a lot*.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 48,103,497 16,556,096 676,447
After (64-bit): 45,722,616 15,737,964 670,607
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 61,472,611 21,051,222 821,361
After (32-bit): 57,987,421 19,872,226 811,609
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
There are no uniforms in OpenGL ES 1.x, so we can't even get to this
code in that API.
Also, reorder the checks. First check that transpose is true, then
check whether or not that is legal in the current API. transpose should
never be true in an ES2 context, so this gets one check (the more
expensive one) out of the main path.
Valgrind callgrind results for a trace of Tesseract:
_mesa_UniformMatrix4fv _mesa_UniformMatrix3fv
Before (64-bit): 96,119,025 24,240,510
After (64-bit): 90,726,569 22,926,662
_mesa_UniformMatrix4fv _mesa_UniformMatrix3fv
Before (32-bit): 132,434,452 29,051,808
After (32-bit): 126,658,112 27,989,316
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Before ARB_explicit_uniform_location, Mesa's location encoding allowed
locations for non-array types that had non-zero array indices.
Basically, part of the location was the uniform and part was the array
index. This meant that some checks had to occur for arrays and
non-arrays. This is no longer possible, we the checks can be split up.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 50,499,557 17,487,316 686,227
After (64-bit): 50,023,791 17,274,432 684,293
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 62,968,039 21,732,380 828,147
After (32-bit): 62,373,967 21,490,756 826,223
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I really wanted to remove 'shProg != NULL' as well, but that would have
required adding a dummy program as the default program. That seemed
like more churn than removing one test was worth.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Only one caller wanted to generate an error when location == -1, so move
the error generation to that caller. There will be more callers in the
future that do not want to generate errors.
Move the location == -1 check later in validate_uniform_parameters. As
currently implemented, glUniform1iv(-1, -1, data) would not generate an
error, but it should due to count being < 0.
The location that I have moved it to will make more sense with the next
commit.
Valgrind callgrind results for a trace of Tesseract:
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (64-bit): 51,241,217 17,740,162 689,181
After (64-bit): 50,499,557 17,487,316 686,227
_mesa_Uniform4fv _mesa_Uniform4f _mesa_Uniform1i
Before (32-bit): 63,940,605 21,987,918 831,065
After (32-bit): 62,968,039 21,732,380 828,147
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The GL_ enums were previously used because glsl_types.h couldn't be used
in C code. That was fixed some time ago (and uniforms.c already
includes glsl_types.h), so this is no longer necessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Derive whether a RT supports blending, logicop, and the like when
set_framebuffer_state() is called. This enables us to simplify
gen6_BLEND_STATE().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
According to the documentation, line widths higher than 40.0 may have
quality problems. That's already 20 times larger than we've been
exposing, so it seems totally sufficient.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We've artificially been limiting this to 5 for no particular reason.
On Gen4-5, the limit is [0, 7.5] with a granularity of 0.5 (U3.1).
On Gen6+, the limit is [0, 7.9921875]. Since it's a U3.7, the
granularity should be 0.125 (1/8).
This patch conservatively advertises one granularity smaller than the
hardware's maximum value, just in case there's a problem using the
largest possible value. On Gen4-5, this is 7.5 - 0.5 = 7.0. On Gen6+,
this is 8.0 - 0.125 = 7.875.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Rather than hardcoding platform values in every code path, just use the
maximum value we set.
Currently, ctx->Const.LineWidth == 5, which is smaller than the hardware
limit. But applications shouldn't be using a value larger than we
support anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Line Width moved to DW1 bits 29:12. It's actually now a U11.7.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Use clamping constants that guarantee no integer overflows.
As spotted by Chris Forbes.
This causes the code to change as:
- value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967295.0f);
+ value |= (uint32_t)CLAMP(src[0], 0.0f, 4294967040.0f);
- value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483647.0f));
+ value |= (uint32_t)((int32_t)CLAMP(src[0], -2147483648.0f, 2147483520.0f));
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
On Windows, DllMain calls and thread creation/destruction are
serialized, so when llvmpipe is destroyed from DllMain waiting for the
rasterizer threads to finish will deadlock.
So, instead of waiting for rasterizer threads to have finished, simply wait for the
rasterizer threads to notify they are just about to finish.
Verified with this very simple program:
#include <windows.h>
int main() {
HMODULE hModule = LoadLibraryA("opengl32.dll");
FreeLibrary(hModule);
}
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=76252
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
No longer needed as the file was removed with
commit 8c229d306b
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Mon Aug 11 10:07:07 2014 -0700
i965: Delete the Gen8 code generators.
We now use the brw_eu_emit.c code instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
During teardown we free the driver_configs list pointer, but we forget
to deallocate each config in that list.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
The function is not called by platform_drm. As such one needs to
pay special attention at teardown.
v2: Fix the comment block. Spotted by Ken.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Earlier commit failed to attribure that for drm platforms one does not
call dri2_create_screen, thus it does not create the screen and
driver_configs but inherits them from the "display" - gbm.
As such wrap cleanup in Platform != _EGL_PLATFORM_DRM to prevent
the issue and still cleanup correctly for non-drm platforms.
v2:
- Drop the ifdef HAVE_DRM_PLATFORM, reindent the code and fix the
comment block. Suggested by Ken.
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Move opcode to string mappings to functions of their own. Have for consistent
outputs for similar opcodes.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
When the earlier block ended with control flow, we'd mistakenly remove
some of its links to its children. The same happened with the later
block.
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
A pattern in certain shaders is:
uniform vec4 colors[NUM_LIGHTS];
for (int i = 0; i < NUM_LIGHTS; i++) {
...use colors[i]...
}
In this case, the application author expects the shader compiler to
unroll the loop. By doing so, it replaces variable indexing of the
array with constant indexing, which is more efficient.
This patch extends the heuristic to see if arrays accessed within the
loop are indexed by an induction variable, and if the array size exactly
matches the number of loop iterations. If so, the application author
probably intended us to unroll it. If not, we rely on the existing
loop-too-large heuristic.
Improves performance in a phong shading microbenchmark by 2.88x, and a
shadow mapping microbenchmark by 1.63x. Without variable indexing, we
can upload the small uniform arrays as push constants instead of pull
constants, avoiding shader memory access. Affects several games, but
doesn't appear to impact their performance.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Consider GLSL code such as:
const ivec2 offsets[] =
ivec2[](ivec2(-1, -1), ivec2(-1, 0), ivec2(-1, 1),
ivec2(0, -1), ivec2(0, 0), ivec2(0, 1),
ivec2(1, -1), ivec2(1, 0), ivec2(1, 1));
ivec2 offset = offsets[<non-constant expression>];
Both i965 and nv50 currently handle this very poorly. On i965, this
becomes a pile of MOVs to load the immediate constants into registers,
a pile of scratch writes to move the whole array to memory, and one
scratch read to actually access the value - effectively the same as if
it were a non-constant array.
We'd much rather upload large blocks of constant data as uniform data,
so drivers can simply upload the data via constbufs, and not have to
populate it via shader instructions.
This is currently non-optional because both i965 and nouveau benefit
from it, and according to Marek radeonsi would benefit today as well.
(According to Tom, radeonsi may want to handle this itself in the long
term, but we can always add a flag when it becomes useful.)
Improves performance in a terrain rendering microbenchmark by about 2x,
and cuts the number of instructions in about half. Helps a lot of
"Natural Selection 2" shaders, as well as one "HOARD" shader.
total instructions in shared programs: 5473459 -> 5471765 (-0.03%)
instructions in affected programs: 5880 -> 4186 (-28.81%)
v2: Use ir_var_hidden to avoid exposing the new uniform via the GL
uniform introspection API.
v3: Alphabetize Makefile.sources properly.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77957
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In the compiler, we'd like to generate implicit uniforms for internal
use. These should not be visible via the GL uniform introspection API.
To support that, we add a new ir_variable::how_declared value of
ir_var_hidden, and plumb that through to gl_uniform_storage.
v2 (idr): Fix some memory management issues in
move_hidden_uniforms_to_end. The comment block on the function has more
details.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Makes use of SSE 4.1 to speed up compute of min and max elements.
Callgrind cpu usage results from pts benchmarks:
Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%
V5:
- actually make use of the optimisation in android (Emil Velikov)
- set a better array size limit for using SSE and added TODO
V4:
- fixed bugs with incrementing pointer and updating counters
V3:
- Removed sse_minmax.c from Makefile.sources
- handle the first few values without SSE until the pointer is aligned
and use _mm_load_si128 rather than _mm_loadu_si128
- guard the call to the SSE code better at build time
V2:
- removed GL* types
- use _mm_store_si128() rather than _mm_store_ps()
- add runtime check for SSE
- use aligned attribute for local mix/max
- bunch of tidyups
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Walk through the list and free each config, and finally free the list
itself. Freeing approx 20KiB of memory, according to valgrind.
Inspired by a similar patch by enpeng xu.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
driUnbindContext() checks for valid drawables before calling the driver
unbind function. In case of Surfaceless contexts, the drawables are always
Null and we end up not releasing the underlying DRI context. Moving the
call to the driver function before the drawable validity checks fixes things.
Steps to trigger this bug are following:
- create surfaceless context and make it current
- make some other context current
- {another thread} destroy surfaceless context
- make another context current
Signed-off-by: Alexandros Frantzis <Alexandros.Frantzis@canonical.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74563
The dummy shader sends an EOT message to end itself. There are many more
works need to be done on the compiler side before we can advertise
PIPE_CAP_COMPUTE.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Use util_dynarray in ilo_set_global_binding() to allow for unlimited number of
global bindings. Add a comment for global bindings.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We need to know the local/input/private sizes and others. This is not
complete. We need many others for CURBE setup.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
They will be used to report compute params or program compute states.
thread_count can also be used for 3DSTATE_VS.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This started hitting an assertion recently. Only affects Haswell
(Ivybridge doesn't support this meddling with the sampler state pointer,
and ARB_gpu_shader5 is not enabled yet on Broadwell)
14 Piglits crash->pass.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves performance in GLBenchmark 2.7 TRex by 3.88889% +/- 0.336383%
(n=80) at 1280x720 on Broadwell GT3. Together with the previous patch,
it improves performance by 5.42738% +/- 0.541971% (n=10) at 1920x1080.
Note that without the PMA stall fix, this would instead decrease
performance by 22%.
v2: Update comment (noticed by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Certain non-promoted depth cases typically incur stalls. In very
specific cases, we can enable a workaround which improves performance.
Improves performance in GLBenchmark 2.7 TRex by 1.17762% +/- 0.448765%
(n=75) at 1280x720 on Broadwell GT3.
Haswell has this feature as well, but we can't currently write registers
from userspace batches (and we'd incur additional software batch
scanning overhead as well), so we haven't enabled it. Broadwell allows
us to write CACHE_MODE_1. Backporters beware: the formula and flushing
incantation differs between Haswell and Broadwell.
v2: Move pma_stall_bits from brw->state to brw itself (requested by
Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Matt requested this in review feedback on the original patch, which I
completely missed when pushing this series. Kristian also made this
change, but I grabbed the wrong version of the patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise, calling glPopAttrib on drivers that don't support
ARB_clip_control gives you a GL error, which is surprising at best.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We're not programming the clear values yet, so this won't work.
This patch should be (effectively) reverted eventually.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
On Skylake, the MOCS bits are an index into a table of 63 different,
configurable cache configurations. As for previous GENs, we only care about
WB and WT, which are available in the documented default set. Define
SKL_MOCS_WB and SKL_MOCS_WT to the indices for those configucations and use
those for the Skylake MOCS values.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake has separate controls for enabling the Z Clip Test for the near
and far planes. For now, maintain the legacy behavior by setting both.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake's 3DSTATE_DS packet has a few more fields; we don't support
domain shaders yet though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
They are the same as for BDW, so just add a case for SKL to the init switch.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On SKL, 3DSTATE_CONSTANT_* command is not committed until we give
the corresponding 3DSTATE_BINDING_TABLE_POINTERS_* command. If we
fail to do so, the constant buffers wont be read and push constants
will be wrong.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Otherwise they overlap and horrible things happen. All the new DWords
are for fast color clear values, which we don't do yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We will need to allocate more DWords on Skylake.
v2: Don't mark brw_context parameter const. It's modified.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake introduces a new base address for a feature we don't yet expose.
Setting these to 0 should be safe.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake uploads the stencil reference values in DW3 of the
3DSTATE_WM_DEPTH_STENCIL packet, rather than in COLOR_CALC_STATE.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Skylake has some extra bits in PIPELINE_SELECT, none of which are
interesting for a 3D driver. In order to selectively change them, it
also introduces new "mask bits" in 15:8. We care about the "Pipeline
Selection" bits (1:0), so set the mask to 0x3.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This commands has seen the addition of 2 dwords that allow to specify
which channels of which attributes need to be forwarded to the fragment
shader.
v2: Rebase forward a year (done by Ken).
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The CSE pass now prints out why it thinks a value is not a candidate for
adding to the AE set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The optimization in commit d056863b covers these cases, which were the
first optimizations I added to the GLSL compiler.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should trigger CL_INVALID_VALUE if device_list is NULL and num_devices
is greater than zero.
Introduced by e5468dfa52
Reported by: EdB
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Between release 3.2 and 3.3 LLVM stopped aligning properly when certain
conditions (no allocas, but large number of vectors causing spills to
the stack, and frame pointer omission enabled).
We were already disabling frame-pointer-omission on several build types,
but we now disable it on all build types.
It's not clear whether this affects 32-bits x86 processes only, or if it
can also affect 64-bits x86_64 processes when AVX registers are
available and used. So disable frame-pointer-omission on both
x86/x86_64 to be on the safe side.
See also:
- http://llvm.org/PR21435
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
AFAICT the number of threads is 80, not 70. I am not sure if Ken knows
something I do not.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Need to do a sqrt().
FWIW, the html that Sphinx 1.1.3 generates for the math expressions
looks completely broken.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Pass and return tgsi_token buffers instead of pipe_shader_state.
And update softpipe driver (the only user of this function).
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Fixes polygon stipple if both DO_PSTIPPLE_IN_DRAW_MODULE and
DO_PSTIPPLE_IN_HELPER_MODULE are zero/off.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This was a regression introduced by
611d66fe45
Passing a binary program to clBuildProgram() is legal, but passing one
to clCompileProgram() is not.
v2:
- Code cleanups.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This adds a query which allows drivers to access the config
information of a specific function within the LLVM generated ELF
binary. This makes it possible for the driver to handle ELF
binaries with multiple kernels / global functions.
We are about to change mesa to spawn threads for deferred glCompileShader and
glLinkProgram, and we need to make sure those threads can send compiler
warnings/errors to the debug output safely.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There may be two contexts compiling shaders at the same time, and we want the
anonymous struct id to be globally unique.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
_mesa_strtod and _mesa_strtof may be called from multiple threads. They need
to be thread-safe.
v2: platform checks are now done in configure.ac
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the assumptions that xlocale.h implies newlocale and strtof_l. SCons is
updated to define HAVE_XLOCALE_H on linux and darwin.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both core mesa and glsl have their own wrappers for strtof_l. Merge
and move them to util/. They are compiled with a C++ compiler so that
we can make them thread-safe in a following commit.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whiteacpe.org>
This reverts commit cabc93c5ad.
Mark thinks the failures on the SNB GT2 in the lab are actually because
of faulty hardware, not instruction compaction. The GT1 didn't see any
problems after changes to the compaction code.
Helps a small number of vertex shaders in the games Dungeon Defenders
and Shank, as well as an internal benchmark.
instructions in affected programs: 2801 -> 2719 (-2.93%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use the UST value provided in the PRESENT_COMPLETE_NOTIFY event
rather than gettimeofday(), which gives us the presentation time
instead of the time when SwapBuffers was called. Suggested by
Keith Packard. This relies on the fact that the X DRI3/Present
implementations use microseconds for UST.
v3: Properly ignore PresentCompleteKindMSCNotify; multiply in 64 bits
(caught by Keith Packard).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Keith Packard <keithp@keithp.com> [v3]
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
These source files support actual geometry shaders, so using "gs" for
the name makes a lot of sense. We're going to be adding SIMD8 geometry
shader support as well, at which point "vec4_gs" will be a misnomer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
The brw_gs.[ch] and brw_gs_emit.c source files contain code for
emulating fixed-function unit functionality (VF primitive decomposition
or SOL) using the GS unit. They do not contain code to support proper
geometry shaders.
We've taken to calling that code "ff_gs" (see brw_ff_gs_prog_key,
brw_ff_gs_prog_data, brw_context::ff_gs, brw_ff_gs_compile,
brw_ff_gs_prog). So it makes sense to make the filenames match.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
The GL functions and driver hooks use corresponding names---for example,
glMapBufferRange and Driver.MapBufferRange. But our implementation was
called "intel_bufferobj_map_range," which has the words "map" and
"buffer" swapped, as well as randomly adding "obj."
FlushMappedBufferRange was even trickier: it ordered the words
3, "obj", 1, 2, 4: intel_bufferobj_flush_mapped_range.
Even though the old names were consistent, I always had trouble
rearranging the jumble of words when searching for a function,
and it took a few tries to eventually land there.
The new names match the word order of GL and the driver hooks;
FlushMappedBufferRange is simply brw_flush_mapped_buffer_range.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The OpenGL 4.0 core profile specification, section 2.17.3
Transform Feedback Draw Operations says:
"The error INVALID_VALUE is generated if <stream> is greater
than or equal to the value of MAX_VERTEX_STREAMS.
...
The error INVALID_OPERATION
is generated if EndTransformFeedback has never been called
while the object named by id was bound."
Fixes the piglit test:
ARB_transform_feedback3/arb_transform_feedback3-draw_using_invalid_stream_index
(with the test itself fixed to eliminate an unrelated failure)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we're checking if the framebuffer is sRGB capable, call
is_format_supported() with the PIPE_BIND_DISPLAY_TARGET flag.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit 20836c8185.
255 is a huge number. If you have a loop with 255 iterations, unrolling it
will exceed the SM3 instruction limit. Let's use the default again.
The comment about a SM3 limit doesn't make sense. For SM3, we generally
want 32 (default) or a lower number due to the SM3 instruction limit, which
is 512 instructions. For SM4, we can try higher numbers if needed, but
some shaders can end up being pretty huge and shader compilation can take
more time.
This fixes a shader compile failure on R500/SM3. Reported on IRC.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Caveat: Shaders using UBO/sampler indexing will
not be optimized by SB, due to SB not currently
supporting the necessary CF_INDEX_[01] index
registers.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
The GL side of this extension just provides an accessor via glGetIntegerv for
the value of GL_CONTEXT_RELEASE_BEHAVIOR so it is trivial to implement. There
is a constant on the context for the value of the enum which is initialised to
GL_CONTEXT_RELEASE_BEHAVIOR_FLUSH. The extension is always enabled because it
doesn't need any driver interaction to retrieve the value.
If the value of the enum is anything but FLUSH then _mesa_make_current will
now refrain from calling _mesa_flush. This should only affect drivers that
explicitly change the enum to a non-default value.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The main incentive to do this is to get the defines for the
GL_KHR_context_flush_control extension.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Different platforms require the offset to be in different units. However,
the generator fixes all of this up for us and only requires an offset in
bytes. Previously, we were getting this wrong all over the place. Some
computed/used it correctly as bytes while others treated the offset as
whole registers or computed it as bytes or bytes*2 in SIMD16 mode. This
commit cleans all this up and makes us properly treat it as bytes
everywhere.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
I thought this would be a clever way to make spilling less expensive.
However, it appears that the oword read/write messages we are using for
spilling ignore the execution size and assume SIMD16 whenever working with
more than one register.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Fanin (merge) nodes require it's srcs to be "adjacent" in consecutive
scalar registers. Keep track of instruction neighbors in copy-
propagation step and avoid eliminating mov's which would cause an
instruction to need multiple distinct left and/or right neighbors.
This lets us not fall on our face when we encounter things like:
1: MOV TEMP[2], IN[0].xyzw
2: TEX OUT[0].xy, TEMP[2], SAMP[0], SHADOW2D
3: MOV TEMP[2].xy, IN[0].yxzz
4: TEX OUT[0].zw, TEMP[2], SAMP[0], SHADOW2D
5: END
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Always insert extra mov's for the tex coord into the fanin. This
simplifies things a bit, and avoids a scenario where multiple sam
instructions can have mutually exclusive input's to it's fanin, for
example:
1: TEX OUT[0].xy, IN[0].xyxx, SAMP[0], 2D
2: TEX OUT[0].zw, IN[0].yxxx, SAMP[0], 2D
The CP pass can always remove the mov's that are not actually needed,
so better to start out with too many mov's in the front end, than not
enough.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
dscis -> noscis
dbypass -> nobypass
a bit more consistant w/ nobin, etc. And IMO a bit more sensible names.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Kills get added to the outputs list, to ensure they get scheduled. But
they aren't *really* outputs so skip them in the header comment block.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adding this makes 'make check' catch failures introduced from
within ARB_clip_control.xml earlier.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
In order to test compiler changes more easily, spit out the assembled
shader with some header information so that we can know about
inputs/outputs more easily.
See: git://people.freedesktop.org/~robclark/ir3test
In ir3test we have a big collection of tgsi shaders and reference
ir3_compiler outputs. When making compiler changes, regenerate the
compiler outputs and feed to ir3test to compare the new vs reference
shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The last few dwords were skipped if the total number of dwords was not a
multiple of 4. Change the formatting for better readability.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We only get here if the VS/GS compiled programs change, but we can even
skip it if the VS/GS size didn't change.
Affects cairo runtime on glamor by -1.26471% +/- 0.674335% (n=234)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
brw_program.c: In function 'brw_dump_ir':
brw_program.c:566:33: warning: unused parameter 'brw' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Originally I just fixed some unused parameter warnings in this
function. However, Ken pointed out:
"You could instead remove this driver hook. If the dd pointer is
NULL, arbprogram.c will return true. I think I'd prefer that."
Way, way back in time, I think _mesa_GetProgramivARB had the opposite
behavior. Given that it works the way it now works, I also prefer
removing the driver hook.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
../../src/mesa/main/uniform_query.cpp:1062:1: warning: unused parameter 'ctx' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Silences:
../../src/mesa/main/shaderobj.c: In function '_mesa_init_shader_program':
../../src/mesa/main/shaderobj.c:239:46: warning: unused parameter 'ctx' [-Wunused-parameter]
For now, this adds a couple other unused parameter warnings, but future
patches will clean those up.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_link_shader_program already calls _mesa_clear_shader_program_data
before calling link_shaders, so this is already done.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
_mesa_UseShaderProgramEXT, _mesa_ActiveProgramEXT, and
_mesa_CreateShaderProgramEXT were all removed when support for
GL_EXT_separate_shader_objects was removed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just remove the parameter. Silences:
../../src/mesa/main/ff_fragment_shader.cpp:668:1: warning: unused parameter 'p' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were allowing the register allocation code to do the
computation for us in ra_set_finalize. However, the runtime for this
computation is O(c^4 * g) where c is the number of classes and g is the
number of GRF registers. However, these q-values are directly computable
based on the way we lay out our register classes so there is no need for
the aweful runtime algorithm.
We were doing ok until commit 7210583eb where we bumped the number of
register classes from 11 to 16. While startup times don't normally matter,
this caused piglit to take 4 times as long to run on Bay Trail. This patch
should make generating the ra_set much faster and melt the piglit run
times.
v2: Fixed a couple of bugs. I have now verified that the same q-values are
generated both ways.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On older GENs in SIMD16 mode, we were accidentally building too much
interference into our register classes. Since everything is divided by 2,
the reigster allocator thinks we have 64 base registers instead of 128.
The actual GRF mapping still needs to be doubled, but as far as the ra_set
is concerned, we only have 64. We were accidentally adding way too much
interference.
Signed-off-by: Jason Ekstrand <jason.ekstrand@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For GEN6 SIMD16 mode, we have to 2-align all the registers, so we only have
the even-numbered ones. This means that we have to divide the register
number by 2 when we precolor. This wasn't a problem before because we were
setting up the interference between ra_node registers wrong. This will be
fixed in the next commit.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This shouldn't be a functional change since reg_belongs_to_class is just a
wrapper around BITSET_TEST. It just makes the code a little easier to
read.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gallium should be prepared fine for ARB_clip_control.
So enable this and mention it in the release notes.
v2:
Only enable for drivers announcing the freshly introduced
PIPE_CAP_CLIP_HALFZ capability.
v3:
Use extension enable infrastructure to connect PIPE_CAP_CLIP_HALFZ
with ARB_clip_control.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
In preparation of ARB_clip_control. Let the driver decide if
it supports pipe_rasterizer_state::clip_halfz being set to true.
v3:
Initially enable on ilo.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de
Restore clip control to the default state if MESA_META_VIEWPORT
or MESA_META_DEPTH_TEST is requested.
v3:
Handle clip control state with MESA_META_TRANSFORM.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Implement the mesa parts of ARB_clip_control.
So far no driver enables this.
v3:
Restrict getting clip control state to the availability
of ARB_clip_control.
Move to transformation state.
Handle clip control state with the GL_TRANSFORM_BIT.
Move _FrontBit update into state.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This is for preparation of ARB_clip_control.
v3:
Add comments.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This allows vc4_opt_cse.c to CSE-away operations involving the same
uniform values.
total instructions in shared programs: 37341 -> 36906 (-1.16%)
instructions in affected programs: 10233 -> 9798 (-4.25%)
total uniforms in shared programs: 10523 -> 10320 (-1.93%)
uniforms in affected programs: 2467 -> 2264 (-8.23%)
This saves a bunch of extra flushes when texsubimaging a whole texture
that's been used for rendering, or subdataing a whole BO. In particular,
this massively reduces the runtime of piglit texture-packed-formats (when
the probes have been moved out of the inner loop).
I'm going to be using VC4_DEBUG=shaderdb,norast to do shaderdb stats, but
when debugging regressions, I want to match shaderdb output to shader
disassembly.
fd_bo_cpu_prep() doesn't realize the bo is already referenced in
unflushed cmdstream. It could be made to do so (but would have to be
implemented twice, ie. both for msm and kgsl). But we still can't do
the expected thing if the caller isn't using _NOSYNC. Because of the
way the tiling works, we need to build quite a bit of cmdstream at flush
time, which is not possible to do at the libdrm level.
So rather than trying to make fd_bo_cpu_prep() smarter than it can
possibly be, just *always* discard and reallocate if the
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag is set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
- Use macro for munmap under Android - the STATIC_ASSERT uses
a off_t which is not used under Android for mmap. As loff_t size
does not vary as does off_t just ignore the assert.
- Wrap the long lines to improve readability.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The setMCJITMemoryManager method doesn't exist in LLVM 3.3.
I thought I had tested the latest version of my earlier change with LLVM
3.3, but it looks I missed it.
Trivial.
JITMemoryManager was removed in LLVM 3.6, and replaced by its base class
RTDyldMemoryManager.
This change fixes our JIT memory managers specializations to derive from
RTDyldMemoryManager in LLVM 3.6 instead of JITMemoryManager.
This enables llvmpipe to run with LLVM 3.6.
However, lp_free_generated_code is basically a no-op because there are
not enough hook points in RTDyldMemoryManager to track and free the code
of a module. In other words, with MCJIT, code once created, stays
forever allocated until process destruction. This is not speicfic to
LLVM 3.6 -- it will happen whenever MCJIT is used regardless of version.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We'll need to update gallivm for the interface changes in LLVM 3.6, and
the fewer the number of older LLVM versions we support the less hairy that
will be.
As consequence HAVE_AVX define can disappear. (Note HAVE_AVX meant
whether LLVM version supports AVX or not. Runtime support for AVX is
always checked and enforced independently.)
Verified llvmpipe builds and runs with with LLVM 3.3, 3.4, and 3.5.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The functionality exposed by those extensions does not appear in ES3
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the user calls util_blitter_cache_all_shaders() set a flag and assert
that we never try to create any new fragment shaders after that point.
If the assertions fails, it means we missed generating some shader in
util_blitter_cache_all_shaders().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, we generated an extra CMP instruction:
cmp.ge.f0(8) g6<1>D g1<0,4,1>F 0F
cmp.nz.f0(8) null g6<4,4,1>D 0D
(+f0) sel(8) g5<1>F g1.4<0,4,1>F g2<0,4,1>F
The first operand is always a boolean, and we want to predicate the SEL
on that. Rather than producing a boolean value and comparing it against
zero, we can just produce a condition code in the flag register.
Now we generate:
cmp.ge.f0(8) null g1<0,4,1>F 0F
(+f0) sel(8) g5<1>F g1.4<0,4,1>F g2<0,4,1>F
No difference in shader-db.
v2: Remember to delete the old code (thanks Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using dst_reg(this, ir->type) automatically sets the writemask to the
proper size for the type; src_reg(dst_reg) preserves that. This should
be equivalent, but less code.
Note that src_reg(dst_reg) either uses SWIZZLE_XXXX or SWIZZLE_XYZW, so
the old code did need the manual writemask adjustment, since it
constructed the registers the other way around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Nothing uses the vector_elements temporary variable.
Setting this->result.file is dead because we overwrite this->result a
few lines later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we generated an extra CMP instruction:
cmp.ge.f0(8) g4<1>D g2<0,1,0>F 0F
cmp.nz.f0(8) null g4<8,8,1>D 0D
(+f0) sel(8) g120<1>F g2.4<0,1,0>F g3<0,1,0>F
The first operand is always a boolean, and we want to predicate the SEL
on that. Rather than producing a boolean value and comparing it against
zero, we can just produce a condition code in the flag register.
Now we generate:
cmp.ge.f0(8) null g2<0,1,0>F 0F
(+f0) sel(8) g124<1>F g2.4<0,1,0>F g3<0,1,0>F
total instructions in shared programs: 5473459 -> 5473253 (-0.00%)
instructions in affected programs: 6219 -> 6013 (-3.31%)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A while back, Matt made the uniform upload functions simply upload
ctx->Const.UniformBooleanTrue for boolean values instead of 0/1, which
removed the need to convert it later. We also set UniformBooleanTrue to
1.0f for drivers which want to treat booleans as 0.0/1.0f.
Nothing ever sets these, so they are dead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't have a scissor enable bit in hw, so when a raster state change
results in scissor enable bit changing, we need to also mark scissor
state as dirty.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The optimization of avoiding restore (mem2gmem) if there was a clear
falls down a bit if you don't have a fullscreen scissor. We need to
make the decision logic a bit more clever to keep track of *what* was
cleared, so that we can (a) completely skip mem2gmem if entire buffer
was cleared, or (b) skip mem2gmem on a per-tile basis for tiles that
were completely cleared.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
DataLayoutPass was added in LLVM 3.5 r202168, commit
57edc9d4ff1648568a5dd7e9958649065b260dca "Make DataLayout a plain
object, not a pass.".
This patch fixes this build error with LLVM 3.4.
CXX llvm/libclllvm_la-invocation.lo
llvm/invocation.cpp: In function 'void {anonymous}::optimize(llvm::Module*, unsigned int, const std::vector<llvm::Function*>&)':
llvm/invocation.cpp:324:18: error: expected type-specifier
PM.add(new llvm::DataLayoutPass(mod));
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85189
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The values are hardcoded in the LLVM backend, but the TGSI definitions are
going to be changed with tessellation, e.g. TGSI_PROCESSOR_COMPUTE will be
increased by 2.
We'll use VS for LS and HS, because there's nothing special about them
from the LLVM backend point of view, even though the hardware side is
different. We do the same for ES.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I'll need indexed loads without the meta data flag for tessellation later.
Also rename load_const to buffer_load_const to distinguish it from indexed
const loads.
v2: add comments
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
We must convert it to boolean from the DX9 float encoding that Gallium
specifies.
Later, we should probably define that FACE should be 0 or ~0 if native
integers are supported.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
With 5 shader stages and various combinations of enabled and disabled shaders,
the maximum number of outputs in one shader doesn't have to be equal to
the maximum number of inputs in the following shader.
v2: return 32 for softpipe and llvmpipe
If the writemask doesn't compress, then we want to put in the uncompressed
writemask, not the compressed writemask failure value (all-on).
Fixes glean's stencil2 and fbo-clear-formats on stencil.
It seems like the hardware is unhappy if we execute a kill instruction
prior to last input (ei). Probably the shader thread stops executing
and the end-input flag is never set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If app only updates (for example) vertex uniforms, it would be nice to
only re-emit those and not also frag uniforms. Means we need to mark
the first frag shader const buffer dirty after a clear.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The get_variable_being_redeclared() function can free the 'var' argument.
Thereafter, we cannot assume that 'var' is a valid pointer. This patch
replaces 'var->name' with 'earlier->name' in two places and calls
is_gl_identifier(var->name) before 'var' might get freed.
This fixes several piglit GLSL crashes, including:
spec/glsl-1.50/execution/geometry/clip-distance-in-param
spec/glsl-1.50/execution/geometry/clip-distance-bulk-copy
spec/glsl-1.50/compiler/gs-redeclares-pervertex-out-before-global-redeclaration.geom
I'm not sure why these were not spotted sooner.
A similar bug was previously fixed by f9cecca7a.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Patch fixes 'glsl-2types-of-textures-on-same-unit' in WebGL conformance
test suite. No Piglit regressions, fixes gl-2.0-active-sampler-conflict.
To avoid adding potentially heavy check during draw (valid_to_render),
check is done during uniform updates by inspecting TexturesUsed mask.
A new boolean variable is introduced to cache validation state.
v2: take into account case where 2 uniforms use same unit (curro)
also do the check only when SSO is not in use, SSO has own
path for sampler validation.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Patch removes old variable based logic for handling a break inside
switch. Switch is put inside a loop so that existing infrastructure
for loop flow control can be used for the switch, now also dead code
elimination works properly.
Possible 'continue' call inside a switch needs now special handling
which is taken care of by detecting continue, breaking out and calling
continue for the outside loop.
v2: remove one unnecessary ir_expression (Curro)
Fixes following Piglit tests:
fs-exec-after-break.shader_test
fs-conditional-break.shader_test
No Piglit or es3conform regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
GLES2 doesn't have GL_TEXTURE_BASE_LEVEL, so the hardware doesn't. Fixes
piglit levelclamp, tex-miplevel-selection, and texture-storage/2D mipmap
rendering.
Suppresses a bunch of warning noise about sample_map possibly being used
uninitialized.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Before, we used the a signed d-word for booleans and the immedates we
emitted varried between signed and unsigned. This commit changes the type
to unsigned (I think that makes more sense) and makes immediates more
consistent. This allows copy propagation to work better cleans up some
instructions.
total instructions in shared programs: 5473519 -> 5465864 (-0.14%)
instructions in affected programs: 432849 -> 425194 (-1.77%)
GAINED: 27
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
gl_SampleID is a built-in variable that always is of type "int".
Suggested by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Notably this included the EOF flag (the other bits are the full buffer
dump selection, but we don't do full dumps), which caused the kernel
checking for frame completion to trigger.
The other driver does this manually before calling into each tile, but we
can just let it get binned into the tiles (saving repeated kernel
validation on the packet).
Fixes simulator assertion failures on polygon-mode and non-auto texwrap.
We don't need to emit all of our current state at the end of each bin
list. We're going to be smashing it all at the start of the next tile's
bin list, anyway.
The trick is to generate a unique buffer usage value for each possible
combination of domains and flags, with only one bit set each for the
domains and flags. This ensures pb_check_usage() only returns TRUE when
the domains and flags the cached buffer was created for exactly match
the requested ones.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are two debug variables:
CLOVER_DEBUG which you can set to any combination of llvm,clc,asm
(separated by commas) to dump llvm IR, OpenCL C, and native assembly.
CLOVER_DEBUG_FILE which you can set to a file name for dumping output
instead of stderr. If you set this variable, the output will be split
into three separate files with different suffixes: .cl for OpenCL C,
.ll for LLVM IR, and .asm for native assembly. Note that when data
is written, it is always appended to the files.
v2:
- Code cleanups
- Add CLOVER_DEBUG_FILE environment variable for dumping to a file.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2:
- Split build_module_native() into three separate functions.
- Code cleanups.
v3:
- More cleanups.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Drivers can return this value for PIPE_COMPUTE_CAP_IR_TARGET
if they want clover to give them native object code.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
NewBufferObject took a "target" parameter, which it blindly passed to
_mesa_initialize_buffer_object(), which ignored it.
Not much point in passing it around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is used to implement GLSL's atomicCounter() intrinsic. Previously
it *worked*, but the disassembly was bogus.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This would have *almost never* actually been an issue, since other state
tends to get flagged at the same time as new ABOs -- but still bogus.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This didn't make any sense, but papered over the missing TexBO flagging
we've just fixed, in a bunch of cases.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
When a buffer object is bound to one of the indexed uniform buffer
binding points, assume that from that point on it may be used as
a uniform buffer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In the drivers, we occasionally want to reallocate the backing
store for a buffer object; often to avoid waiting for the GPU
to be finished with the previous contents.
At the point that happens, we don't have a good way of determining
where else the buffer object may be bound, and so no good way of
determining which dirty flags need to be raised -- it's fairly
expensive to go looking at all the possible binding points.
Until now, we've considered any BO to be possibly bound as a UBO or
TexBO, and flagged all that state to be reemitted.
Instead, remember what kinds of binding point this buffer has ever
been used with, so that the drivers can flag only what they need.
I don't expect these bits to ever be reset, but that doesn't matter
for reasonable apps.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The kernel files are built into a separate static library and
all the functions that require it are already wrapped in ifdef
USE_VC4_SIMULATOR. Don't forget the header file :)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we've made all the texture emit code mostly independent of GLSL
IR, this isn't necessary any more.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Before, we had 3 different emit functions for various different gen's,
as well as some ancilliary work that was the same across all gen's which
was either contained in functions or duplicated across the GLSL IR and
Mesa IR backends. Now, we have a single method, emit_texture(), that
takes all the information needed to make a texture instruction and
handles all the setup, and all we have to do to emit a texture
instruction while converting from GLSL IR, Mesa IR, or any new backend
is to extract the information emit_texture() needs and then call it.
v2: Significant rebasing (by Ken).
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This happened to work before, but it would convert the output to a float
and then back to an integer which seems bad.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This drops a dependency on ir_texture objects.
v2 (Ken): Rename lod_components to grad_components, as it only has a
meaningful value for ir_txd. We could set it to 1 for TXL,
but there's no real need.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Our new IR won't have ir_texture objects, but using glsl_type is fine.
v2 (Ken): Drop redundant ir->coordinate NULL check; rebase.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is slightly clearer. Based on a patch by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This moves the handling of non-constant texel offset subexpression trees
to the place where we visit other such subtrees. It also removes some
uses of ir->offset in emit_texture_gen7, which will be useful when we
write the backend for our new upcoming IR.
Based on a patch by Connor Abbott.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
brw_lower_unnormalized_offset sets ir->offset to NULL if it applies the
texelFetchOffset workarounds, so there's no need to special case it
here---there won't be an offset for ir_txf.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Eric's original code to work around TXF offset bugs contained a comment
explaining the problem, which was lost when Chris generalized it to an
IR transformation (in commit 598ca510b8).
This commit adds the original comment to the newer code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Because we reuse various bits of emit code (for state/vertex/prog/etc)
for both regular draws and internal draws (gmem<->mem, clear, etc), the
number of parameters getting passed around has been growing. Refactor
to group these into fd3_emit. This simplifies fxn signatures, avoids
passing around shader key on the stack, etc. It also gives us a nice
place to cache shader-variant lookup to avoid looking up shader variants
multiple times per draw (without having to *also* pass them around as
fxn args everywhere).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Get rid of fd3_vertex_buf and use fd_vertex_state directly for all
draws. Removes a tiny bit of CPU overhead for munging around the vertex
state every time it is emitted, but more importantly it cleans things up
for later optimizations, so the emit paths don't have to special case
internal draws (gmem<->mem, clears, etc) with regular draws.
Instead of constructing fd3_vertex_buf array each time for internal
draws, and context init time pre-create solid_vbuf_state and
blit_vbuf_state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+
math instruction: it's a single instruction (SEND) with a GRF source.
The difference is that it also implicitly clobbers a message register.
The only visible effect is that CSE will remove the MRF-clobbering from
later math operations. This should be fine; compute_to_mrf and
remove_redundant_mrf_writes don't look at the values populated by
implied writes, so they can't rely on those values being present.
Less interference may actually help those passes make more progress.
Binary math is still problematic, since it involves a separate MOV
instruction to load the second operand. We continue disallowing CSE for
binary math operations.
total instructions in shared programs: 3340303 -> 3340100 (-0.01%)
instructions in affected programs: 26927 -> 26724 (-0.75%)
Nothing hurt, gained, or lost. ~6% reduction on a few shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Otherwise the caching buffer manager may return a buffer which was created
with a different set of flags, which can cause trouble.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, when no pkg-config was available for
libexpat we would just add the needed linking
flags without any extra check.
Now, we check that the library and the headers are
also installed in the building environment.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Now that the freedreno_lowering code is moved to tgsi_lowering, remove
our private copy and switch over to using the common version.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Originally the variables were set only once via the ?= operator but
that causes issues when doing incremental builds. They appear to be
undefined and missing from the dependency list despite their addition
to LIBADD.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84807
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The texturing hardware takes the POT level 0 width/height and minifies
those. This is different from what we were doing, for example, for
273-wide's level 5: POT(273>>5) == 8, while POT(273)>>5 == 16.
Fixes piglit-depthstencil-render-miplevels 273.
This patch fixes this build error on DragonFly BSD.
CC os/os_misc.lo
os/os_misc.c: In function 'os_get_total_physical_memory':
os/os_misc.c:132:2: error: #error Unsupported *BSD
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The current implementation of glxUseXFont requires creating
a temporary pixmap and graphics context, which requires a real
old-school X11 Window, not a glxDrawable. This patch changes
things so that glxUseXFont will also accept a glxWindow or
glxPixmap, and lookup the underlying X11 Drawable. Without
this patch glxUseXFont generates a giant stream of Xerrors
about bad drawables and bad graphics contexts.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54372
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
It does not look like an issue now but it is good to be future proof. Spotted
by Courtney Goeltzenleuchter.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This gets them to pass glsl-sin/cos. There was an obvious problem that I
was using the FRC code on the scaled input value, which means that we had
a range in [0, 1], while our taylor is most accurate across [-0.5, 0.5].
We can just slide things over, but that means flipping the sign of the
coefficients. After that, it was just a matter of stuffing more
coefficients in.
There's no reason to stall on pwrite - the CPU always appends to the
buffer and never modifies existing contents, and the GPU never writes
it. Further, the CPU always appends new data before submitting a batch
that requires it.
This code predates the unsynchronized mapping feature, so we simply
didn't have the option when it was written.
Ideally, we would do this for non-LLC platforms too, but unsynchronized
mapping support only exists for LLC systems.
Saves a bunch of stall avoidance copies when uploading shaders.
v2: Rebase on changes to previous patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
We don't really want unnecessary buffer copying, so it'd be nice to know
when it's happening.
v2: Drop stall warnings when doing a read-only CPU mapping of the cache
BO. The GPU also uses it in a read-only fashion, so there won't be
any stalls, even though the buffer is busy. (Thanks to Chris Wilson
for catching this mistake.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
This is easy: we just need to use brw_map_bo instead of mapping it
directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If the VS doesn't output a value that the FS needs, we still need to read
the right contents for the remaining FS inputs, by emitting padding. And
if the VS outputs something the FS doesn't need, we shouldn't put it in
the VPM at all (so the code producing it can get DCEed).
Fixes 77 piglit tests.
Also fixes two sided lighting which was broken at least
on pre-evergreen by commit b1eb00.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is the last use tgsi_parse_token in radeonsi.
It looks ugly because the code was re-indented, but there is really no change
in behavior.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
They were reinventing tgsi_shader_info. They are unused now.
radeon_llvm_context::load_input can be NULL if input fetching is implemented
in some other way.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
No code in Mesa sets the usage mask to any other value.
The final mask is AND'ed with enable bits from the rasterizer state anyway.
If somebody implements setting usage masks in st/mesa, we can use
tgsi_shader_info to get it more easily.
This is a prerequisite for the following commit.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
[ Francisco Jerez: Split off from a larger patch, and take a slightly
different approach for passing the implicit arguments around. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Our current atan()-approximation is pretty inaccurate at 1.0, so
let's try to improve the situation by doing a direct approximation
without going through atan.
This new implementation uses an 11th degree polynomial to approximate
atan in the [-1..1] range, and the following identitiy to reduce the
entire range to [-1..1]:
atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
This range-reduction idea is taken from the paper "Fast computation
of Arctangent Functions for Embedded Applications: A Comparative
Analysis" (Ukil et al. 2011).
The polynomial that approximates atan(x) is:
x * 0.9999793128310355 - x^3 * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7 * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444
This polynomial was found with the following GNU Octave script:
x = linspace(0, 1);
y = atan(x);
n = [1, 3, 5, 7, 9, 11];
format long;
polyfitc(x, y, n)
The polyfitc function is not built-in, but too long to include here.
It can be downloaded from the following URL:
http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
This fixes the following piglit test:
shaders/glsl-const-folding-01
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When mapping the buffer a second time, we need to use the new pointer,
not the one from the previous mapping. Otherwise, we will most likely
crash.
Apparently, we've just been getting lucky and getting the same
bo->virtual pointer in both cases. libdrm probably has a hand in that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Copy propagating these might result in reading the r4 after some other
instruction has written r4. Just prevent all copy propagation of this for
now.
Fixes bad rendering with upcoming indirect register access support, where
the copy propagation was consistently happening across another read.
Merging VS and CS into the same struct wasn't winning us anything except
for not allocating a separate BO (but if we want to pack programs into
BOs, we should pack not just those 2 programs together). What it was
getting us was a bunch of code duplication about hash table lookups and
propagating vc4_compile contents into a vc4_compiled_shader.
I was about to make the situation worse with indirect uniform buffer
access.
I wanted to make another set of texture uploads for handling reladdr
constants, and duplicating all the bitshifting looked like a terrible
idea. In the process, this fixes a swap of the s/t texture wrap modes.
Under the simulator, reading registers before writing them triggers an
assertion failure. c->undef gets treated as r0, which will usually be
written, but not if it's used in the first instruction. We should
definitely not be aborting in this case, and return some sort of undefined
value instead.
Fixes glsl-user-varying-ff.
The border color is only needed when using the GL_CLAMP_TO_BORDER or
(deprecated) GL_CLAMP wrap modes; all others ignore it, including the
common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes.
In those cases, we can skip uploading it entirely, saving a bit of space
in the batchbuffer. Instead, we just point it at the start of the
batch (offset 0); we have to program something, and that address is safe
to read.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Write-back caching cannot be used for buffers being scanned out by the
display engine; surfaces used for scan-out must be write-through or
uncached. I originally chose WT for render targets because it works in
all cases. However, we really want to use write-back caching where
possible, as it is more efficient.
Most renderbuffers are not used for scanout - off-screen FBOs certainly
are fine, and non-pageflipped backbuffers should be fine as well. So
in most cases WB will work. However, we don't know what will be used
for scan-out, so we instead simply use the PTE value specified by the
kernel, as it knows these things.
This matches our MOCS choice on Haswell.
Fixes performance regressions since commit ee4484be3d
in a microbenchmark (spotted by Eero Tamminen). Improves performance
in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a
Broadwell GT2. Improves performance in a bunch of other microbenchmarks
by ~15% or so.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all
three caches (L3, LLC, and eLLC where available), but leaves the LLC
caching mode up to the kernel's page table entry.
This allows the kernel to pick WB/WT/UC based on whether it's using a
buffer for scanout.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
These days, most driver debug output happens via stderr, not stdout.
Some applications (such as Xephyr) also appear to close stdout which
makes these messages go nowhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The non-base NPOT levels are stored as POT-aligned images. We get that
POT alignment by minifying the POT-aligned base level.
This means that level strides are also POT aligned, so we have to tell the
rendering mode config that our resource is larger than the actual
requested area.
Fixes the fbo-generatemipmap-formats NPOT cases. Regresses
depthstencil-render-miplevels 273 * -- the texture presentation now works
(where it was completely broken before), it looks like there's some
overflow of image bounds happening at the lower miplevels.
It's fairly easy, thanks to Rob Clark's lowering code. Fixes
two-sided-lighting and 4 vertex-program-two-side testcases, while
regressing 8 testcases that involve enabling two-sided color while only
initializing one of the two colors in the VS. If you're enabling two
sided color, it's of course expected that you really do set up both
colors, so this is still an improvement (and when we set up a linker for
TGSI, we'll hopefully fix those 8 fails).
Lots of drivers need to transform the weird instructions in TGSI into
reasonable scalar ops, and this code can make those translations
canonical.
Acked-by: Rob Clark <robclark@freedesktop.org>
The simulator assertion fails if you have a write to a reg and then a read
(for example, in the NOP side of an instruction), even if the read isn't
used for anything. By setting unused raddrs to NOP, we avoid the problem
(since only the phsyical registers are tracked).
Original patch by Petri Latvala <petri.latvala@intel.com>:
Add an optimization pass that drops min/max expression operands that
can be proven to not contribute to the final result. The algorithm is
similar to alpha-beta pruning on a minmax search, from the field of
AI.
This optimization pass can optimize min/max expressions where operands
are min/max expressions. Such code can appear in shaders by itself, or
as the result of clamp() or AMD_shader_trinary_minmax functions.
This optimization pass improves the generated code for piglit's
AMD_shader_trinary_minmax tests as follows:
total instructions in shared programs: 75 -> 67 (-10.67%)
instructions in affected programs: 60 -> 52 (-13.33%)
GAINED: 0
LOST: 0
All tests (max3, min3, mid3) improved.
A full shader-db run:
total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
instructions in affected programs: 1188 -> 1160 (-2.36%)
GAINED: 0
LOST: 0
Improvements happen in Guacamelee and Serious Sam 3. One shader from
Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
dropping of a (constant float (0.00000)) operand, which was
compiled to a saturate modifier.
Version 2 by Iago Toral Quiroga <itoral@igalia.com>:
Changes from review feedback:
- Squashed various cosmetic changes sent by Matt Turner.
- Make less_all_components return an enum rather than setting a class member.
(Suggested by Mat Turner). Also, renamed it to compare_components.
- Make less_all_components, smaller_constant and larger_constant static.
(Suggested by Mat Turner)
- Change mixmax_range to call its limits "low" and "high" instead of
"range[0]" and "range[1]". (Suggested by Connor Abbot).
- Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by
Connor Abbot).
- Make the logic more clearer by rearrenging the code and commenting.
(Suggested by Connor Abbot).
- Added comment to explain why we need to recurse twice. (Suggested by
Connor Abbot).
- If we cannot prune an expression, do not return early. Instead, attempt
to prune its children. (Suggested by Connor Abbot).
Other changes:
- Instead of having a global "valid" visitor member, let the various functions
that can determine this status return a boolean and check for its value
to decide what to do in each case. This is more flexible and allows to
recurse into children of parents that could not be prunned due to invalid
ranges (so related to the last bullet in the review feedback).
- Make sure we always check if a range is valid before working with it. Since
any use of get_range, combine_range or range_intersection can invalidate
a range we should check for this situation every time we use any of these
functions.
Version 3 by Iago Toral Quiroga <itoral@igalia.com>:
Changes from review feedback:
- Now we can make get_range, combine_range and range_intersection static too
(suggested by Connor Abbot).
- Do not return NULL when looking for the larger or greater constant into
mixed vector constants. Instead, produce a new constant by doing a
component-wise minmax. With this we can also remove of the validations when
we call into these functions (suggested by Connor Abbot).
- Add a comment explaining the meaning of the baserange argument in
prune_expression (suggested by Connor Abbot).
Other changes:
- Eliminate minmax expressions operating on constant vectors with mixed values
by resolving them.
No piglit regressions observed with Version 3.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Patch fixes following test case from 'shaders-with-varyings' WebGL
conformance suite: "vertex shader with unused varying and fragment
shader with used varying must succeed"
v2: emit still a warning if the condition happens (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When a shader needs N surfaces, we should upload N surfaces and not depend on
how many are bound. This commit is larger than it should be because we did
not export how many surfaces a surface uses before.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It only needs the constant buffer with clip planes and read-write resources
for the GS->VS ring and streamout. That's 2 pointers.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is a wrong place to flush caches to say the least.
I don't think we need to flush the instruction caches if we don't patch
shaders with DMA.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
st/mesa has the same flag in its shader key, we don't need to do it
in the driver anymore.
Instead, use TGSI_INTERPOLATE_LOC_SAMPLE, which is what st/mesa sets.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The first compiled shader is sometimes useless, because the key doesn't match
the key for the draw call where it's used.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes a few issues, including a potential empty-IB (which triggers gpu
hangs in piglit occlusion_query_meta_no_fragments)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Possibly we should map the front color to black (zeroes). But not sure
there is a way to do that without generating a shader variant.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL IN[0], GENERIC[0], PERSPECTIVE
DCL OUT[0], COLOR
DCL SAMP[0]
DCL TEMP[0], LOCAL
IMM[0] FLT32 { 0.0000, 1.0000, 0.0000, 0.0000}
0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
1: MOV OUT[0], IMM[0].xyxx
2: END
cause unhappyness. They have an IN[], but once this is compiled the
useless TEX instruction goes away. Leaving a varying that is never
fetched, which makes the hw unhappy.
In the process fix a signed vs unsigned compare. If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
On Windows, the Piglit primitive-restart test was failing a
glGetError()==0 assertion when it was run w/out any command line
arguments. Piglit's all.py script only runs primitive-restart
with arguments so this case isn't normally hit during a full
piglit run.
The basic problem is Microsoft's opengl32.dll calls glFlush
from wglGetProcAddress() and Piglit uses wglGetProcAddress() to
resolve glPrimitiveRestartNV() which is called inside glBegin/End.
See comments in the code for more info.
Plus, improve the comments for _mesa_alloc_dispatch_table().
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Sinclair Yeh <syeh@vmware.com>
This isn't perfect -- the filtering is happening on the srgb values, and
we're decoding afterwards, which is not what you want. I think that's the
cause of some additional texwrap(GL_CLAMP, LINEAR) failures, though many
other texwrap tests on srgb start to pass since unfiltered values come out
correct.
Since the RA has to be done s.t. each one gets its own (adjacent)
register, it would complicate matters if instructions were allowed to be
repeated. This enables copy-propagation use in situations where
previously that might have happened.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
While running piglit in virgl, I hit an assert in intel driver.
"qemu-system-x86_64: intel_tex.c:219: intel_map_texture_image: Assertion `tex_image->TexObject->Target != 0x8C18 || h == 1' failed."
Thanks to Eric and Ken for pointing me in the right direction,
Fix the get_tex_depth to do the same fixup as get_tex_rgba does
for 1D array textures.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
With current makefiles the build fails because source and build paths
are generated incorrectly. With Android build system the top_srcdir and
top_builddir variables are undefined and all paths are relative to where
Android.mk is located. This ends up with path likes
external/mesa/src/mesa/src/mesa/ for both source and build paths, which
are obviously wrong.
This patch fixes this by overriding resulting SRCDIR and BUILDDIR
variables with empty string, so that paths end up being relative to
Android.mk file again. Appending correct build path to generated files
is already done in Android.gen.mk.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This patch fixes Android build failures by including src/util directory
in compilation. Files inside of this directory are compiled into
libmesa_util static library and linked with resulting libGLES_mesa.
Signed-off-by: Tomasz Figa <tomasz.figa@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Before, we were hard-coding the base_mrf based on dispatch width not number
of registers spilled at a time. This caused us to emit instructions with a
base_mrf or 14 and a mlen of 3 so we used the magical non-existant m16
register. This fixes the problem.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we had a MAX_SAMPLER_MESSAGE_SIZE which we used instead.
However, some FB write messages can validly be longer than this so we need
something different. Since MAX_SAMPLER_MESSAGE_SIZE is validly useful on
its own, we leave it alone and add a new MAX_GRF_SIZE that's big enough for
FB writes.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84539
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes a bug where 1-wide operations don't properly translate down to
1-wide instructions.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Above a certain limit use CACHE mode instead of BUFFER mode. This
should solve gpu hangs with large shader programs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The existing code was not setting several fields, most importantly the
target, which is required on nv50/nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Patch fixes failing test in WebGL conformance test
'point-no-attributes' when running Chrome on OpenGL ES.
(Shader program may draw points using constant data in shader.)
No Piglit regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They're actually as documented in the HW specs and the GL mipmapping enums
order. Fixes fbo-generatemipmap-filtering , and some other tests where we
were off by a few bits due to unexpected linear filtering.
Extension enables doing a multisample buffer resolve and buffer
scaling using a single glBlitFrameBuffer() call. Currently, we
have this extension implemented in BLORP which is only used by
SNB and IVB. This patch implements the extension in meta path
which makes it available to Broadwell.
Implementation features:
- Supports scaled resolves of 2X, 4X and 8X multisample buffers.
- Avoids unnecessary shader compilations by storing the pre compiled
shaders for each supported sample count.
- Uses bilinear filtering for both GL_SCALED_RESOLVE_FASTEST_EXT and
GL_SCALED_RESOLVE_NICEST_EXT filter options. This is an allowed
behavior in the extension's spec.
- I tried doing bicubic filtering for GL_SCALED_RESOLVE_NICEST_EXT
filter. It made the edges in the image look little smoother but
the image gets blurred causing no overall quality improvement.
For now I have dropped the idea of doing different filtering for
nicest filter.
V2:
- Minor changes to simplify the fragment shader.
- Refactor the code to move i965 specific sample_map computation out
of Meta. We now use ctx->Const.SampleMap{2,4,8}x variables initialized
by the driver.
- Use a simple msaa resolve shader for scaled resolves with scaling
factor = 1.0.
V3:
- Make changes to create a string out of ctx->Const.SampleMap{2,4,8}x
variables and use it in fragment shader.
V4:
- Make changes to use uint8_t type ctx->Const.SampleMap{2,4,8}x
variables.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
with values specific to Intel hardware.
V2: Define and use gen6_get_sample_map() function to initialize
the variables.
V3: Change the function name to gen6_set_sample_maps() and use
memcpy() to fill in the data.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
SampleMap{2,4,8}x variables are used in later patches to implement
EXT_framebuffer_multisample_blit_scaled extension.
V2: Use integer array instead of a string.
Bump up the comment.
V3: Use uint8_t type array.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch implements functions for images support,
which basically supports copy data between video
surface and user buffers, in this case supports
SW decode, and other video output
v2: fix buffer size for odd-sized image case
expose I420 format as well
v3: fix YUV 4:2:2 format data buffer size
cleanup I420 format exposure
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch implements codec for mpeg2 h264 and vc1,
populates codec parameters and pass them to HW driver.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch implements context managements, relate it HW driver,
functions for video surface managements, and functions for
application data memory buffer managements.
implemented functions:
vlVa(Create|Destroy)Context
vlVa(Create|Destroy|Put)Surfaces
vlVa(Create|Destroy)Buffer
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch is for application to query configuration,
such as profiles, entrypoints, and attributes
v2: fix missing profile with query
Signed-off-by: Michael Varga <michael.varga@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
This patch adds a skeleton VA-API state tracker,
which is filled with live in the subsequent patches.
v2: fixes in configure.ac and va state_tracker Makefile.am
v3: do not link against libva.
detect libva version, and correctly set driver entrypoint name.
rebase(cleanup) targets/va/Makefile.am
v4: cleanup va version auto detection
add back targets/va/va.sym
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Break out these functions so that they can be shared with a other
state trackers. They will be used in subsequent patches for the new
VA-API state tracker.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cuts the number of i965 color calculator viewport uploads by 100x
(11017983 -> 113385) in 'x11perf -gc' with Glamor in Xephyr.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I believe when I wrote this code, gen6_sf_state used CACHE_NEW_VS_PROG,
which has since been replaced by BRW_NEW_VUE_MAP_GEOM_OUT. It's not
needed here anyway - only SBE needs it. Just a copy and paste mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function flagged BRW_NEW_*_PROGRAM
When ctx->{Vertex,Geometry,Fragment}Program._Current changes, core Mesa
calls the BindProgram driver hook, which flagged BRW_NEW_*_PROGRAM.
However, brw_upload_state also checks for that changing, sets the same
flags, and also updates brw->fragment_program and so on. So, this looks
to be entirely redundant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Now that the bitfield is a uint64_t, we should use 1ull. Currently, we
only have 32 entries, so 1 works fine, but it's not future-proof.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will keep INTEL_DEBUG=state working when we add BRW_NEW_* bits
beyond 1 << 31. We missed doing this when widening the driver flags
from uint32_t to uint64_t.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Fix build error introduced with commit
eedbce9c63.
lp_test_format.c: In function ‘test_format_unorm8’:
lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’
gallivm = gallivm_create("test_module_unorm8");
^
In file included from ../../../../src/gallium/auxiliary/gallivm/lp_bld_format.h:38:0,
from lp_test_format.c:42:
../../../../src/gallium/auxiliary/gallivm/lp_bld_init.h:58:1: note: declared here
gallivm_create(const char *name, LLVMContextRef context);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84538
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Instead of just segfaulting in the driver when a buffer allocation fails,
report error messages indicating what went wrong so that we can debug things.
As a simple example, chromium wraps Mesa in a sandbox which doesn't allow
access to most syscalls, including the ability to create shared memory
segments for fences. Before, you'd get a simple segfault in mesa and your 3D
acceleration would fail. Now you get:
$ chromium --disable-gpu-blacklist
[10618:10643:0930/200525:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
libGL: pci id for fd 12: 8086:0a16, driver i965
libGL: OpenDriver: trying /local-miki/src/mesa/mesa/lib/i965_dri.so
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL: Can't open configuration file /home/keithp/.drirc: Operation not permitted.
libGL error: DRI3 Fence object allocation failure Operation not permitted
[10618:10618:0930/200525:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[10618:10618:0930/200525:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
This made it pretty easy to diagnose the problem in the referenced bug report.
Bugzilla: https://code.google.com/p/chromium/issues/detail?id=415681
Signed-off-by: Keith Packard <keithp@keithp.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
A driver which doesn't have async flip support will queue up flips without any
way to replace them afterwards. This means we've got a scanout buffer pinned
as soon as we schedule a flip and so we need another buffer to keep from
stalling.
When vblank_mode=0, if there are only three buffers we do:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting down in the kernel waiting for vblank to
become the next scanout buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
This cannot be displayed at MSC 1 because the
kernel doesn't have any way to replace buffer 1 as the pending
scanout buffer. So, best case this will get displayed at MSC 2.
Now we block after this, waiting for one of the three buffers to become idle.
We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1
because it's sitting in the kernel waiting to become the next scanout buffer
and we can't use buffer 2 because that's the most recent frame which will
become the next scanout buffer if the application doesn't manage to generate
another complete frame by MSC 2.
With four buffers, we get:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting down in the kernel waiting for vblank to
become the next scanout buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
This cannot be displayed at MSC 1 because the
kernel doesn't have any way to replace buffer 1 as the pending
scanout buffer. So, best case this will get displayed at MSC
2. The X server will queue this swap until buffer 1 becomes
the scanout buffer.
Render frame 3 to buffer 3
PresentPixmap for buffer 3 at MSC 1
As soon as the X server sees this, it will replace the pending
buffer 2 swap with this swap and release buffer 2 back to the
application
Render frame 4 to buffer 2
PresentPixmap for buffer 2 at MSC 1
Now we're in a steady state, flipping between buffer 2 and 3
waiting for one of them to be queued to the kernel.
...
current scanout buffer = 1 at MSC 1
Now buffer 0 is free and (e.g.) buffer 2 is queued in
the kernel to be the scanout buffer at MSC 2
Render frames, flipping between buffer 0 and 3
When the system can replace a queued buffer, and we update Present to take
advantage of that, we can use three buffers and get:
current scanout buffer = 0 at MSC 0
Render frame 1 to buffer 1
PresentPixmap for buffer 1 at MSC 1
This is sitting waiting for vblank to become the next scanout
buffer
Render frame 2 to buffer 2
PresentPixmap for buffer 2 at MSC 1
Queue this for display at MSC 1
1. There are three possible results:
1) We're still before MSC 1. Buffer 1 is released,
buffer 2 is queued waiting for MSC 1.
2) We're now after MSC 1. Buffer 0 was released at MSC 1.
Buffer 1 is the current scanout buffer.
a) If the user asked for a tearing update, we swap
scanout from buffer 1 to buffer 2 and release buffer
1.
b) If the user asked for non-tearing update, we
queue buffer 2 for the MSC 2.
In all three cases, we have a buffer released (call it 'n'),
ready to receive the next frame.
Render frame 3 to buffer n
PresentPixmap for buffer n
If we're still before MSC 1, then we'll ask to present at MSC
1. Otherwise, we'll ask to present at MSC 2.
Present already does this if the driver offers async flips, however it does
this by waiting for the right vblank event and sending an async flip right at
that point.
I've hacked the intel driver to offer this, but I get tearing at the top of
the screen. I think this is because flips are always done from within the
ring, and so the latency between the vblank event and the async flip happening
can cause tearing at the top of the screen.
That's why I'm keying the need for the extra buffer on the lack of 2D
driver support for async flips.
Signed-off-by: Keith Packard <keithp@keithp.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Need to unwrap the indirect resource otherwise bad things will happen.
Fixes random crashes and timeouts with piglit's arb_indirect_draw tests.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
IVB had a restriction that prevented us from emitting compressed
three-source instructions, and although that was lifted on Haswell,
Haswell had a new restriction that said BFI instructions specifically
couldn't be compressed.
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
The improvement here is that we've broken a dependency between these
instructions. Leads to 330 fewer INV instructions and 330 more RSQ.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Transform
sqrt a, b
rcp c, a
into
sqrt a, b
rsq c, b
In most cases the sqrt's result is still used, so the improvement here
is that we've broken a dependency between these instructions. Leads to
80 fewer INV instructions and 80 more RSQ.
Occasionally the sqrt's result is no longer used, leading to:
instructions in affected programs: 5005 -> 4949 (-1.12%)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The next patch adds an algebraic optimization for the pattern
sqrt a, b
rcp c, a
and turns it into
sqrt a, b
rsq c, b
but many vertex shaders do
a = sqrt(b);
var1 /= a;
var2 /= a;
which generates
sqrt a, b
rcp c, a
rcp d, a
If we apply the algebraic optimization before CSE, we'll end up with
sqrt a, b
rsq c, b
rcp d, a
Applying CSE combines the RCP instructions, preventing this from
happening.
No shader-db changes.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Helps a handful of programs in Serious Sam 3 that use do-while loops.
instructions in affected programs: 16114 -> 16075 (-0.24%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Do not rely on LLVMMCJITMemoryManagerRef being available.
The c binding to the memory manager objects only appeared
on llvm-3.4.
The change is based on an initial patch of Brian Paul.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Blitter can still have transfers hanging around which it frees in
util_blitter_destroy(). So let it clean up before we yank the
transfer_pool from under it.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Indirect registers consume an additional token. Try to clean up the
token calculation math a bit, and fix it at the same time.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the name is just going to get dropped, don't bother making it. If
the name is made, release it sooner (rather than later).
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 74 40,578,719,715 67,762,208 62,263,404 5,498,804 0
After (32-bit): 52 40,565,579,466 66,359,800 61,187,818 5,171,982 0
Before (64-bit): 74 37,129,541,061 95,195,160 87,369,671 7,825,489 0
After (64-bit): 76 37,134,691,404 93,271,352 85,900,223 7,371,129 0
A real savings of 1.0MiB on 32-bit and 1.4MiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These few places were using ir_var_auto for seemingly no reason. The
names were not added to the symbol table.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No change Valgrind massif results for a trimmed apitrace of dota2.
v2: Minor rebase on _mesa_init_constants changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Later patches will give every ir_var_temporary the same name in release
builds. Adding a bunch of variables named "compiler_temp" to the symbol
table can only cause problems.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Specifically, ir_var_temporary variables constructed with a NULL name
will all have the name "compiler_temp" in static storage.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0
After (32-bit): 71 40,583,408,411 67,761,528 62,263,519 5,498,009 0
Before (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0
After (64-bit): 67 37,123,303,706 95,150,544 87,333,600 7,816,944 0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
At least one of these pointers must be NULL, and we can determine which
will be NULL by looking at other fields. Use this information to store
both pointers in the same location.
If anyone can think of a better name for the union than "u", I'm all
ears.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 63 40,574,239,515 68,117,280 62,618,607 5,498,673 0
After (32-bit): 44 40,577,049,140 68,118,608 62,441,063 5,677,545 0
Before (64-bit): 53 37,126,451,468 95,150,256 87,711,304 7,438,952 0
After (64-bit): 63 37,122,829,194 95,153,008 87,333,600 7,819,408 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Also move num_state_slots inside ir_variable_data for better packing.
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The payoff for this will come in a few more patches.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
warn_extension_index was moved to improve packing.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0
After (32-bit): 73 40,575,751,558 68,116,528 62,618,607 5,497,921 0
Before (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0
After (64-bit): 62 37,123,578,526 95,150,784 87,711,304 7,439,480 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
v2: Use the enum name with the bit-field and remove the extra casts.
Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
Also move the new warn_extension_index into ir_variable::data. This
enables slightly better packing.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 82 40,580,040,531 68,488,992 62,973,695 5,515,297 0
After (32-bit): 73 40,580,476,304 68,488,400 62,796,151 5,692,249 0
Before (64-bit): 65 37,124,013,542 95,892,768 88,466,712 7,426,056 0
After (64-bit): 71 37,124,890,613 95,889,584 88,089,008 7,800,576 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The payoff for this will come in the next patch.
No change Valgrind massif results for a trimmed apitrace of dota2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
After compilation (and before linking) we can eliminate quite a few
built-in variables. Basically, any uniform or constant (e.g.,
gl_MaxVertexTextureImageUnits) that isn't used (with one exception) can
be eliminated. System values, vertex shader inputs (with one
exception), and fragment shader outputs that are not used and not
re-declared in the shader text can also be removed.
gl_ModelViewProjectMatrix and gl_Vertex are used by the built-in
function ftransform. There are some complications with eliminating
these variables (see the comment in the patch), so they are not
eliminated.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0
After (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0
Before (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0
After (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0
A real savings of 4.9MiB on 32-bit and 7.0MiB on 64-bit.
v2: Don't remove any built-in with Transpose in the name.
v3: Fix comment typo noticed by Anuj.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
All built-in uniforms are supposed to be backed by some GL state. The
state_slots field describes this backing state.
This helped me track down a bug in a later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reuse the LLVMContext already allocated in llvmpipe_context
for draw_llvm if ppossible. This should decrease the memory
footprint of an llvmpipe context.
v2: Fix compile with llvm disabled.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This fixes the remaining problem with the recently introduced
global jit memory manager. This change again uses a memory manager
that is local to gallivm_state. This implementation still frees
the majority of the memory immediately after compilation.
Only the generated code is deferred until this code is no longer used.
This change and the previous one using private LLVMContext instances
I can now safely run several independent OpenGL contexts driven
by llvmpipe from different threads.
v3: Rebase on llvm-3.6 compile fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This is one step to make llvmpipe thread safe as mandated by the OpenGL
standard. Using the global LLVMContext is obviously a problem for
that kind of use pattern. The patch introduces two LLVMContext
instances that are private to an OpenGL context and used for all
compiles. One is put into struct draw_llvm and the other
one into struct llvmpipe_context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
The big pile of patches I just pushed regresses about 25 piglit tests on
SNB. This fixes the regressions.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
The screen argument isn't actually used by lp_jit_screen_init() at this
time, but let's move the call so that we pass a valid pointer.
v2: don't leak screen if lp_jit_screen_init() fails.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For cube resources, the array_size value should be 6. So handle
that case as we do for array texture resources. But assert that
array_size==6 just to be safe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
layers (aka array_size) should be 6 for cube textures so we don't need
to special-case it. But add an assertion just to be safe.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Earlier in the function we assert layers==6 for PIPE_TEXTURE_CUBE so
there's no reason to special-case the pt.array_size = layers assignment.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The core sw primitive restart code is still around, because i965 uses it
in some cases, but there are no drivers that want it on all the time.
Reviewed-by: Rob Clark <robdclark@gmail.com>
The drivers not flagging primitive restart support are r300 swtcl, svga,
nv30, and vc4.
The point of primitive restart is to slightly reduce draw call overhead
for apps by batching multiple draws. If we do an extra pass to read the
index buffer and split back into multiple draws, we've entirely missed the
point. This is particularly bad for drivers that otherwise have hardware
IB reads, where the readback is probably uncached.
Reviewed-by: Rob Clark <robdclark@gmail.com>
On gen 7, the MRF was removed and we gained the ability to do send
instructions directly from the GRF. This commit enables that
functinoality for FB writes.
v2: Make handling of components more sane.
i965/fs: Force a high register for the final FB write
v2: Renamed the array for the range mappings and added a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If we are going to use LOAD_PAYLOAD operations to fill MRF registers, then
we will need this.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we were use the base_mrf parameter of fs_inst to store the MRF
location. In preparation for doing FB writes from the GRF, we now also
allow you to set inst->base_mrf to -1 and provide a source register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we have execution sizes, we can use that instead of the
dispatch width. This way it also works for 8-wide instructions in
SIMD16.
i965/fs: Make effective_width a variable instead of a function
i965/fs: Preserve effective width in constant propagation
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will, eventually, allow us to manage execution sizes of
instructions in a much more natural way from the fs_visitor level.
i965/fs: Explicitly set instruction execute size a couple of places
i965/blorp: Explicitly set instruction execute sizes
Since blorp is all 16-wide and nothing isn't, in general, very careful
about register width, we'll just set it all explicitly.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we track both halves of a 16-wide vgrf, we no longer need to worry
about force_sechalf or force_uncompressed. The only real issue is if the
destination is too small.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit fixes a bug in register coalesce that happens when one register
is moved to another the proper number of times but the channels are
re-arranged. When this happens, the previous code would happily coalesce
the registers regardless of the fact that the channel mappins were wrong.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that offset() can properly handle MRF registers, we can use an MRF
fs_reg and let offset() handle incrementing it correctly for different
dispatch widths. While this doesn't have any noticeable effect currently,
it does ensure that the destination register is 16-wide which will be
necessary later when we start detecting execution sizes based on source and
destination registers.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is actually the squash of a bunch of different changes. Individual
commit titles follow:
i965/fs: Always 2-align registers SIMD16 for gen <= 5
i965/fs: Use the register width when applying offsets
This reworks both byte_offset() and offset() to be more intelligent.
The byte_offset() function now supports offsets bigger than 32. The
offset() function uses the byte_offset() function together with the
register width and the type size to offset the register by the correct
amount.
i965/fs: Change regs_read to be in hardware registers
i965/fs: Change regs_written to be actual hardware registers
i965/fs: Properly handle register widths in LOAD_PAYLOAD
The LOAD_PAYLOAD instruction is a bit special because it collects a
bunch of registers (with possibly different widths) into a single
payload block. Once the payload is constructed, it's treated as a
single block of data and most of the information such as register widths
doesn't matter anymore. In particular, the offset of any particular
source register is the accumulation of the sizes of the previous source
registers.
i965/fs: Properly set writemasks in LOAD_PAYLOAD
i965/fs: Handle register widths in demote_pull_constants
i965/fs: Get rid of implicit register doubling in the allocator
i965/fs: Reserve enough registers for PLN instructions
i965/fs: Make sources and destinations interfere in 16-wide
i965/fs: Properly handle register widths in CSE
i965/fs: Properly handle register widths in register_coalesce
i965/fs: Properly handle widths in copy propagation
i965/fs: Properly handle register widths in VARYING_PULL_CONSTANT_LOAD
i965/fs: Properly handle register widths and odd register sizes in spilling
i965/fs: Don't waste a register on texture lookups for gen >= 7
Previously, we were waisting a register in SIMD16 mode because we could
only allocate registers in pairs. Now that we can allocate and address
odd-sized registers, let's get rid of this special-case.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every register in i965 assembly implicitly has a concept of a "width".
Usually, this is derived from the execution size of the instruction.
However, when writing a compiler it turns out that it is frequently a
useful to have the width explicitly in the register and derive the
execution size of the instruction from the widths of the registers used in
it.
This commit adds a width field to fs_reg along with an effective_width()
helper function. The effective_width() function tells you how wide the
register effectively is when used in an instruction. For example, uniform
values have width 1 since the data is not actually repeated, but when used
in an instruction they take on the width of the instruction. However, for
some instructions (LOAD_PAYLOAD being the notable exception), the width is
not the same.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just pass the visitor into is_copy_payload() and is_coalesce_candidate()
instead of a register size and the virtual_grf_sizes array. Among other
things, this makes the code more obvious because you don't have to figure
out where src_size came from.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Right now, this function is a no-op but it indicates that we intend to only
use the first half of the 16-wide register.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit reworks copy propagation a bit to support propagating the
copying of partial registers. This comes up every time we have pull
constants because we do a pull constant read immediately followed by a move
to splat the one component of the out to 8 or 16-wide. This allows us to
eliminate the copy and simply use the one component of the register.
Shader DB results:
total instructions in shared programs: 5044937 -> 5044428 (-0.01%)
instructions in affected programs: 66112 -> 65603 (-0.77%)
GAINED: 0
LOST: 0
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A switch statement is much easier to read/edit than a big giant or
statement.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This splits emit_fb_writes into two functions: emit_fb_writes and
emit_single_fb_write. This reduces the amount of duplicated code in
emit_fb_writes and makes the register number fiddling less arcane.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Sometimes these show up in LOAD_PAYLOAD instructions and it's nice to be
able to see them.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we disabled compact_virtual_grfs when dumping optimizations.
The idea here was to make it easier to diff the dumped shader because you
didn't have a sudden renaming. However, sometimes a bug is affected by
compact_virtual_grfs and, when this happens, you want to keep dumping
instructions with compact_virtual_grfs enabled. By turning it into an
optimization pass and dumping it along with the others, we retain the
ability to diff because you can just diff against the compact_virtual_grf
output.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We also set the register width equal to the dispatch width. Right now,
this is effectively a no-op since we don't do anything with it. However,
it will be important once we add an actual width field to fs_reg.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, if an instruction wrote to more than one register, we
implicitly assumed that it filled the entire register. We never hit this
before because the only time we did multi-register writes was things like
texturing which always wrote to all of the registers. However, with the
upcoming ability to do 16-wide instructions in SIMD8 and things of that
nature, we can have multi-register writes at offsets and we'll hit this.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using a floating-point type doesn't usually cause hangs on my HSW, but the
simulator complains about it quite a bit.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We have this wonderful offset() function for advancing registers, but we're
not using it. Using offset() allows us to do some sanity checking and
avoid manually touching fs_reg::reg_offset. In a few commits, we will make
offset do even more nifty things for us.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The original vgrf splitting code was written with the assumption that vgrfs
came in two types: those that can be split into single registers and those
that can't be split at all It was very conservative and bailed as soon as
more than one element of a register was read or written. This won't work
once we start allowing a regular MOV or ADD operation to operate on
multiple registers. This rewrite allows for the case where a vgrf of size
5 may appropriately be split in to one register of size 1 and two registers
of size 2.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Previously, we were generating the fast-clear shader from GLSL. The
problem is that fast clears require that we use a replicated write rather
than a regular write instruction. In order to get this we had a
complicated and somewhat fragile optimization pass that looked for places
where we can use a replicated write and used it. Since replicated writes
have a lot of restrictions, we only ever use them for fast-clear
operations.
This commit replaces the optimization pass with a function that just
generates the shader we want. This is a) less code, b) less fragile than
the optimization pass, and c) generates a more efficient shader.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes GPUVM faults when running the piglit test "getteximage-formats
init-by-rendering" with R600_DEBUG=forcedma on SI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We are currently only dealing with depth-only or stencil-only resources
here, not with resources having both depth and stencil[0]. In both cases,
the tiling mode index is in the tile_mode field, not in the
stencil_tile_mode field.
[0] Add an assertion for that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add finalize_vertex_elements() to finalize ilo_ve_state. This fixes a
potential issue with URB entry allocation for VS and move the complexity of
gen6_3DSTATE_VERTEX_ELEMENTS() to the new function.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
The code was kind of mixed up what buffers were getting stored in the case
that a resolve bit was unset (which are set based on the GL state at draw
time) and the buffer wasn't actually bound. In particular, depth-only
rendering would store the color buffer contents, which happen to be
pointing at the depth buffer.
Thanks to clearing out the resolve bits for things we really can't
resolve, now I can drop the safety checks for buffer presence around the
actual stores.
Fixes 42 piglit tests.
My attempts to clarify the code with _compacted/_uncompacted prefixed
variables apparently failed. Hopefully this is clearer.
In any case, the previous code wasn't clear enough to gcc to let it
optimize division by a power of two into a shift. No problems now.
Also, the previous code (in the ADD case) didn't work on 32-bit x86, due
to complicated set of interactions best summed up as unsigned division
and compiler optimizations.
Tested-by: Mark Janes <mark.a.janes@intel.com>
We need to keep track if a state change other than frag/vert shader
state will trigger us to need a different shader variant, and if
necessary mark the appropriate shader state as dirty. Otherwise we will
forget to re-emit the shader state.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is for hw that needs to emulate some texture wrap modes (like
CLAMP) with some help from the shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Keep the existing function as a common helper. But this lets us move an
a2xx specific hack out of common code. And the PIPE_TEX_WRAP_CLAMP
emulation will require an a3xx specific hack. So rather than piling on
hacks, split this out.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We just clamp the incoming texture coordinates. This breaks the lambda
calculation, but it gets the piglit tests to pass. This is the same
behavior as in i965.
One spot in the docs says that it's stored at a miplevel just beyond the
last miplevel, which was scary. But really, you just load it as the R
coordinate (which conflicts with cubemaps, but you don't do border
clamping on cubes).
We have to expose them for GL 2.0, but we just always return a value of 0.
We should be advertising 0 query bits instead of 64, but gallium doesn't
have plumbing for that yet. At least this stops the segfaults.
This may reduce register pressure and uniform counts. Drops a bunch of 0
uniform loads on vs-temp-array-mat4-index-col-row-wr.shader_test, which is
failing to register allocate.
It's not passing some of the piglit tests, because it looks like at small
miplevels some contents from surrounding faces are getting filtered in at
the corners. It does get 7 new tests passing.
In the other related fields, "0" refers to the size of the first miplevel,
while this is a field in a slice. The other implicit slices we have
(cubemap layers) don't vary in size compared to the first one.
Almost always, the MOV will get copy propagated out. Even if it doesn't,
it's probably better to just reload the uniform at next use (to reduce
register pressure) rather than try to save instruction count.
I was looking at this because in the presence of texturing (which calls
add_uniform() directly to get the uniform load forced into the
instruction) the c->uniform_contents indices don't match 1:1 with the
temporary qregs.
commit 4ed23fd broke creation of pbuffer surfaces, patch fixes
the failure, noticed when running chrome with '--use-gl=egl'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
When an instruction's result was consumed by multiple mov.sat
instructions, we would decide that we couldn't move the saturate
modifier because something else was using the result, even though it was
just another mov.sat!
total instructions in shared programs: 4275598 -> 4274842 (-0.02%)
instructions in affected programs: 75634 -> 74878 (-1.00%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When we find a mov.sat, we search backwards. We might as well search
everything else backwards as well and potentially look at fewer
instructions.
This change enables the next patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Get rid of the 'default' case (as suggestied by imirkin) so compiler
warns us about missing caps. Also add some caps that were missing until
now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
4155d1c7 'st/mesa: drop dependence on API profile in st_init_extensions'
broke freedreno because somehow 'PIPE_CAP_MAX_VIEWPORTS' fell through
the cracks. Resulting that we reported zero viewports. So the state
tracker never bothered to give us any valid viewport!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is the only guaranteed way get the patch level for llvm,
since the define cannot always be found in config.h depending
on the version of llvm or the build system used.
CC: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jonathan Gray <jsg@jsg.id.au>
Added back in 2009, with osmesa/GLU in mind. Unlikely to be working
any more since the removal of the static makefiles.
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rather than having double negatives -> disable-opencl, default=no
simply use enabled/disabled. It makes things a bit easier for the
reader and consistent throughout the file.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The location of the egl driver(s) is matter that we should have
never exposed to the user. Currently the dri2 driver is built
into the libEGL loader, with the gallium based one soon to follow.
v2: Fold EGL_DRIVER_INSTALL_DIR within the makefiles. Suggested by Matt.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80615
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It is not used outside the render code. There are also too many details in it
that we do not want other components to access directly.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Remove emit_draw() and ILO_RENDER_DRAW indirections. With all emit functions
being direct now, ilo_render_estimate_size() and more can also be removed.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add these new high-level functions
ilo_render_get_draw_dynamic_states_len()
ilo_render_emit_draw_dynamic_states()
ilo_render_get_rectlist_dynamic_states_len()
ilo_render_emit_rectlist_dynamic_states()
ilo_render_get_draw_surface_states_len()
ilo_render_emit_draw_surface_states()
for draw and rectlist state emission. They are implemented in the new
ilo_render_dynamic.c and ilo_render_surface.c.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
For all supported query types, we always emit a PIPE_CONTROL. Call
ilo_render_get_flush_len() for simplicity and clarity.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
ilo_render is based on ilo_builder. We should only care if the builder
buffers are invalidated, or if the hardware context is invalidated. Replace
ilo_render_invalidate() with flags by ilo_render_invalidate_builder() and
ilo_render_invalidate_hw().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Both dynamic buffer and surface buffer are state buffers. We should not use
state buffer to refer to the former.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Move members of ilo_3d that still make sense to ilo_context. With ilo_3d
gone, rename functions whose names begin with ilo_3d to something more
appropriate.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
According to GLSL(4.2) and GLSL-ES (1.0, 3.0) spec, Structures must
have the same name to be considered same type. We currently ignore
the name check while checking if two records are same. This patch
fixes this.
Patch fixes failing tests in WebGL conformance test
'shaders-with-uniform-structs' when running Chrome on OpenGL ES.
v2: Do not force name comparison with unnamed types (Tapani)
v3: Cleanups (Matt)
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83934
The draw module would still try to use gallivm, causing many piglit tests
to fail with an assertion failure. llvmpipe might have been similarly
affected.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
mesa_texstore expects pixel data, not compressed data. For compressed
textures, we want to just copy the bits in without any conversion.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Francisco Jerez <currojerez@riseup.net>
What happens is that a SPLIT operation is part of the spill node, and as
a pseudo op, the instruction gets erased after processing its first def.
However the later defs still need to refer to it, so instead delay
deleting until after that whole RA node is done processing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79462
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
These are pretty catastrophic, "should never happen" failure paths (though
4 tests in piglit hit them currently, due to a single bug). An abort()
that you can gdb on easily is probably more useful than a clean exit,
particularly since a bug in piglit framework right now is causing early
exit(1)s to simply not be recorded in the results at all.
I only made IS_NEGATIVE(x) use signbit in commit 0f3ba405 in an attempt
to fix 54805, but it didn't help. We didn't use signbit on some
platforms and instead defined it to x < 0.0f.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Was only tracked to be printed at the end of configure, but configure
quits if it can't build something we requested, rather than silently
dropping it, so printing these directories has little use.
Note that I had to add support for testing the packed attribute to
m4/ax_gcc_func_attribute.m4.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [C bits]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Presumbly this will let clang and other compilers use the built-ins as
well.
Notice two changes specifically:
- in _mesa_next_pow_two_64(), always use __builtin_clzll and add a
static assertion that this is safe.
- in macros.h, remove the clang-specific definition since it should
be able to detect __builtin_unreachable in configure.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [C bits]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The spec says the type must be W (JIP is 16-bits after all), but we've
been emitting it with a UD type all along and have experienced no
adverse effects. Changing the type to D allows ELSE and ENDIF
instructions to be compacted.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We're currently emitting compactable control flow instruction the wrong
types, preventing their compaction. The next patch will fix this and
actually enable compaction.
On chips that cannot compact control flow instructions, attempts to find
a match in the datatype table will fail.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The array was previously indexed in units of brw_compact_inst (8-bytes),
but before compaction all instructions are uncompacted, so every odd
element was unused.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Used to pass over previously compacted instructions in this loop, but no
longer. No point in checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It may be possible to create a contrived example in which a 3-src
instruction would have been compacted on Gen < 8. I'd rather not
discover it in the wild.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently a no-op, since instruction compaction isn't implemented for the
generations that have a programmable strips-and-fans unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Despite what the Sandybridge PRM says, ENDIF has Jump Count in <dst>,
not JIP in <src1>. (The same mistake appears about WHILE as well).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both sizes are VERT_ATTRIB_MAX, so this has no effect. But it drops a
few trivial uses of the derived state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm not familiar with this code, but this sure appears to be a typo.
It looks like the intent is to set each array element, not arrays[0]
each time. Notably, the loop just below uses "array", not "arrays".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
The code in get.c that handles this uses ctx->Array.VAO->VertexAttrib,
which is a gl_vertex_attrib_array structure, not a gl_client_array.
The offsets of all fields happened to be the same in both structures, at
least on x86_64. "Size," "Type," and "Stride" are obviously the same:
both structures start with the same fields, in the same order.
"Enabled" is dicier: there are different fields before it in both
structures, including pointer sized values which might need special
alignment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: mesa-stable@lists.freedesktop.org
Dead since the _MaxElement removal, but these functions seemed generally
applicable, so I decided to remove them in a separate patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
max_index was coming from either the user telling us as part of
glDrawRangeElements, or from an incidental calculation as part of some
sort of primitive conversion fallback. Sometimes, it was just set to the
default "I don't know" ~0 value.
If it wasn't set to the actual max index, then the kernel would reject the
draw call for allowing out-of-bounds VBO reads. So, compute the max index
from the sizes of the VBOs, which isn't too expensive (unlike mapping and
reading the index buffer) and is reliable.
Fixes piglit vao-element-array-buffer.
Still some open questions.. and at any rate, no additional piglit passes
due to various wrap modes that we need to emulate in at least some
cases :-(
But it does fix some mystery page-faults.. So add some comments in the
code where there are things that we need to emulate or do more r/e, and
push as-is.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead of separate color and Z/S writemasks, just have one writemask
parameter that takes a mask of the PIPE_MASK_[RGBAZS] flags.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
PIPE_TEX_MIPFILTER_x is not legal for the pipe_sampler_state::
min/mag_img_filter fields. But PIPE_TEX_MIPFILTER_x == PIPE_TEX_FILTER_x
so we were getting lucky.
This also makes the code consistent with u_blitter.c.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
The register coalescing portion of this patch hurts three shaders in
Guacamelee by one instruction each, but examining the diff makes me
believe that what we were generating was (perhaps harmlessly) incorrect.
When the instructions aren't in a flat list, this wouldn't have worked.
Also, this should be faster.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The only trick is changing a break into a return true in register
coalescing, since the macro is actually a double loop, and break will do
something different than you expect. (Wish I'd realized that earlier!)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that nothing invalidates the CFG, we can calculate_cfg() immediately
after emit_fb_writes()/emit_thread_end() and never again.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This cleans up the debug flags to be consistently indented, use bit
shifting instead of hex-values and fixes a bug where the new DEBUG_NO8 flag
used the same value as the DEBUG_VUE flag. This was hidden by the numbers not
being aligned. Also removes gaps in the range where DEBUG_IOCTL (0x4) and
DEBUG_REGION (0x400) used to be.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
LLVM commit r218316 removes the JITMemoryManager class, which is
the parent for a seemingly important class in gallivm. In order to
fix the build, I've wrapped most of lp_bld_misc.cpp in
if HAVE_LLVM < 0x0306 and modifyed the
lp_build_create_jit_compiler_for_module() function to return false
for 3.6 and newer which effectively disables the gallivm functionality.
I realize this is overkill, but I could not come up with a simple
solution to fix the build. Also, since 3.6 will be the first release
without the old JIT, it would be really great if we could
move gallivm to use the C API only for accessing MCJIT. There
is still time before the 3.6 release to extend the C API in
case it is missing some functionality that is required by gallivm.
This fixes heap corruption. The sampler view can be bound in the context,
so we cannot call destroy directly.
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead, pass the layout of GS inputs in memory to the ES using the shader
key. Only 64 bits are needed to represent the layout in the key.
Mixing and matching different VS and GS shaders should now always work.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I will need this for fixing sample shading with 1 sample.
The good news is that all shader pm4 states no longer use the current context
state, so we can generate the pm4 states outside of draw_vbo if needed.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Generic varyings in TGSI were based on the value of VARYING_SLOT_TEX0, so VAR0
was always GENERIC[22] (with tessellation patches). Some drivers might not
be able to cope with that.
This commit defines a proper mapping, so that PNTC is GENERIC[8] and VAR0 is
GENERIC[9].
Reviewed-by: Brian Paul <brianp@vmware.com>
The extensions and limits being set in the conditional block are core-only
anyway and don't have any effect on other profiles.
Reviewed-by: Brian Paul <brianp@vmware.com>
E.g. the 4.0 compatibility profile can be forced with:
MESA_GL_VERSION_OVERRIDE=4.0COMPAT
Some tests that I have require 4.0 compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The respective HAVE_{SOFT,LLVM}PIPE are already descriptive
enough. Additionally the svga modules does not really use either
one, but the auxiliary draw & gallivm modules.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Rather than duplicating the libdeps, extra define... all over the
targets, define them only once and use when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
The dri, vdpau, omx, xvmc and gbm targets don't need any authentication
even the VL ones never used it. Either the respective loader or the
library itself (vl) is doing its auth prior to calling create_screen()
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Make sure that MEGADRIVERS is set in order to create the hardlinks.
The variable name is not the most appropriate and will be sorted
out in upcoming commits.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
There are only 32 bits in the flatshade flags (which are 1 bit per
component), the simulator crashes when you use more than about this many
varyings, and the original Broadcom code drop only exposed 8 as well.
Fixes 26 piglit tests in the varying-packing group, and makes many others
go from crash to fail (due to not checking their varying counts and
treating link failures as failures). Regresses ARB_fp/minmax (due to 8
varyings instead of 10).
This is just the GL 1.1 flat shading of colors -- we don't need to support
TGSI constant interpolation bits, because we don't do GLSL 1.30.
Fixes 7 piglit tests.
They still provide register pressure since I haven't made a special class
for them, but since they're only live for one instruction it probably
doesn't matter.
This improves the readability of QPU assembly.
The A-file unpack is just like R4 unpack, except that if you don't do a
floating-point operation it won't do float conversion (so int16 gets
scaled up to int32).
This will let me more reliably allocate a-file registers, which are going
to be even more in demand when I start using a-file unpacks.
Also fixes a bug where the reservation of payload registers (FRAG_Z/W) was
off by one but just caused failure to register allocate at all if the
off-by-one was fixed.
The r300 gallium driver is using it outside of the Mesa tree, and I wanted
to do so for vc4 as well. Rather than make the multiple-definitions
problem even more complicated, just move it to more-shared code.
v2: Don't forget to delete the symlink in r300 (review by Matt).
Delete more r300-helper references (review by Emil)
Don't prefix util/ header inclusion with "util/" (review by Emil)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> (v1)
ffeb77c7b0 had a typo which turned all signed
integer divisions into unsigned ones. Oops.
This gets us back the 51 little piglits
(all from glsl built-in-functions, fs/vs/gs-op-div-int-ivec2 and similar).
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If _mesa_get_tex_image() return NULL there is already error
set in context. Other error pats free allocated texture.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Check if _mesa_meta_bind_rb_as_tex_image() did give the texture.
If no texture was given there is already either
GL_INVALID_VALUE or GL_OUT_OF_MEMORY error set in context.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Running fast clear glClear with SNB caused Valgrind to
complain about this.
v2: line 237 fixed glClear from leaking memory, other
strdups are also now changed to ralloc_strdups but I
don't know what effect those have. At least no changes in
my Piglit quick run.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add current_pipe_control_dw1 and deferred_pipe_control_dw1 to track what have
been done since lsat 3DPRIMITIVE and what need to be done before next
3DPRIMITIVE. Based on them, we can emit WAs more smartly.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It was used to set has_gen6_wa_pipe_control to false when the batch buffer
changed. When called from emit_flush() and others, it also unset
ILO_3D_PIPELINE_INVALIDATE_BATCH_BO so that the following emit_draw() will not
set has_gen6_wa_pipe_control to false again. It sounded error-prone and was
just ugly.
We should be able to achieve the same goal by reset has_gen6_wa_pipe_control
in ilo_3d_pipeline_invalidate(). With handle_invalid_batch_bo() gone, the
emit functions can also be inlined.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
We do not need to call it from GEN7 pipeline anymore since software
PIPE_QUERY_PRIMITIVES_EMITTED is gone.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Previously we would get a potentially computed post-swizzle coord based
on the texture target info, which would not include the bias/lod in the
last argument.
The second argument does not have to be adjacent, so adjusting the order
array did not make sense.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This will make life a lot easier as we add support for additional
instructions.
v2: shadow reference value is always .z or .w
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The only place the enum pipe_type was used is for the TGSI sampler
view return type. So make it a TGSI type. Note: it appears this
part of TGSI isn't used by anyone so it may be removed in the future.
v2: the new name is tgsi_return_type, not tgsi_type. This means we
can drop the previously posted tgsi_type -> tgsi_opcode_type patch.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Called when the user can insert new decls, instructions.
This could be used in a few places in the 'draw' module.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Instead we store a void pointer to the key, and cast it to
brw_wm_prog_key for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This helps:
1. Reduce the need to have fs_visitor::key's type be brw_wm_prog_key*
2. Align the code to allow brw_sampler_prog_key_data to be pulled out of other
prog_key types for different stages.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Instead we store a brw_stage_prog_data pointer, and cast it to
brw_wm_prog_data for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Conditional rendering is not limited to draw_vbo(). Move the support to
ilo_context, and replace ilo_3d_pass_render_condition() by
ilo_skip_rendering().
SOL_RESET happens before bo execution. It should not be observed by the
commands that are already in the bo.
Move the code out of the pipeline now that it submits.
And config query and DRM_CONF_SHARE_FD to both mega-driver and
traditional build configs, so that EGL_EXT_image_dma_buf_import
works.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We used different lists for different types of queries because we wanted to
update software queries quickly. Now that there is no software queries, we
are fine with a single list.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Read PIPE_QUERY_PRIMITIVES_GENERATED and PIPE_QUERY_PRIMITIVES_EMITTED from
hardware registers. Because all queries now have a bo, remove unnecessary
checks for q->bo.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Add support for PIPE_QUERY_PRIMITIVES_GENERATED and
PIPE_QUERY_PRIMITIVES_EMITTED in ilo_3d_pipeline_emit_query().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
It replaces
ilo_3d_pipeline_emit_write_timestamp(),
ilo_3d_pipeline_emit_write_depth_count(), and
ilo_3d_pipeline_emit_write_statistics().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Make it own()'s responsibility to make room for release() and itself. To be
able to do that, allow ilo_cp_submit() in own().
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Move pipe states in ilo_context to the new ilo_state_vector. The motivation
is that ilo_context consists of several loosely related things. When we need
an ilo_context somewhere, we usually need only one or two of the things in it.
This change makes ilo_state_vector one such thing.
An immediate result is that we no longer need ilo_context in 3D pipelines,
something we have planned for since early days.
Tested on my snb-gt2:
4 tests skip->pass in spec/EXT_texture_array
51 tests skip->pass in spec.glsl-3.30
4 tests skip->pass in spec/!OpenGL 3.3
No regressions; no skip->fail changes.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Some users don't understand that these variables can break OpenGL.
The general is rule is that if an app supports MSAA, you mustn't use
GALLIUM_MSAA.
For example, if an app has an 8xMSAA FBO and GALLIUM_MSAA=4
is set, resolving the FBO to the back buffer will be rejected which will look
like this on all gallium drivers:
http://www.phoronix.com/scan.php?page=article&item=amd_radeonsi_msaa
The environment variables also have no effect on modern apps like TF2, but
there is still a performance hit due to wasted bandwidth and VRAM.
In a nutshell, it does more harm than good.
Cc: 10.2 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
These instructions only have vex encodings, thus they can't be used without
avx. (Technically, one can still use avx-128 if avx isn't available because
the environment doesn't store the ymm registers, however I don't think llvm
can.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Geometry shaders was the only thing we needed to enable GLSL 1.50 and
OpenGL 3.2 in gen6.
v2: Layered clears do not work properly in gen6 with OpenGL 3.2. Kenneth
and Jordan realized that for this to work we also need
GL_AMD_vertex_shader_layered (which requires OpenGL 3.2, so it could not be
enabled before this patch), so we agreed to enable this together with
OpenGL 3.2 in this patch.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
In gen6 we will use the geometry shader implementation from gen6_gs_visitor.cpp
and keep the implementation in brw_vec4_gs_visitor.cpp for gen7+. Notice that
gen6_gs_visitor inherits from brw_vec4_gs_visitor so it is not a completely
seprate implementation of geometry shaders.
Also, gen6 does not support multiple dispatch modes, its default operation mode
is equivalent to gen7's SINGLE mode, so select that in gen6 for consistency.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Uniforms declared as uniform blocks are stored in ubo surfaces and need to
be pulled from the geometry shader program so make sure we upload them first
and do the same for pull constants.
This fixes all piglit tests that use uniform blocks:
bin/shader_runner tests/spec/glsl-1.50/uniform_buffer/gs-*
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For gen6 geometry shaders we use the first BRW_MAX_SOL_BINDINGS entries of the
binding table for transform feedback surfaces. However, vec4_visitor will
setup the binding table so that textures use the same space in the binding
table. This is done when calling assign_common_binding_table_offsets(0) as
part if its run() method.
To fix this clash we add a virtual method to the vec4_visitor hierarchy to
assign the binding table offsets, so that we can change this behavior
specifically for gen6 geometry shaders by mapping textures right after the
first BRW_MAX_SOL_BINDINGS entries.
Also, when there is no user-provided geometry shader, we only need to upload
the binding table if we have transform feedback, however, in the case of a
user-provided geometry shader, we can't only look into transform feedback
to make that decision.
This fixes multiple piglit tests for textureSize() and texelFetch() when these
functions are called from a geometry shader in gen6, like these:
bin/textureSize gs sampler2D -fbo -auto
bin/texelFetch gs usampler2D -fbo -auto
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Currently we buffer transform feedack varyings separately. This patch makes
it so that we reuse the values we have already buffered for all the output
varyings of the geometry shader instead.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since geometry shaders can alter the value of varyings packed in the first
output VUE slot (PSIZ), we need to buffer it together with all the other
vertex data so we can emit the right value for each vertex when we do the
URB writes.
This fixes the following piglit test in gen6:
tests/spec/glsl-1.50/execution/redeclare-pervertex-out-subset-gs.shader_test
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Update gen6_gs_binding_table and gen6_sol_surface to use user-provided
geometry program information when present. This is necessary to implement
transform feedback support.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This takes care of generating code required to handle transform feedback.
Notice that transform feedback isn't enabled yet, since that requires
additional setups in other parts of the code that will come in later patches.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We will use this parameter in later patches to provide information relevant
to transform feedback that needs to be set as part of the FF_SYNC message.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This opcode generates code to copy the specified destination index
into subregister 5 of the MRF message header.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This opcode will be used when sending SVB WRITE messages to save
transform feedback outputs into Streamed Vertex Buffers.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
So far in gen6 we only used geometry shaders to implement transform feedback
in vertex shaders, so we assumed that the VUE map for the geometry shader
stage was always the same as for the vertex shader stage. This is no longer
true now that we support user provided geometry shaders in gen6 too.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For this we will need to move PrimitiveID information, delivered in the thread
payload in r0.1, to a separate register (we use GS_OPCODE_SET_PRIMITIVE_ID
for this), then map the corresponding varying slot to that register in the
setup_payload() method.
Notice that we cannot use a virtual register as the destination for the
PrimitiveID because we need to map all input attributes to hardware registers
in setup_payload(), which happens before virtual registers are mapped to
hardware registers. We could work around that issue if we were able to compute
the first non-payload register in emit_prolog() and move the PrimitiveID
information to that register, but we can't because at that point we still
don't know the final number uniforms that will be included in the payload.
So, what we do is to place PrimitiveID information in r1, which is always
delivered as part of the payload but its only populated with data
relevant for transform feedback when we set GEN6_GS_SVBI_PAYLOAD_ENABLE
in the 3DSTATE_GS state packet.
When we implement transform feedback, we wil make sure to move the value of r1
to another register before we overwrite it with the PrimitiveID.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen6 the geometry shader payload includes the PrimitiveID information in
r0.1. When the shader code uses glPimitiveIdIn we will have to move this to
a separate hardware register where we can map this attribute. This opcode
takes the selected destination register and moves r0.1 there.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen6 we need to end the thread differently depending on whether we have
emitted at least one vertex or not. In case we did, the EOT message must
always include the COMPLETE flag or else the GPU hangs. If we have not
produced any output, however, we can't use the COMPLETE flag.
This would lead us to end the program with an ENDIF opcode, which we want
to avoid (and actually is not permitted since it hits an assertion), so
instead what we do is that we always request a new VUE handle every time we do
an URB WRITE, even for the last vertex we emit. With this we make sure that
whether we have emitted at least one vertex or none at all we have to finish the
thread without writing to the URB, which works for both cases by setting the
COMPLETE and UNUSED flags in the EOT message.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Just in case the GS algorithm does not call EndPrimitive() for the last
primitive produced. This is relevant only for non point outputs, since for
this we are already setting the PrimEnd flag on each vertex we emit.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Geometry shaders in gen6 are significantly different from gen7+ so it is better
to have them implemented in a different file rather than adding gen6 branching
paths all over brw_vec4_gs_visitor.cpp.
This commit adds an initial implementation that only handles point output, which
is the simplest case.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In gen7+ we emit vertices as they come, however in gen6 geometry shaders we
have to buffer vertex data for all vertices and then emit it all in one go
at the end. To achieve this we need to generalize emit_urb_slot() to store
vertex data in general purpose registers and not only MRF registers.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We had GS_OPCODE_SET_DWORD_2_IMMED but this required its source argument to be
an immediate. In gen6 we need to set dword 2 of the URB write message header
from values stored in separate register, so we need something more flexible.
This change replaces GS_OPCODE_SET_DWORD_2_IMMED with GS_OPCODE_SET_DWORD_2.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen6 seems to require that EOT messages include the complete flag too or else
the GPU hangs. We add will this flag to the instruction when we emit the
thread end opcode.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Gen6 geometry shaders need to allocate URB handles for each new vertex they
emit after the first (the URB handle for the first vertex is obtained via the
FF_SYNC message).
This opcode adds the URB allocation mechanism to regular URB writes.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This implements the FF_SYNC message required in gen6 geometry shaders to
get the initial URB handle.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is needed to support user-provided geometry shaders, since the
brw_ff_gs_prog atom in gen6 only takes care of implementing transform feedback
for vertex shaders.
If there is no user-provided geometry shader the implementation falls back to
the original code.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, gen6 only uses geometry shaders for transform feedback so the state
we emit is not suitable to accomodate general purpose, user-provided geometry
shaders. This patch paves the way to add these support and the needed
3DSTATE_GS packet modifications for it.
Previous code that emitted state to implement transform feedback in gen6 goes
to upload_gs_state_adhoc_tf().
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, when a geometry shader can't use dual object mode we fall back to
dual instance mode, however, when invocations == 1, single dispatch mode is
more performant and equally efficient in terms of register pressure.
Single dispatch mode requires that the driver can handle interleaving of
input registers, but this is already supported (dual instance mode has
the same requirement). However, to take full advantage of single dispatch mode
to reduce register pressure we would also need the ability to store two
separate vec4 output values into vec8 registers, which would approximately
double our capacity to store temporary values, but currently the vec4 visitor
and generator classes do not support this, so at the moment register pressure
in single and dual instance modes is the same.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It has been a bad name since we added the builder. Rename it to
ILO_DEBUG=batch to match i965, and call ilo_builder_decode() from
ilo_cp_submit_internal().
"Flush" is used for too many things already: pipe resource flush, pipe context
flush, pipe transfer region flush, and hardware pipeline flush. Rename it to
ilo_cp_submit(). As such, ILO_DEBUG=flush is renamed to ILO_DEBUG=submit.
Fredrik's implementation of ARB_vertex_attrib_binding introduced new
gl_vertex_attrib_array and gl_vertex_buffer_binding structures, and
converted Mesa's older gl_client_array to be derived state. Ultimately,
we'd like to drop gl_client_array and use those structures directly.
One hitch is that gl_client_array::_MaxElement doesn't correspond to
either structure (unlike every other field), so we'd have to figure out
where to store it. The _MaxElement computation uses values from both
structures, so it doesn't really belong in either place. We could put
it in the VAO, but we'd have to pass it around everywhere.
It turns out that it's only used when ctx->Const.CheckArrayBounds is
set, which is only set by the (rarely used) classic swrast driver.
It appears that drivers/x11 used to set it as well, which was intended
to avoid segmentation faults on out-of-bounds memory access in the X
server (probably for indirect GLX clients). However, ajax deleted that
code in 2010 (commit 1ccef926be).
The bounds checking apparently doesn't actually work, either. Non-VBO
attributes arbitrarily set _MaxElement to 2 * 1000 * 1000 * 1000.
vbo_save_draw and vbo_exec_draw remark /* ??? */ when setting it, and
the i965 code contains a comment noting that _MaxElement is often bogus.
Given that the code is complex, rarely used, and dubiously functional,
it doesn't seem worth maintaining going forward. This patch drops it.
This will probably mean the classic swrast driver may begin crashing on
out of bounds vertex buffer access in some cases, but I believe that is
allowed by OpenGL (and probably happened for non-VBO accesses anyway).
There do not appear to be any Piglit regressions, either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Roland Scheidegger <sroland@vmware.com>
While depth test state is passed through the fragment shader as sideband,
data, the stencil test state has to be set by the fragment shader itself.
Many tests are still failing, but this gets most of hiz/ passing.
The SWZ instruction can have swizzle terms >4 (SWIZZLE_ZERO, SWIZZLE_ONE).
These swizzle terms caused a few assertions to fail.
This started happening after the commit "mesa: Actually use the Mesa IR
optimizer for ARB programs." when replaying some apitrace files.
A new piglit test (tests/asmparsertest/shaders/ARBfp1.0/swz-08.txt)
exercises this.
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This allows for introducing dead code eliminating of uniforms, copy
propagation of uniforms, and instruction rescheduling between instructions
that both read uniforms.
This is particularly important for outputs, where we try to MOV the whole
vec4 to the VPM, even if only 1-3 components had been set up. It might
also be important for temporaries, if the shader reads components before
writing them.
While the result of signed integer division by zero is undefined by glsl
(and doesn't exist with d3d10), we must not crash, so need to make sure we
don't get sigfpe much like udiv already does.
Unlike udiv where we return 0xffffffff (as required by d3d10) there is
no requirement right now to return anything specific so we use zero.
sample opcodes don't have valid texture target information (and I don't think
this should be changed), however it would be nice if we had that information
ready elsewhere, so stuff that information into the tgsi info when analyzing
a shader.
v2: Ilja Mirkin spotted some bugs wrt not handling msaa resources. So add them
and while there also add them to the tex opcode analysis this was cloned from
as well (plus get rid of some bug not detecting indirect textures there in some
cases too).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
MESA_FORMAT_x8y8z8w8 puts the x channel in the least significant part of
the containing 32-bit integer, which is equivalent to PIPE_FORMAT_xyzw8888.
PIPE_FORMAT_x8y8z8w8 puts the x channel first in memory.
This patch fixes up the mesa<->gallium mapping accordingly.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MESA_FORMAT_LnAn puts the luminance in the least significant part of
the containing integer, which is equivalent to PIPE_FORMAT_LAnn.
PIPE_FORMAT_LnAn puts the luminance first in memory.
This patch fixes up the mesa<->gallium mapping accordingly.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means that each 8888 SRGB format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
v2: fix missing i965 additions. (Jason)
fix 127->255 max alpha for SRGB formats. (Jason)
v1: Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The associated UNORM format already existed.
This means that each LnAn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
[airlied: rebased onto current master]
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
...i.e. formats in which the first listed component is in the least
significant byte of the integer. The corresponding UNORM aliases already exist.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means that each RnGnBnxn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
The associated UNORM and SRGB formats already exist.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
...i.e. formats in which the alpha or green channel is first in memory.
This means that each LnAn and RnGn format has a reversed counterpart,
which is necessary for handling big-endian mesa<->gallium mappings.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason pointed out the bug on review adding new formats,
but the existing format also appears to have the bug, so
use 255 as the max, these are SRGB no SNORM.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Luminance is the least-significant byte of the uint16, rather than the
lowest byte in memory. Other parts of mesa already handle this correctly
for big-endian, and swrast already handles other MESA_FORMAT_x8y8 formats
correctly. This case was just an odd-one-out.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MESA_FORMAT_R8G8B8X8_SNORM used a function called unpack_X8B8G8R8_SNORM
while MESA_FORMAT_R8G8B8X8_SRGB used a function called unpack_R8G8B8X8_SRGB.
This patch renames the SNORM function to have the same order as the
MESA_FORMAT name, like the SRGB function does.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was being shared using a ../../ get out of gallium into
mesa, and I swore when I did it I'd fix things when we got a util
dir, we did, so I have.
v2: move RGTC_DEBUG define
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This gets a ton of piglit working that crashes in waffle context
management stuff otherwise. Actually supporting mismatched FB sizes is at
best going to require some more load/store generals for color buffers, but
if I can't manage to do that I'll want to just have state_tracker reject
those FBOs as unsupported, rather than deny GL 2.1.
Previously, we would get a trailing ', ' which looked strange.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The goal here is to have an argument for the depth write opcode so that I
can do computed depth. In the process, this makes the calculations that
will be emitted more obvious in the QIR.
Commit afe3d1556f (i965: Stop doing
remapping of "special" regs.) stopped remapping delta_x/delta_y, and
additionally stopped considering them always-live. We later realized
delta_x was used in register allocaiton, so we actually needed to remap
it, which was fixed in commit 23d782067a
(i965/fs: Keep track of the register that hold delta_x/delta_y.).
However, that commit didn't restore the "always consider it live" part.
If all the code using delta_x was eliminated, fs_visitor::delta_x would
be left pointing at its old register number. Later code in register
allocation would handle that register number specially...even though it
wasn't actually delta_x.
To combat this, set delta_x/y to BAD_FILE if they're eliminated, and
check for that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83127
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
The triangle_32_ rast functions never made it into the debug output,
confused me for a few seconds.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch builds on 6c8f547f66 and
previous patches by allowing u_format.csv to specify separate big-endian
and little-endian layouts. It then uses this to specify the correct layouts
for various depth/stencil formats. Later patches handle other formats.
To recap, the idea is that u_format.csv lists the channels for an N-byte
value as though it were an N-byte integer. For little-endian targets
the channels are listed starting at the least-significant bit of the
integer while for big-endian targets the channels are listed starting
at the most-significant bit. This means that for something like
PIPE_FORMAT_B8G8R8A8_UNORM (blue in first byte of memory, alpha in last
byte of memory) the orders are the same for both endiannesses. But for
something like PIPE_FORMAT_S8_UINT_Z24_UNORM, where the stencil is in
the least significant byte of a 32-bit integer, there need to be separate
channel definitions for each endianness.
The effect of this patch is to make the affected PIPE_FORMAT_*s have
the same layout as the associated MESA_FORMAT_*s for big-endian.
The MESA_FORMAT_*s are already handled correctly.
Fixes various piglit tests on z. No regressions on x86_64.
[airlied: squash subsequent patches]
util: Add big-endian layout for 5551 and 565 formats
util: Add big-endian layout for 10/10/10/2 formats
util: Add big-endian layout for 4444 formats
util: Add big-endian layout for 233 format
util: Add big-endian layout for 44 formats
Signed-off-by: Dave Airlie <airlied@redhat.com>
llvmpipe treats PIPE_FORMAT_Z32_FLOAT_S8X24_UINT as a bit of a special case,
handling it as two 32-bit pieces rather than a single 64-bit block:
/* 64bit d/s format is special already extracted 32 bits */
total_bits = format_desc->block.bits > 32 ? 32 : format_desc->block.bits;
The format_desc describes the whole 64-bit block, so the z shift
will be 32 for big-endian. But since we're accessing the z channel
as a 32-bit value rather than a 64-bit value, we need to mask the shift
with 31.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just a very limited version, in particular sampler and sampler view
index must be the same. It cannot handle any modifiers neither.
Works much the same as soa version otherwise, to figure out the target we
need to store the sampler view dcls.
While here, also handle (no-op) RET and get rid of a couple bogus deprecated
comments.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
sample opcodes are a little oddly represented in the opcode_info, since
they don't count as texture instructions - they don't have valid target
information, but they may have offsets (unlike "ordinary" texture
instructions, the texture token may be optional for them).
So just make sure with these opcodes the optional offsets are accepted.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
sample opcodes don't encode a texture target, it would thus always
print UNKNOWN, which is not helpful (and wouldn't parse when giving
back the shader text to tgsi).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We don't have any specific limits in the hardware, just like the other
GPUs, so match their behavior. Fixes minmax_gles2 and several other
piglit tests relying on the specced uniform minmax values.
Put macro code in do {} while loop and put semicolons on macro calls
so auto indentation works properly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This reduces gcc -O3 compile time to 1/4 of what it was on my system.
Reduces MSVC release build time too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
With GLES we don't give any kind of warning in case we don't
write to gl_position. This patch makes changes so that we
generate a warning in case of GLES (VER < 300) and an error
in case of GL.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Remap table for uniforms may contain empty entries when using explicit
uniform locations. If no active/inactive variable exists with given
location, remap table contains NULL.
v2: move remap table bounds check before existence check (Ian Romanick)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Erik Faye-Lund <kusmabite@gmail.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83574
We might run into ve->count == 0 and last_velement_edgeflag == true in
gen6_3DSTATE_VERTEX_ELEMENTS() when the state tracker sets an invalid
combination of VS and VE (does not seem to happen with st/mesa). Do not
assume ve->count is positive when last_velement_edgeflag is true.
Reported by Coverity.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Rename ilo_builder_batch_state_base_address() to gen6_state_base_address() for
consistency and remove unused gen6_STATE_BASE_ADDRESS(). Reorder the code in
gen6_PIPE_CONTROL() a bit. Finally, some mostly cosmetic changes.
Move functions for the 3D pipeline to the new headers. We artificially split
the functions into top (vertex processing) and bottom (pixel processing), to
keep the headers at reasonable sizes.
ir_rvalue::constant_expression_value() recursively walks down an IR
tree, attempting to reduce it to a single constant value. This is
useful when you want to know whether a variable has a constant
expression value at all, and if so, what it is.
The constant folding optimization pass attempts to replace rvalues with
their constant expression value from the bottom up. That way, we can
optimize subexpressions, and ideally stop as soon as we find a
non-constant subexpression.
In order to obtain the actual value of an expression, the optimization
pass calls constant_expression_value(). But it should only do so if it
knows the value can be combined into a constant. Otherwise, at each
step of walking back up the tree, it will walk down the tree again, only
to discover what it already knew: it isn't constant.
We properly avoided this call for ir_expression nodes, but not for
ir_swizzle nodes. This patch fixes that, drastically reducing compile
times on certain shaders where tree grafting has given us huge
expression trees. It also fixes SuperTuxKart.
Thanks to Iago and Mike for help in tracking this down.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78468
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
The FS backend has always used 0, and the VS backend has always used 1.
I think 1 is just working around other problems, and is incorrect.
Samplers are baked in; nothing uses the UNIFORM register we would
create, and we shouldn't upload any constant values for them.
Fixes ES3-CTS.shaders.struct.uniform.sampler_array_vertex.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Samplers take up zero slots and therefore don't exist in the params
array, nor are they included in stage_prog_data->nr_params. There's no
need to store their size in param_size, as it's only used for dealing
with arrays of "real" uniforms (ones uploaded as shader constants).
We run into all kinds of problems trying to refer to the uniform storage
for variables that don't have uniform storage. For one, we may use some
other variable's index, or access out of bounds in arrays. In the FS
backend, our extra 2 * MaxSamplerImageUnits params for texture rectangle
rescaling paper over a lot of problems. In the VS backend, we claim
samplers take up a slot, which also papers over problems.
Instead, just skip allocating storage for variables that don't have any.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
We always uploaded them together, mostly out of laziness - both required
an additional vertex element. However, gl_VertexID now also requires an
additional vertex buffer for storing gl_BaseVertex; for non-indirect
draws this also means uploading (a small amount of) data. This is extra
overhead we don't need if the shader only uses gl_InstanceID.
In particular, our clear shaders currently use gl_InstanceID for doing
layered clears, but don't need gl_VertexID.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
In the non-indirect draw case, we call intel_upload_data to upload
gl_BaseVertex. It makes brw->draw.draw_params_bo point to the upload
buffer, and increments the upload BO reference count.
So, we need to unreference it when making brw->draw.draw_params_bo point
at something else, or else we'll retain a reference to stale upload
buffers and hold on to them forever.
This also means that the indirect case should increment the reference
count on the indirect draw buffer when making brw->draw.draw_params_bo
point at it. That way, both paths increment the reference count, so
we can safely unreference it every time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
4f338c9b introduced logic to trigger a flush rather than overflowing
cmdstream buffer. But the threshold was too low, triggering flushes
where they were not needed. This caused problems with games like
xonotic.
Part of the problem is that we need to mark all state dirty between
cmdstream submit ioctls, because we cannot rely on state being
preserved across ioctls. But even with that, there are still some
problems that are still being debugged. For now:
1) correctly mark all state dirty
2) introduce FD_MESA_DEBUG flush flag to force rendering to be flushed
between each draw, to trigger problems (so that I can debug)
3) use a more reasonable threshold so for normal usecases we don't
trigger the problems
This at least corrects the regression, but there is still more debugging
to do.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need the .w component to end up in .x, since the hw appears to fetch
gl_FragColor starting with the .x coordinate regardless of MRT format.
As long as we are doing this, we might as well throw out the remaining
unneeded components.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Because of render-to-alpha (000x) shenanigans, freedreno needs to do
some special handling when rendering to alpha-only formats. And I
noticed that while we had _is_luminance(), _is_intensity(), etc, an
_is_alpha() helper was missing. So fix that.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Unreference the ctx->_Shader object before we delete all the pipeline
objects in the hash table. Before, ctx->_Shader could point to freed
memory when _mesa_reference_pipeline_object(ctx, &ctx->_Shader, NULL)
was called.
Fixes crash when exiting the piglit rendezvous_by_location test on
Windows.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
q_total should never go below 0 (which is why it's defined as unsigned),
and if it does, then something is seriously wrong.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As noted in the previous commit, this was introduced in
567e2769b8 ("ra: make the p, q test more
efficient"), but I forgot to mention it.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In commit 567e2769b8 ("ra: make the p, q
test more efficient") I unknowingly introduced a new requirement to the
register allocator API: the user must set the register class of all
nodes before setting up their interferences, because
ra_add_conflict_list() now uses the classes of the two interfering
nodes. i965 already did this, but r300g was setting up register classes
interleaved with setting up the interference graph. This led to us
calculating the wrong q total, and in certain cases
e78a01d5e6 (" ra: optimistically color
only one node at a time") made it so that this bug caused a segfault. In
particular, the error occurred if the q total was decremented to 1 below
0 for the last node to be pushed onto the stack. Since q_total is an
unsigned integer, it overflowed to 0xffffffff, which is what
lowest_q_total happens to be initialzed to. This means that we would
fail the "new_q_total < lowest_q_total" check on line 476 of
register_allocate.c, and so the node would never be pushed onto the
stack, which led to segfaults in ra_select() when we failed to ever give
it a register.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82828
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Pavel Ondračka <pavel.ondracka@email.cz>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Needed for assert.
Fixes build on BE archs with -Werror=implicit-function-declaration.
In file included from
../../../../../src/gallium/auxiliary/draw/draw_fs.c:30:0:
../../../../../src/gallium/auxiliary/util/u_math.h: In function
'util_memcpy_cpu_to_le32':
../../../../../src/gallium/auxiliary/util/u_math.h:810:4: error:
implicit declaration of function 'assert'
[-Werror=implicit-function-declaration]
assert(n % 4 == 0);
^
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In buf_clear_region() and buf_copy_region(), max_cmd_size was set to 0. If
either of the functions is called and there is not enough space in the
builder, the next ilo_cp_flush() will fail silently in a release build.
Replace magic numbers by size defines in tex_clear_region()/tex_copy_region()
for consistency and readability.
Intruduce gen6_blt_bo and gen6_blt_xy_bo to describe BOs. In the extreme case
of gen6_XY_SRC_COPY_BLT(), the number of parameters goes down from 18 to 8.
This allows a sampler view to have a different texture target than the
underlying resource. This will be used to implement the type casting
between 2d arrays and cube maps as specified in ARB_texture_view.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes a regression from "nouveau/vdec: small fixes to h264 handling"
New picking order for frames:
1. Vidbuf pointer matches.
2. Take the first kicked ref.
3. If that fails, take a ref that has a different last_used.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
The BSP bo might be too small to contain all of the bsp data,
bump its size on overflow. Also bump inter_bo when this happens,
it might be too small otherwise.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
If the source is not a GRF, it could have a register >= virtual_grf_count.
Accessing virtual_grf_end with such a register would lead to
out-of-bounds access. Make sure the source is a GRF before accessing
virtual_grf_end.
Fixes Valgrind complaints while compiling some shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
If the glx/wgl state tracker requested a core profile but the gallium
driver did not support some feature of GL 3.1 or later, we were setting
ctx->Version=0 and then failing the assertion in
_mesa_initialize_exec_table().
With this change we check for ctx->Version=0 and tear down the context
and return NULL from st_create_context().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So far we have been using CL_INVOCATION_COUNT to resolve this query but this
is no good with streams, as only stream 0 reaches the clipping stage.
From ARB_transform_feedback3:
"When a generated primitive query for a vertex stream is active, the
primitives-generated count is incremented every time a primitive emitted to
that stream reaches the Discarding Rasterization stage (see Section 3.x)
right before rasterization. This counter is incremented whether or not
transform feedback is active."
Unfortunately, we don't have any registers that provide the number of primitives
written to a specific stream other than the ones that track the number of
primitives written to transform feedback in the SOL stage, so we can't
implement this exactly as specified.
In the past we implemented this feature by activating the SOL unit even if
transform feeback was disabled, but making it so that all buffers were
disabled and it only recorded statistics, which gave us the right semantics
(see 3178d2474a). Unfortunately, this came with
a significant performance impact and had to be reverted.
This new take does not intend to implement the exact semantics required by
the spec, but improves what we have now, since now we return the primitive
count for stream 0 in all cases. With this patch we use
GEN7_SO_PRIM_STORAGE_NEEDED to resolve GL_PRIMITIVES_GENERATED queries
for non-zero streams. This would return the number of primitives written
to transform feedback for each stream instead. Since non-zero streams are
only useful in combination with transform feedback this should not be too
bad, and the only case that I think we would not be supporting would be
the one in which we want to use both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN on the same non-zero stream to
detect buffer overflow.
This patch also fixes the following piglit test:
arb_gpu_shader5-xfb-streams-without-invocations
This test uses both GL_PRIMITIVES_GENERATED and
GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries on non-zero streams, but it
does never hit the overflow case, so both queries are always expected to return
the same value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
That better matches the actual userspace use case, the
kernel will force it to VRAM if the hardware requires it.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Less CPU overhead and avoids contention over CPU accessible memory on startup.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
In preparation to using buffers clears with the hw engine(s).
v2: split out flipping to using hw buffer clears.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This builds the opengl DLLs with address layout space randomization
(ASLR) and data execution prevention (DEP) for better security.
Reviewed-by: Kurt Daverman <krd@vmware.com>
The old disassembler was modified from i965's. It is as much work as doing a
new one to keep it up-to-date, which also requires copying more headers over.
The outputs of this new disassembler should match i965's as closely as
possible.
Add some new registers and some tweaks. The changes that affect ilo are
GEN6_REG_HS_INVOCATION_COUNT -> GEN7_REG_HS_INVOCATION_COUNT
GEN6_REG_DS_INVOCATION_COUNT -> GEN7_REG_DS_INVOCATION_COUNT
GEN6_COND_NORMAL -> GEN6_COND_NONE
If a precision qualifer is allowed on type T, it should be allowed
on an array of T. Refactor the check to ensure this is the case.
(Fixes failures in WebGL conformance test 'gl-min-textures')
Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
MSVC replaces the "F" in "255.0F" with the macro argument which leads
to an error. s/F/FLT/ to avoid that.
It turns out we weren't using this macro at all on MSVC until the
recent "mesa: Drop USE_IEEE define." change.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This patch fixes a build error on DragonFly.
CC libpipe_loader_la-pipe_loader_drm.lo
pipe_loader_drm.c: In function 'pipe_loader_drm_probe':
pipe_loader_drm.c:207:10: error: implicit declaration of function 'close' [-Werror=implicit-function-declaration]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Apparently guardband clipping doesn't work like we thought: objects
entirely outside fthe guardband are trivially rejected, regardless of
their relation to the viewport. Normally, the guardband is larger than
the viewport, so this is not a problem. However, when the viewport is
larger than the guardband, this means that we would discard primitives
which were wholly outside of the guardband, but still visible.
We always program the guardband to 8K x 8K to enforce the restriction
that the screenspace bounding box of a single triangle must be no more
than 8K x 8K. So, if the viewport is larger than that, we need to
disable guardband clipping.
Fixes ES3 conformance tests:
- framebuffer_blit_functionality_negative_height_blit
- framebuffer_blit_functionality_negative_width_blit
- framebuffer_blit_functionality_negative_dimensions_blit
- framebuffer_blit_functionality_magnifying_blit
- framebuffer_blit_functionality_multisampled_to_singlesampled_blit
v2: Mention the acronym expansion for TA/TR/MC in the comments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that we have the data available, we need to expose it to the
shaders. We can reuse the same vertex element that we use for
gl_VertexID, but we need to back it by an actual vertex buffer.
A hardware restriction requires that vertex attributes coming from a
buffer (STORE_SRC) must come before any other types (i.e. STORE_0).
So, we have to make gl_BaseVertex be the .x component of the vertex
attribute. This means moving gl_VertexID to a different component.
I chose to move gl_VertexID and gl_InstanceID to the .z and .w
components, respectively, to make room for gl_BaseInstance in the .y
component (which would also come from a buffer, and therefore be
STORE_SRC).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll need to emit another VERTEX_BUFFER_STATE for gl_BaseVertex;
pulling this into a helper function will save us from having to deal
with cross-generation differences in that code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used for GL_ARB_shader_draw_parameters, as well as fixing
gl_VertexID, which is supposed to include gl_BaseVertex's value.
For indirect draws, we simply point at the indirect buffer; for normal
draws, we upload the value via the upload buffer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The lower_vertex_id pass converts uses of the gl_VertexID system value
to the gl_BaseVertex and gl_VertexIDMESA system values. Since
gl_VertexID is no longer accessed, it would not be considered active.
Of course, it should be, since the shader uses gl_VertexID.
v2: Move the var->name dereference past the var != NULL check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Converts gl_VertexID to (gl_VertexIDMESA + gl_BaseVertex). gl_VertexIDMESA
is backed by SYSTEM_VALUE_VERTEX_ID_ZERO_BASE, and gl_BaseVertex is backed
by SYSTEM_VALUE_BASE_VERTEX.
v2: Put the enum in struct gl_constants and propoerly resolve the scope
in C++ code. Fix suggested by Marek.
v3: Reabase on Matt's foreach_in_list changes (was using foreach_list).
v4 (Ken): Use a systemvalue instead of a uniform because
STATE_BASE_VERTEX has been removed.
v5: Use a boolean to select lowering, and only allow one lowering
method. Suggested by Ken.
v6 (Ken): Replace strcmp against literal "gl_BaseVertex"/"gl_VertexID"
with SYSTEM_VALUE enum checks, for efficiency.
v7: Rebase on context constant initialization work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
There exists hardware, such as i965, that does not implement the OpenGL
semantic for gl_VertexID. Instead, that hardware does not include the
value of basevertex in the gl_VertexID value.
SYSTEM_VALUE_VERTEX_ID_ZERO_BASE is the system value that represents
this semantic.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
v2: Additions to the documentation for SYSTEM_VALUE_VERTEX_ID. Quote
the GL_ARB_shader_draw_parameters spec and mention DirectX SV_VertexID.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
181581280b changed the way the
llvm-config version is read from sed to grep and introduced
a requirement for gnu grep extension that treats BREs as EREs.
Avoid this by calling egrep instead of grep which should be
able to handle EREs everywhere.
This allows Mesa to build on OpenBSD again.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
This doesn't quite make depth-tex-compare work, presumably because we're
not hitting equality with itof(sample) * 1.0/0xffffff in the 0xffffff
case. arb_fragment_program_shadow tests pass, though, as well as a bunch
of other shadow-related stuff.
We potentially need to be careful that use of a value stored in r4 isn't
copy-propagated (or something) across another r4 write. That doesn't
appear to happen currently, and this makes the dataflow more obvious. It
also opens up not unpacking the r4 value, which will be useful for depth
textures.
Since software primitive-restart emulation is going to be removed (and
anyways, mostly seemed to be crash prone in combination with
u_primconvert and oddball scenarios (like PIPE_PRIM_POLYGON with only a
single vertex), might as well do it in hardware (which fortunately
didn't turn out to be too hard to figure out).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Triggered by shaders like:
FRAG
PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
DCL OUT[0], COLOR
DCL CONST[0]
DCL TEMP[0..2], LOCAL
0: IF CONST[0].xxxx :0
1: MOV TEMP[0], TEMP[1]
2: ELSE :0
3: MOV TEMP[0], TEMP[2]
4: ENDIF
5: MOV OUT[0], TEMP[0]
6: END
not really a sane shader, although driver segfaulting is probably
not the appropriate response.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We currently aren't too clever about dealing with running out of
cmdstream buffer space. Since we use a single buffer for both drawing
and tiling commands, we need to ensure there is enough space at the tail
of the cmdstream buffer to fit the tiling commands.
Until we get more clever, the easy solution is a threshold to trigger
flushing rendering even if the application does not trigger flush (swap,
changing render target, etc). This way we at least don't crash for apps
that do several thousand draw calls (like some piglit tests do).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Most of the things the new compiler still has trouble with basically
amount to cp stage removing too many copies. But without the cp stage,
the shaders the new compiler produces are still better (perf and
correctness) than the old compiler. So a simple thing to do until I
have more time to work on it is first trying falling back to new
compiler without cp, before finally falling back to old compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Excess conversions considered harmful.
Recently Matt reworked the boolean uniform handling to use the value of
UniformBooleanTrue, rather than integer 1, when uploading uniforms:
mesa: Upload boolean uniforms using UniformBooleanTrue.
glsl: Use UniformBooleanTrue value for uniform initializers.
Marek then set the default to 1.0f for drivers without native integer
support:
mesa: set UniformBooleanTrue = 1.0f by default
However, ir_to_mesa was assuming a value of integer 1, and arranging for
it to be converted to 1.0f on upload. Since Marek's commit, we were
uploading 1.0f = 0x3f800000 which was being interpreted as the integer
value 1065353216 and converted to float as 1.06535322E9, which broke
assumptions in ir_to_mesa that "true" was exactly 1.0f.
+13 Piglits on classic swrast (fs-bool-less-compare-true,
{vs,fs}-op-not-bool-using-if, glsl-1.20/execution/uniform-initializer).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83573
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa already defines _GNU_SOURCE for glibc based systems and defining
_GNU_SOURCE will break the Mesa build on other systems such as OpenBSD.
_GNU_SOURCE only seems to be included in llvm-config output when
LLVM is built via autoconf and not when it is built by cmake.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Superseded by HAVE_LOADER_GALLIUM. The latter has a *DRM* brethren
making the whose easier on which one to keep.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Just drop the conditional and simplify our build. This means that
it'll build every time, but it does not require any dependencies nor
does it take that long to compile 200 lines of boilerplate code.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The variable was unused and gave false information. The need for nonnull
winsys currently does not relate as it used to. Nowadays one can mix and
match more freely with plenty of winsys' to make your head spin.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With recent commit we removed the NEED_NONNULL_WINSYS checks when
selecting the hardware (inc svga) winsys. svga has only one winsys
that explicitly requires libdrm (via it's bundled version of
vmwgfx_drm.h) but configure.ac never really checks for it.
Add the check early to prevent people from shooting themselves when
they select the driver but lack libdrm.
$ ./autogen.sh --disable-dri --disable-egl --disable-gallium-llvm
--with-dri-drivers=swrast --with-gallium-drivers=svga,swrast
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82539
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The rest of stencil handling isn't done yet, but it documents an extra
cl_u8(0) and helps make it obvious why we don't need to format clear_depth
the same way the depth/stencil buffer is formatted.
For now it still requires the color buffer to be present -- we're relying
on the store of color buffer contents to end the frame, and we have to do
something with color buffers in the rendering config packet.
According to GLSL-ES Spec(i.e. 1.0, 3.0), gl_Position value is undefined
after the vertex processing stage if we don't write gl_Position. However,
GLSL 1.10 Spec mentions that writing to gl_Position is mandatory. In case
of GLSL-ES, it's not an error and atleast the linking should pass.
Currently, Mesa throws an linker error in case we dont write to gl_position
and Version is less then 140(GLSL) and 300(GLSL-ES). This patch changes
it so that we don't report an error in case of GLSL-ES.
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83380
Similar to the changes to GEN7 command functions, but to GEN6 this time.
As every GPE function has been converted, remove
ilo_cp_assert_no_implicit_flush() calls.
Make these changes
ilo_cp_begin() -> ilo_builder_batch_pointer()
ilo_cp_write() -> direct memory set
ilo_cp_write_bo() -> ilo_builder_batch_reloc()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_steal_ptr() and memcpy() -> ilo_builder_state_write()
ilo_cp_steal_ptr() -> ilo_builder_state_pointer()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_steal_ptr() and memcpy() -> ilo_builder_surface_write()
ilo_cp_steal() and ilo_cp_write() -> ilo_builder_surface_write()
ilo_cp_write_bo() -> ilo_builder_surface_reloc()
and use this chance to drop the "_emit_" infix.
Make these changes
ilo_cp_begin() -> ilo_builder_batch_pointer()
ilo_cp_write() -> direct memory set
ilo_cp_write_bo() -> ilo_builder_batch_reloc()
and make sure there is no implicit flush. Use this chance to drop the
"_emit_" infix.
Remove instruction buffer management from ilo_3d and adapt ilo_shader_cache to
upload kernels to ilo_builder. To be able to do that, we also let ilo_builder
manage STATE_BASE_ADDRESS.
Comparing to how we manage batch and instruction buffers, the new builder
- does not flush
- manages both types of buffers
- manages STATE_BASE_ADDRESS
- uploads kernels using unsynchronized mapping
- has its own decoder for the buffers
- provides more helpers
Do not require a toy_compiler so that it can be used in other places, such as
state dumping. Add a bool to control whether the raw instruction words are
shown.
This would just crash. Noticed by accident while checking int divisions by zero
with a quickly hacked piglit test.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Forgot to add it when I fixed up the start instance handling in (llvm) draw.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Calling the variable min when it's really max and vice versa seems a bit
confusing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
UBO loads can be boolean-valued expressions, too, so we need to handle
them in emit_bool_to_cond_code() and emit_if_gen6().
However, unlike most expressions, it doesn't make sense to evaluate
their operands, then do something with the results. We just want to
evaluate the UBO load as a whole---which performs the read from
memory---then load the boolean result into the flag register.
Instead of adding code to handle it, we can simply bypass the
ir_expression handling, and fall through to the default code, which will
do exactly that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83468
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Matt and I believe that Sandybridge actually uses 0xFFFFFFFF for a
"true" comparison result, similar to Ivybridge. This matches the
internal documentation, and empirical results, but contradicts the PRM.
So, the comment is inaccurate, and we can actually just handle these
directly without ever needing to fall through to the condition code
path.
Also, the vec4 backend has always done it this way, and has apparently
been working fine. This patch makes the FS backend match the vec4
backend's behavior.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added ir_triop_csel, and forgot again
when adding it to emit_bool_to_cond_code.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
As long as we don't have a workaround for frame based
decoding in VDPAU we should not advertise NV_vdpau_interop.
v2: fix commit message, check if get_video_param is present
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Instead we cast backend_visitor::prog for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes use of Altivec pack intrinsics on little-endian PowerPC
systems. Since little-endian operation only affects the load and store
instructions, the semantics of pack (and other) instructions that take
two input vectors implicitly change: the pack instructions still fill
a register placing values from the first operand into the "high" parts
of the register, and values from the second operand into the "low" parts
of the register, but since vector loads and stores perform an endian swap,
the high parts end up at high memory addresses.
To still achieve the desired effect, we have to swap the two inputs to
the pack instruction on little-endian systems. This is done automatically
by the back-end for instructions generated by LLVM, but needs to be done
manually when emitting intrisincs (which still result in that instruction
being emitted directly).
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Maarten Lankhorst <dev@mblankhorst.nl>
Instead we store a void pointer to the key, and cast it to
brw_wm_prog_key for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead we store a brw_stage_prog_data pointer, and cast it to
brw_wm_prog_data for fragment shader specific code paths.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
BitSet::allocate() is being used with the expectation that it would
leave the bitfield untouched if its size hasn't changed, however,
the function always zeroed the last word, which led to obscure bugs
with live set computation.
This also fixes BitSet::resize(), which was broken, but luckily not
being used.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This hasn't been enabled in a long time and is completely stale and
unnecessary. Remove, esp since it doesn't build.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Uniform values are in the UNIFORM register file, not the GRF register file.
Looking in virtual_grf_sizes makes no sense and only makes the output of
dump_instructions confusing.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the scons buildscript, README and trace.xsl
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android & scons buildscript
- include the headers' README & svga_dump.py
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android & scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & README
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the android buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & LLVM note
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & custom include
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript & the tests
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the android buildscript
v2: Don't double-include the compiler sources.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
v2: Don't double include the test sources.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
- include all headers in Makefile.sources
- sort the list(s)
- bundle the scons buildscript
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Commit 60d772cd9d1(st/vega: add headers and SConscript in
the tarball) meant to pick all the headers to be included in
the release tarball yet it missed a few.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
A few files were missing, namely:
- a few of the common headers
- the android + gdi sources
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
With the last revisions of commit 664c2d76947(gallium/ilo: cleanup
intel_winsys.h) we moved the header from winsys to drivers, but we
forgot to update the makefile.sources to reflect this.
Cc: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Currently, BLIT_MSAA_SHADER_2D_MULTISAMPLE_RESOLVE* and
BLIT_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_RESOLVE* shaders
in setup_glsl_msaa_blit_shader() are not recompiled
when the source buffer sample count changes. For example,
implementation continued using a 4X msaa shader, even if
source buffer changes from 4X msaa to 8x msaa. It causes
incorrect rendering.
This patch adds new enums in blit_msaa_shader, one for
each supported sample count, and uses them to store
msaa shaders.
Fixes following piglit tests on Broadwell:
ext_framebuffer_multisample-accuracy all_samples color
ext_framebuffer_multisample-accuracy all_samples depth_draw
ext_framebuffer_multisample-accuracy all_samples depth_resolve
ext_framebuffer_multisample-accuracy all_samples stencil_draw
ext_framebuffer_multisample-accuracy all_samples stencil_resolve
ext_framebuffer_multisample-formats all_samples
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstarnd <jason@jlekstrand.net>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libXvMC*, libvdpau* and libomx_mesa depends unconditionally
upon xcb, due to their usage of the aux/vl gallium module.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libEGL depends conditionally (when building with x11 platform)
upon xcb.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Make sure to check the presence of the module in order to pick the
correct libs flag and before feeding them to the compiler/linker.
Current libGL depends conditionally (when building with dri3) upon
xcb 1.9.3 and unconditionally on ancient xcb functions -
xcb_generate_id and xcb_request_check amongst others.
v2: Use PKG_CHECK_EXISTS() when checking for dri3 xcb.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80848
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
When a texture is wrapped in a texture view, we can't trust the format in
the miptree itself. This patch allows us to pass the format seperately
through blorp so we can proprerly handled wrapped textures.
It's worth noting here that we can use the miptree format directly for
depth/stencil formats because they cannot be reinterpreted by a texture
view.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
CC: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Before commit 04895f5c we would only reswizzle dot product instructions
(since they wrote the same value into all channels, and we didn't have
to think about anything else). That commit extended reswizzling to cases
when the swizzle was single valued -- i.e., writing the same result into
all channels.
But allowing reswizzling of arbitrary things is actually really easy and
is even less code. (Why didn't we do this in the first place?!)
total instructions in shared programs: 4266079 -> 4261000 (-0.12%)
instructions in affected programs: 351933 -> 346854 (-1.44%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Despite the comment above the function claiming otherwise, the function
did not reswizzle sources, which would lead to bad code generation since
commit 04895f5c, which began claiming we could do such swizzling when we
could not.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82932
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The 'start' instruction is always in the current block, except for the
case of shader time, which emits code in a pattern seen no where else.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise, the basic block start/end IPs don't get updated properly,
leading to a broken CFG. This usually results in the following
assertion failure:
brw_fs_live_variables.cpp:141:
void brw::fs_live_variables::setup_def_use():
Assertion `ip == block->start_ip' failed.
Fixes KWin, WebGL demos, and a score of Piglit tests on Sandybridge and
earlier hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dump() methods don't alter the CFG or basic blocks, so we should
mark them as const. This lets you call them even if you have a const
cfg_t - which is the case in certain portions of the code (such as live
interval handling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As older versions of gnu ld did not support --dynamic-list check to see
if it is supported before using it. Non gnu linkers such the apple one
likely lack this option as well.
Fixes the build on OpenBSD which has binutils 2.15 and 2.17.
The --dynamic-list option seems to been have introduced sometime after
binutils 2.17 was released as it is present in 2.18.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Fixes binary garbage in the compilation logs caused by
compat::string::c_str() not being null-terminated (which is a bug on
its own that will be fixed in another commit).
Reported-by: EdB <edb+mesa@sigluy.net>
Reverts
* "i965: Modify state upload to allow 2 different sets of state atoms."
8e27a4d2b3
* "i965: Modify dirty bit handling to support 2 pipelines."
373143ed91
* "i965: Create a macro for checking a dirty bit."
c5bdf9be1e
Conflicts:
src/mesa/drivers/dri/i965/brw_context.h
* "i965: Create a macro for setting all dirty bits."
6f56e1424d
Conflicts:
src/mesa/drivers/dri/i965/brw_blorp.cpp
src/mesa/drivers/dri/i965/brw_state_cache.c
src/mesa/drivers/dri/i965/brw_state_upload.c
* "i965: Create a macro for setting a dirty bit."
88e3d404da
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
We can't rely on the value from the assembler if relative addressing is
used. So instead use the max of declared-consts (which does not include
compiler immediates) and what we get from the assembler (which does).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
all_delayed will also be true if we didn't attempt to schedule anything
due to no more instructions using current addr/pred. We rely on coming
in to block_sched_undelayed() to detect and clean up when there are no
more uses of the current addr/pred, which isn't necessarily an error.
This fixes a regression introduced in b823abed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The split between these two didn't make much sense. I'm going to want the
chance to look at uniform contents in optimization passes, and the QPU
emit I think is going to end up rewriting the uniforms stream.
This common init routine can be used by constructors for multiple program
types.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Debugging a regression in discard support was just too full of duplicate
instructions, so I decided to remove them instead of re-analyzing each of
them as I dumped their outputs in simulation.
There were troubles with bools without using native integers
(st_glsl_to_tgsi seemed to think bool true was 1.0f sometimes, when as a
uniform it's stored as ~0), and since I've got native integers other than
divide, I might as well just support them.
Before, we had some special opcodes like CMP and SNE that emitted multiple
instructions. Now, we reduce those operations significantly, giving
optimization more to look at for reducing redundant operations.
The downside is that QOP_SF is pretty special -- we're going to have to
track it separately when we're doing instruction scheduling, and we want
to peephole it into the instruction generating the destination write in
most cases (and not allocate the destination reg, probably. Unless it's
used for some other purpose, as well).
A bool is 0 or ~0, and KILL_IF takes a float arg that's <0 for discard or
>= 0 for not. By negating it, we ended up doing a floating point subtract
of (0 - ~0), which ended up as an inf. To make this actually work, we
need to convert the bool to a float.
Reviewed-by: Brian Paul <brianp@vmware.com>
While similar in layout, the size of the SVGA3dSize type may be smaller than
the struct drm_vmw_size type that is part of the ioctl interface. The kernel
driver could accordingly overwrite a memory area following the size variable
on the stack. Typically that would be another local variable, causing
breakage in, for example, ubuntu 12.04.5 where the handle local variable
becomes overwritten.
v2: Fix whitespace errors
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Cc: "10.1 10.2 10.3" <mesa-stable@lists.freedesktop.org>
As explained in the previous commit, we want to avoid the possibility of
integer-multiplication overflow while allocating buffers.
In these two cases, the final allocation size is the product of three values:
one variable and two that are fixed constants at compile time.
In this commit, we move the explicit multiplication to involve only the
compile-time constants, preventing any overflow from that multiplication, (and
allowing calloc to catch any potential overflow from the remainining implicit
multiplication).
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit 32f2fd1c5d, several calls to
_mesa_calloc(x) were replaced with calls to calloc(1, x). This is strictly
equivalent to what the code was doing previously.
But for cases where "x" involves multiplication, now that we are explicitly
using the two-argument calloc, we can do one step better and replace:
calloc(1, A * B);
with:
calloc(A, B);
The advantage of the latter is that calloc will detect any overflow that would
have resulted from the multiplication and will fail the allocation, (whereas
the former would return a small allocation). So this fix can change
potentially exploitable buffer overruns into segmentation faults.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's been altering the tree and reporting "false" since January 2011.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, opt_copy_propagation_elements would always rewrite the
instruction stream, even if was the same thing as before. In order to
report progress correctly, we'll need to bail if the suggested
replacement is identical (or equivalent) to the original code.
This also introduced unnecessary noop swizzles, as far as I can tell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, if chans < 4, we passed uninitialized stack garbage to the
ir_swizzle constructor for the excess components. Thankfully, it
ignores that data, as it's unnecessary, so no harm actually comes of it.
However, it's obviously better to initialize it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_triop_csel can return a boolean expression, so we need to handle it
here; we simply forgot when we added it.
Fixes Piglit's EXT_shader_integer_mix/{vs,fs}-mix-if-bool.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
All shader stages have these fields, so it makes sense to store them in
the common base structure, rather than duplicating them in each.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In commit 46d03d37bf I renamed a Makefile target
from md5 to checksums, (as we switched from MD5 checksums to SHA-256
checksums, so the more general name is more future proof).
But that commit missed one mention of "md5" as a dependency of the .PHONY
target. Rename that here as well.
According to the GLSL 1.40 spec, section 5.7 Structure and Array Operations:
"Array elements are accessed using an expression whose type is int or uint."
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The default case was accidentally clearing RADEON_FLAG_CPU_ACCESS from the
previous fall-through cases.
Reported-by: Mathias Fröhlich <Mathias.Froehlich@gmx.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Coverity pointed out we never dropped the lock here, so fix
it by using a common exit path.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
tex-miplevel-selection was hammering my memory manager with primconverts
on individual quads. This gets all those converted IBs packed into larger
IBs.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This is part of fixing extremely long runtimes on some piglit tests that
involve streaming vertex reuploads due to format conversions, and will
similarly be important for X performance, which relies on these flags.
A meta begin/end pair with MESA_META_DRAW_BUFFERS will change visible GL
state. We recreate the draw buffer enums from the buffer bitfield, which
changes GL_BACK to GL_BACK_LEFT (and GL_FRONT to GL_FRONT_LEFT).
This commit modifes the save/restore logic to instead copy the buffer enums
from the gl_framebuffer and then set them on restore using
_mesa_drawbuffers().
It's not clear how this breaks the benchmark in 82796, but fixing meta to not
leak the state change fixes the regression.
No piglit regressions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=82796
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
Coverity reported this, and I think this is the right solution,
since cache->items is struct cache_item ** not struct cache_item *,
we also realloc it using struct cache_item * at some point.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not all of these are used in every context, so this can make a
significant difference for short-lived contexts such as in piglit tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If we fails in reserve_explicit_locations, we leak uniform_map.
Reported-by: coverity scanner.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
i965 will have more than 32 bits when BRW_STATE_COMPUTE_PROGRAM is added.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The set of state atoms for compute shaders is currently empty; it will
be filled in by future patches.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The hardware state for compute shaders is almost entirely orthogonal
to the hardware state for 3D rendering. To avoid sending unnecessary
state to the hardware, we'll need to have a separate set of state
atoms for the compute pipeline and the 3D pipeline. That means we
need to maintain two separate sets of dirty bits to determine which
state atoms need to be run.
But the dirty bits are not completely independent; for example, if
BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do
we need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but
we also need to re-run compute state atoms that depend on
BRW_NEW_SURFACES. But we'll also need to re-run those state atoms the
next time the compute pipeline is run.
To accomplish this, we record two sets of dirty bits, one for each
pipeline. When bits are dirtied (via SET_DIRTY_BIT() or
SET_DIRTY_ALL()) we set them to the dirty state in both pipelines.
When brw_state_upload() is run, we clear the dirty bits just for the
pipeline that was run.
Note that since the number of pipelines is known at compile time to be
2, the compiler should unroll the loops in SET_DIRTY_BIT() and
SET_DIRTY_ALL().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This one path doesn't goto fail, so it seems to leak dec.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With VP2, nv50_miptree is faked because the underlying bo's have to be
laid out in a certain way. This is done by adjusting the address. Make
sure that blits (and everything else for consistency) use the mt address
rather than the bo address as a base.
This fixes retrieving chroma plane with VDPAU.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82255
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
When constant folding a MAD operation, we first fold the multiply and
generate an ADD. However we do so without making sure that the immediate
can be handled in the saturate case. If it can't, load the immediate in
a separate instruction.
Reported-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Experimentally, the sampler doesn't appear to like these, neither as
buffer nor as rect textures. So remove 1D from the list of texture types
to make linear when used for staging.
This fixes the OSD in mplayer for VDPAU.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Samplers are only defined up to num_samplers, so set all samplers above
nr to NULL so that we don't try to read them again later.
Tested-by: Christian Ruppert <idl0r@qasl.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
We always left them enabled, which turned off HiZ in some cases.
This should improve performace with Hyper-Z.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This should be as fast as no HTILE for stencil. I think we can still get full
performance with depth-only rendering even if stencil is present in the buffer
but not used, but I'm not 100% sure. This may be revisited when HiS and fast
stencil clear are implemented.
This fixes a hang in Brutal Legend.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64471
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is a golden setting on RV740, but there is a hw bug which recommends
setting it on all R7xx chipsets.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
It's almost the same.
This enables tiling for HTILE. It also enables Hyper-Z for other texture
targets (1D, 1D_ARRAY, 2D_ARRAY, CUBE, CUBE_ARRAY, 3D, RECT).
2D array depth textures are tested by Unigine Sanctuary and my new piglit
test.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
This should make a machine which is running piglit more responsive at times.
e.g. streaming-texture-leak can easily eat 600 MB because of how fast it
creates new textures.
We were only using it to get at its type, which we already know because
it's a builtin variable.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were only using it to get at its type, which we already know because
it's a builtin variable.
v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The replicated data clear shader needs to be SIMD16, or else the GPU
will hang. So, compile it even if INTEL_DEBUG=no16 is set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Current method of generating distribution tar-balls involves manually
invoking make + target name in the appropriate places. This temporary
solution is used until we get 'make dist' working.
Currently it does not work, as in order to have the target (which is
also a filename) available in the final Makefile we need to add a PHONY
target + use the correct target name.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
v3: Since the fs backend can emit saturate as a separate instruction, there is
no need to detect for min/max instructions and to rewrite the instruction tree
accordingly. On the other hand, we don't need to emit a separate saturated
mov either when the expression generating src can do saturate directly.
v4: Add can_do_saturate() check before enabling saturate modifer (Ken)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F
To be propagated as:
sel.ge.sat dst b 0.25F
v3: - Syntax clarifications in inst->saturate assignment
- Remove extra parenthesis when assigning src_reg value
from copy_entry (Matt Turner)
v4: - Take channels into consideration when propagating saturated instructions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F
To be propagated as:
sel.ge.sat dst b 0.25F
v3: Syntax clarifications in inst->saturate assignment (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Output max(saturate(x),b) instead of saturate(max(x,b))
- Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is > 0.0 and
inner constant is 1.0.
- Fix comments to show that the optimization is a commutative operation
(Matt Turner)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Output min(saturate(x),b) instead of saturate(min(x,b)) suggested by Ilia Mirkin
- Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is zero and
inner constant is < 1
- Fix comments to reflect we are doing a commutative operation (Matt Turner)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
v2: - Check that the base type is float (Ian Romanick)
v3: - Make sure comments reflect that we are doing a commutative operation
- Add missing condition where the inner constant is 1.0 and outer constant is 0.0
- Make indexing of operands easier to read (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Suggested by Matt. This patch combines and moves back the code-generation
functions from generate_vec4_instruction() into generate_code(). Makes
generate_code() a bit larger, but helps us to count loops in a
straightforward manner.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
brw_meta_fast_clear.c:211:17: warning: 'x_scaledown' may be used
uninitialized in this function [-Wmaybe-uninitialized]
unsigned int x_scaledown, y_scaledown;
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are some cases where the scheduler can get itself into impossible
situations, by scheduling the wrong write to pred or addr register
first. (Ie. it could end up being unable to schedule any instruction if
some instruction which depends on the current addr/reg value also
depends on another addr/reg value.)
To solve this we'd need to be able to insert extra mov instructions
(which would also help when register assignment gets into impossible
situations). To do that, we'd need to move the nop padding from sched
into legalize.
But to start with, just detect when we get into an impossible situation
and bail, rather than sitting forever in an infinite loop. This way it
will at least fall back to the old compiler, which might even work if
you are lucky.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
All of the GL image enums fit in 16-bits.
Also move the fields from the anonymous "image" structucture to the next
higher structure. This will enable packing the bits with the other
bitfield.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0
After (32-bit): 70 40,577,421,777 68,487,584 62,973,695 5,513,889 0
Before (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0
After (64-bit): 74 37,124,603,758 95,891,808 88,466,712 7,425,096 0
A real savings of 346KiB on 32-bit and 262KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only values allowed are 0 and 1, and the value is checked before
assigning.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0
After (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0
Before (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0
After (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just use ir_variable::data.binding... because that's the where the
binding is stored for everything else that can use layout(binding=).
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0
After (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0
Before (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0
After (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The VertexProgram and FragmentProgram have a Cache member for dealing
with fixed function programs. There are no fixed function geometry
programs, so this should never have existed, and was just copy and
pasted.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I actually screwed that up in 754319490f,
mistakenly thinking the code actually wanted the non-nan result before.
So, introduce that missing nan behavior case and use that instead.
For sse, there's no actual change in the resulting code at all, the fallback
code wouldn't have done the right thing though.
Of course, the actual issue I saw with pow() was completely unrelated...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Pretty trivial, just fill in the offsets and such. The implementation
is near 100% copy and paste from llvmpipe. Should be useful for debugging.
No piglit change when not using SOFTPIPE_USE_LLVM=1.
Now that it can do the same tests with and without using llvm for vs/gs,
with llvm more pass, the only things failing only with llvm seems to be
edgeflags tests and vs/gs-pow-float-float (and for the latter I'm not
convinced the zero tolerance it requires is somehow mandated by glsl).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The code is all in place now so enable it.
Seems to pass all relevant piglit tests (just like cube maps, some of the
cube map array tests need GALLIVM_DEBUG=no_quad_lod,no_rho_approx)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Pretty easy, just make sure that all paths testing for PIPE_TEXTURE_CUBE
also recognize PIPE_TEXTURE_CUBE_ARRAY, and add the layer * 6 calculation
to the calculated face.
Also handle it for texture size query, looks like OpenGL wants the number
of cubes, not layers (so need division by 6).
No piglit regressions.
v2: fix up adding cube layer to face for seamless filtering (needs to happen
after calculating per-sample face). Undetected by piglit unfortunately.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
Not sure why it was there but it is definitely not an error if gs outputs are
infs/nans. Besides, the outputs can be ints, in which case any small negative
number asserted.
This fixes piglit's texelFetch gs isamplerXX crashes with softpipe (down from
14 to 2).
Bug https://bugs.freedesktop.org/show_bug.cgi?id=80012
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
piglit tex-miplevel-selection nowadays doesn't use repeat wrap mode due to
sampler objects any longer, however at the time of the clear the wrap mode
is still illegal and at this point we get to verify the state, including
samplers (even though they won't get used), and because mesa doesn't treat
it as an incomplete texture as the spec says it should, we hit the assertion.
Just warn about this for now instead.
Gets crashes down from 44 to 14 in a piglit run (all were in various tests of
tex-miplevel-selection with texture rectangles). Though just about all
tex-miplevel-selection tests fail anyway for other reasons.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just handle as ordinary 2d / 2d array resources. Prevents an assertion failure
with softpipe and piglit glsl-resource-not-bound 2DMS/2DMSArray tests.
While here also fix TXD shadowCube similarly, which fixes the crash with piglit
tex-miplevel-selection textureGrad CubeShadow (the test will still fail due to
softpipe being broken).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=80011
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was meant for softpipe to not crash at some point if vertex texturing
was used. It is, however, fishy because it uses values from
draw_set_samplers/draw_set_sampler_views and not from the shader key. Albeit
we should still in all cases actually generate a new shader if this changes
(because the samplers and views themselves are in the key) I don't want to
think again wondering if that's really correct in the future.
Besides, at least today, it does not actually work for softpipe, as this was
relying on softpipe not actually calling draw_set_samplers/sampler_views at
all - I've verified it crashes regardless (if there were a tex instruction in
the vs, which normally should not happen anyway). For drivers which do indeed
not call these functions because they don't support vertex texturing at all
(r300), this should still not crash because the static texture data is all
zero, which causes the sampling functions to take an early out (same as is done
if no texture is bound at the slot used for sampling - verified with hacked up
softpipe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
mesa was creating a cube map array texture with just one layer, which is
not legal. This caused an assertion failure when using that texture later
in llvmpipe (when enabling cube map arrays) since it verifies the number
of layers in the view is divisible by 6 (the sampling code might well crash
randomly otherwise) with piglit glsl-resource-not-bound CubeArray -fbo -auto.
v2: use appropriately sized texel array...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
The source can be a register as well as an immediate, and disassembling
a register as an immediate can have some strange results.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It is meant to be private within the actual winsys. Remove it from
the exported header, and fold it into it's only user.
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This define is always set and it had no real purpose according to
git log. Seems like it is a leftover from the vl/vdpau prototype
stage.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Namely
- the private header (xa_priv.h)
- README and
- xa-indent
Sort the sources list while we're here.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 6a19bb56e0.
The above commit disabled the default build of xvmc as the xvmc tests
were failing. As pointed out by Ilia, the tests are "broken by design"
as they do not test the object that is build but the one that is
installed and setup on the workstation.
With previous commit we moved the programs from the 'make check' to
noinst automake target. This way they won't be run but will be around
for people to use them.
Cc: Tom Stellard <thomas.stellard@amd.com>
All the tests require an installed and setup XvMC, thus they
are not good candidates for 'make check'.
Keep them around as the user might want to actually test the
implementation post installation/setup.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add the final remaining files into the tarball (make dist), namely:
- SConscripts
- Non-autotooled winsys' - android, gdi and hgl.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Include the headers within.
- Update scons to use them.
- Drop useless include (gallium/drivers) from scons.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Drop duplicate include compiler directives.
- Leave the sw/ prefix for all the software winsys headers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Add top_srcdir/src/gallium/winsys to GALLIUM_DRIVER_C{XXFLAGS}.
- Remove top_srcdir/src/gallium/drivers/radeon from the includes.
As a result:
- Common radeon headers are prefixed with 'radeon/'
- Winsys header inclusion is prefixed 'radeon/drm'
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Make the header location, inclusion and contents more common with
its i915,r* and nouveau counterparts:
- Move the header within drivers/ilo.
- Separate out intel_winsys_create_for_fd into 'drm_public' header.
- Cleanup the compiler includes.
v2: Move the header to drivers/ilo. Suggested by Chia-I.
v3: Correct intel_winsys.h inclusion. Spotted by Chia-I.
Cc: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
V2: moved test for the VertexAttrib*Pointer() functions
to update_array(), and made constant available for drivers to set
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The base instance needs to be passed to the jited function, otherwise the
instanced data fetch will only work with the same start instance when the
jit function was created (and baking that into the key instead is not a viable
option).
This fixes piglit arb_base_instance-drawarrays (modulo some unrelated
core/compat context trouble I get for the test).
And fix the pipe cap bit in llvmpipe for it now that it actually works (it
already worked for softpipe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The docs were never really up to date for them, missing just about everything.
So mark them off as all done for GL 3.3 (though softpipe is in fact quite
broken for some newer things especially wrt texturing, and both don't have
compliant, real msaa support). And add the extensions missing too (no
guarantee of completeness).
Reviewed-by: Dave Airlie <airlied@gmail.com>
* IEEE Std 1003.1-2001 placed strcasecmp() in strings.h.
* ISO C99 doesn't mention strcase* in string.h
* On all platforms I could find, strcasecmp is in strings.h and string.h
as a compatibility layer for software written pre-2001 POSIX
* Technically strcasecmp should be only in strings.h and the man
pages back this up.
* Tested build on CentOS and Haiku
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is never used. There is another token "OUTPUT" which the lexer can
generate, though. This has been around since the dawn of time; is most
likely a typo.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Now that qpu_inst() ignores the WADDR from the other half of the
instruction, we can set both the ADD and MUL WADDRs in the NOP helper.
Thanks to that, we also no longer need to qpu_inst(NOP, NOP).
This allows setting the opposite-side WADDR to NOP (a non-zero value) in
qpu_* helpers, so that we don't need to qpu_inst() merge them with NOPs
all the time just to get the waddr set.
Fixes piglit draw-vertices and gl-2.0-vertexattribpointer on vc4, where
I'm only advertising R32F to RGBA32F support so far.
Note: regresses gl-1.5-normal3b3s-invariance due to introduced flushes and
missing depth buffer load/store support in the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Individual caps made supporting new fallbacks more complicated than it
needed to be. Instead, just make a table of fallbacks at context init
time.
v2: Fix inverted "do we need to install vbuf?" flagging caught by Marek.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
When I made the shader cache take the .fs member and moved the binding
point to .bind_fs, I failed to update these. Fixes crashes in
copyteximage-related tests.
The patch fixes the build on Oracle Solaris.
CC os/os_misc.lo
"os/os_misc.c", line 59: #error: unexpected platform in os_sysinfo.c
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
We've found that there's a buffer overrun bug in flex that's triggered by
using alternation in a lookahead pattern.
Fortunately, we don't need to match the exact {NEWLINE} expression to
detect an empty pragma. It suffices to verify that there are no non-space
characters before any newline character. So we can use a simple [\r\n] to
get the desired behavior while avoiding the flex bug.
Fixes the regression of piglit's 17000-consecutive-chars-identifier test,
(which has been crashing since commit
04e40fd337 ).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82472
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
CC: <mesa-stable@lists.freedesktop.org>
The optimization relies on CMP setting the destination to 0, which is
equivalent to 0.0f. However, early platforms only set the least
significant byte, leaving the other bits undefined. So, we must disable
the optimization on those platforms.
Oddly, Sandybridge wasn't reported as broken. The PRM states that it
only sets the LSB, but the internal documentation says that it follows
the IVB behavior. Since it wasn't reported as broken, we believe it
really does follow the IVB behavior.
v2: Allow the optimization on Sandybridge (requested by Matt).
+32 piglits on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?=79963
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Operating on this code,
B0: ...
cmp.ne.f0(8)
(+f0) if(8)
B1: break(8)
B2: endif(8)
We can delete B2 without attempting to merge any blocks, since the
break/continue instruction necessarily ends the previous block.
After deleting the if instruction, we attempt to merge blocks B0 and B1.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This pass deletes an IF/ELSE/ENDIF or IF/ENDIF sequence, or the ELSE in
an ELSE/ENDIF sequence.
In the typical case (where IF and ENDIF) aren't the only instructions in
their basic blocks, we can simply remove the instructions (implicitly
deleting the block containing only the ELSE), and attempt to merge
blocks B0 and B2 together.
B0: ...
(+f0) if(8)
B1: else(8)
B2: endif(8)
...
If the IF or ENDIF instructions are the only instructions in their
respective basic blocks (which are deleted by the removal of the
instructions), we'll want to instead merge the next blocks.
Both B0 and B2 are possibly removed by the removal of if & endif.
Same situation for if/endif. E.g., in the following example we'd remove
blocks B1 and B2, and then attempt to combine B0 and B3.
B0: ...
B1: (+f0) if(8)
B2: endif(8)
B3: ...
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
... rather than pointing directly to the associated instruction. This
will let us set the block containing the IF statement's else-pointer to
NULL, when we delete a useless ELSE instruction, as in the case
(+f0) if(8)
...
else(8)
endif(8)
Also, remove the pointer to the ENDIF, since it's unused, and it was
also potentially wrong, in the case of a basic block containing both an
ENDIF and an IF instruction:
endif(8)
cmp.ne.f0(8) ...
(+f0) if(8)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
To avoid invalidating and recreating the control flow graph. Also stop
invalidating the CFG in places we didn't add or remove an instruction.
cfg calculations: 202951 -> 80307 (-60.43%)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Will let us avoid invalidating the CFG if the optimization pass has
removed instructions using the new basic block methods.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fixes piglit glsl-fs-discard-01 and -03, and allows a lot of mesa demos to
start running. glsl-fs-discard-02 has a problem where the first tile is
not getting stored on the first render.
This should improve performance on real hardware by allowing more shader
instances to run in parallel. It also fixes assertion failures in tests
that don't emit a fragment color, since otherwise we didn't have enough
instructions to fit our signals in.
The spec citation talked about A and B, and I proceeded to pay no
attention to whether the waddrs were for A or B. As a result, this pair
of instructions would claim to conflict:
mov ra4, ra4 ; nop nop, r0, r0
mov.ns ra4, rb4 ; nop nop, r0, r0
Now that tiling is in place, we can expose the other formats. Depth is
still broken (need to make changes in the shader), but if you don't expose
it things crash all over. SNORM is dropped, but we could re-add it later
with some shader fixes to handle converting between [0,1] and [-1,1].
This still treats everything as RGBA8888 for the most part, same as
before. This is a prerequisite for handling other texture formats, since
only RGBA8888 has a raster-layout mode.
It meant that LUMALPHA was being marked as *many* miplevels, and
unsurprisingly wouldn't validate. On the other hand, some miplevel counts
wouldn't get the small mips validated at all.
The header is used by DRI1 drivers, which we've removed a while
back. Now only the dri1 loader in libGL is using it, so let's
move it in src/glx, and prefix it accordingly.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This flag was set to true for the atomic counter intrinsics, but it
never got plumbed through the linker, so by the time it got to the
backends it would always be set to the false. The current i965 backend
code doesn't use is_intrinsic, so this should not change any existing
code, but it's useful for codepaths that want to distinguish between
intrinsics and non-intrinsics without using strcmp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
This captures all of the steps I have been following in making releases for
the past year or so. This way, the instructions should be sound for anyone who
would like to take over the release process going forward.
This change will double cache size for branches which have a lower
LP_MAX_SHADER_VARIANTS limit (it will not do anything on master).
The reason is that nowadays shaders tend to be quite a bit larger than they
were (they were big when llvmpipe didn't have a fs loop, got much smaller with
that loop, and since then have gradually increased quite a bit though still
smaller than without the fs loop for various reasons - among them being d3d10
compliance, usage of 8-wide vectors, non-swizzled blend code). Thus effectively
less shaders would be cached (unless they were very small and the variant limit
was hit first). Also, since we're getting rid of the IR nowadays, the cached
shaders shouldn't need all that much memory actually.
This captures the set of rules I have been using for stable-branch management,
(starting with a discussion on the mesa-dev mailing list on July 2013, and
then refined through my own experience of performing stable-branch releases
since then).
We switched to these several stable releases ago, (since the MD5 algorithm has
been broken for some time), but only now did I get around to fixing this in
the Makefile rather than just performing this step manually.
CC: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
v2:
- Move dri*_query_renderer_* into their respective dri*_priv.h headers
- Drop then unnneeded include of dri2.h from dri2_query_renderer.c
- Rename dri2_query_renderer.c as dri_common_query_renderer.c, as it's contents
now are used for more than dri[23]
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
If only the flat/smooth shade state changed between
two render calls the prior code would miss updating the
hardware state.
Also add check for sprite coord, potentially same type
of issue otherwise for it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81967
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
cs_vertex_buffer_state.enabled_mask and
cs_vertex_buffer_state.dirty_mask are both updated when
r600_set_constant_buffer() is called, so we don't need to manually
update these values.
This fixes a crash with OpenCL programs that have a kernel with no
arguments.
https://bugs.freedesktop.org/show_bug.cgi?id=82671
CC: "10.2" <mesa-stable@lists.freedesktop.org>
Unlocking the texture is not safe: another thread could come in and grab
it. Now that we use a recursive mutex, this should work. This also fixes
texture lock deadlocks in the new meta fast clear path.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Chris Forbes <chrisf@ijw.co.nz>
This avoids problems with things like meta operations calling functions
that want to take the lock while the lock is already held. Basically,
the point is to guard against API reentrancy across threads...not to
guard against ourselves.
Dave Airlie opposed this change, but it makes master usable again and no
one proposed a better solution. We can revert this if/when someone
does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Chris Forbes <chrisf@ijw.co.nz>
There were two problems with the way this script used sed on OS X:
1. The OS X sed doesn't interpret "\r" in a replacement list as a
carriage-return character, (instead it was inserting a literal
'r' character).
We fix this by putting an actual ^M character into the source of
the script, (rather than a two-character escape sequence hoping
for sed to do the right thing).
2. When generating the test files with LF-CR ("\n\r") newlines, the
OS X sed was adding an undesired final newline ("\n") at the end
of the file. We avoid this by first using sed to add the ^M
before the newlines, then using tr to swap the \r and \n
characters. This way, sed never sees any lines ending with
anything but \n, so it doesn't get confused and doesn't add any
bogus extra newlines.
Tested-by: Vinson Lee <vlee@freedesktop.org>
Vinson's testing confirmed that this patch fixes FreeBSD as well.
I noticed that with /bin/sh on Mac OS X, "echo -n" does not work as
desired, (it actually prints "-n" rather than suppressing the final
newline). There is a /bin/echo that could be used (it actually works)
instead of the builtin echo.
But I decided it's more robust to just use printf rather than
hardcoding /bin/echo into the script.
This LLVM 3.6 commit changed EngineBuilder constructor.
commit 3f4ed32b4398eaf4fe0080d8001ba01e6c2f43c8
Author: Rafael Espindola <rafael.espindola@gmail.com>
Date: Tue Aug 19 04:04:25 2014 +0000
Make it explicit that ExecutionEngine takes ownership of the modules.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215967 91177308-0d34-0410-b5e6-96231b3b80d8
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
The docs say "When performing a render target resolve, PIPE_CONTROL with end
of pipe sync must be delivered.", which doesn't actually tell us whether we
need to do it before or after. Blorp did it before and after, and doing it
before certainly makes sense. The resolve operation needs to read from the
MCS and if we don't flush the render cache it won't get up-to-date data.
On the other hand, doing it after should not be necessary, since we call
brw_render_cache_set_check_flush() after the resolve.
Fixes rendering corruption in kwin's cover switch effect and various steam
games.
Missing flush spotted by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The extension requires GL 3.0, so enable on just the generations
exposing that.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
SB needs a bit of special handling to handle
instructions without obvious side effects, to
avoid it deleting them.
Fixes failing non-const ARB_gpu_shader5
textureOffsets piglits with sb enabled.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Based on the old code, the new layout code describes the layout with the new,
well-documented, ilo_layout. It also gains new features such as MCS support
and extended ARYSPC_LOD0 that i965 comes up with (see
6345a94a9b).
Instead create a staging texture with pipe_buffer_create and
PIPE_USAGE_STAGING.
u_upload_mgr sets the usage of its staging buffer to PIPE_USAGE_STREAM.
But since 150ac07b85 CPU -> GPU streaming buffers
are created in VRAM. Therefore the staging texture (in VRAM) does not offer any
performance improvements for buffer downloads.
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
... and store the value in intel_winsys_info/ilo_dev_info.
Suggested-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
olv: check for errors and report raw values
The loop over all instructions is now two-fold, over all of the blocks
and all of the instructions in each block.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The next patch adds a foreach_block (block, cfg) macro, which works
better if it provides a direct bblock_t pointer, rather than a
bblock_link pointer that you have to use to find the actual block.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Doesn't use fewer instructions, but it does avoid writing the flag
register and if we want to switch the representation of true for Gen4/5
in the future, we can just delete the AND instruction.
Dead since the call to _mesa_generate_parameters_list_for_uniforms
was removed in commit 12751ef2. So this was why all of that code that
was supposed to fix up the value of a uniform bool to wasn't happening.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The next patches are going to combine some of the mapi subdirectories'
Makefiles into a single Makefile, giving better build parallelism.
lib_LTLIBRARIES will be set to something like
lib_LTLIBRARIES = shared-glapi/libglapi.la es2api/libGLESv2.la
and the current code in install-lib-links.mk simply prepends .libs/ and
replaces the .la in order to create the filenames that it needs to ln/cp
into the LIBDIR. This doesn't work when the .la file is actually in a
subdirectory.
This patch fixes this and puts .libs/ in the right place.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The SWIZZLE_1 of the winsys destination was dereffing off the end of the
array, which surprisingly often worked out (since nobody reads the
rendered value anyway, so whatever junk was referenced in the QIR didn't
matter), but shader dumping would sometimes segfault.
We weren't accounting for the level 0 offset in the texture setup (so it
only worked if it happened to be a single-level texture), and doing so
required that we get the level 0 offset page aligned so that the offset
bits don't get interpreted as the texture format and such.
I had the right viewports in vc4_emit.c, but grabbed the wrong values in
the uniform setup, so primitives would claim to be in the wrong parts of
the screen. (The vc4_emit.c state looks like it just decides how big the
clipping guardband is).
This gets fbo-viewport closer to working (which still has the problem that
the HW is always guard-band clipping), and fixes inverted FBO rendering in
general.
The function should return GLboolean, not GLenum.
If we detect invalid compressed pixel storage parameters, we should
return GL_TRUE, not GL_FALSE so that the function is no-op'd.
An update to the piglit s3tc-errors test will check this.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Replace the gl_texture_image parameter with mesa_format since we only
used the image's format.
Add some comments.
Reviewed-by: Matt Turner <mattst88@gmail.com>
For gen6 we will use the ALL_SLICES_AT_EACH_LOD miptree layout for
separate stencil/hiz. This is needed because gen6 hiz and separate
stencil only support a single miplevel. When accessing the other LODs,
we will program a tile aligned offset for the bo.
PRM Volume 1, Part 1, 7.18.3.7.2 For separate stencil buffer [DevILK]
to [DevSNB]:
"The separate stencil buffer does not support mip mapping, thus the
storage for LODs other than LOD 0 is not needed."
We still allocate storage for the other stencil mip-levels within a
single texture, but each mip-level will use non-mip-array spacing.
PRM Volume 2, Part 1, 7.5.3 Hierarchical Depth Buffer
"[DevSNB]: The hierarchical depth buffer does not support the LOD
field, it is assumed by hardware to be zero. A separate
hierarachical depth buffer is required for each LOD used, and the
corresponding buffer’s state delivered to hardware each time a new
depth buffer state with modified LOD is delivered."
We allocate storage for the other hiz mip-levels within a single
texture, but each mip-level will use non-mip-array spacing.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Since gen6 separate stencil & hiz only supports LOD0, we need to
program an offset to the LOD when emitting the separate stencil/hiz.
v3:
* Use new array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Gen6 doesn't support multiple miplevels for hiz and stencil.
Therefore, we must point to the LOD directly during rendering.
But, we also have removed the tile offsets from normal depth surfaces,
so we need to align each LOD to a tile boundary for hiz and stencil.
v3:
* Use new array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Previously array_layout ALL_SLICES_AT_EACH_LOD was only used for array
spacing lod0 on gen7+ and therefore was only used with a single mip
level.
gen6 separate stencil & hiz only support LOD0, so we need to allocate
the miptree similar to gen7+ array spacing lod0, except we also need
space for multiple mip levels. (Since OpenGL stencil and depth support
multiple LODs.)
The miptree is allocated with tightly packed array slice spacing, but
we still also pack the miplevels into the region similar to a normal
multi mip level packing.
A 2D Array texture with 2 slices and multiple LODs would look somewhat
like this:
+----------+
| |
| |
+----------+
| |
| |
+----------+
+---+ +-+
| | +-+
+---+ +-+
| | :
+---+
v3:
* Use new array_layout enum
* ASCII art!
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
gen6 does not support multiple miplevels with separate
stencil/hiz. Therefore we need to layout its miptree with no mipmap
spacing between the slices of each miplevel.
v3:
* Use new array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We will want to setup gen6 separate stencil and hiz miptrees in a
layout that is similar to array_spacing_lod0. This is needed because
gen6 hiz and stencil only support a single mip-level.
In both use cases (gen7+ LOD0 spacing & gen6 separate stencil/hiz),
the array slices will be packed at each LOD without reserving extra
space for LODs within each array slice.
So, we generalize the name of this field and add comments to indicate
the old and new uses.
Motivation for the gen6 change comes from the PRM:
PRM Volume 1, Part 1, 7.18.3.7.2 For separate stencil buffer [DevILK]
to [DevSNB]:
"The separate stencil buffer does not support mip mapping, thus the
storage for LODs other than LOD 0 is not needed."
PRM Volume 2, Part 1, 7.5.3 Hierarchical Depth Buffer
"[DevSNB]: The hierarchical depth buffer does not support the LOD
field, it is assumed by hardware to be zero. A separate
hierarachical depth buffer is required for each LOD used, and the
corresponding buffer’s state delivered to hardware each time a new
depth buffer state with modified LOD is delivered."
v2:
* Rename array_spacing_lod0 to non_mip_arrays
v3:
* Instead, replace array_spacing_lod0 with array_layout enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(bf25ee2 for gen6)
Previously we would always find the 2D sub-surface of interest,
and then program the surface to this location. Now we always
program the 3DSTATE_DEPTH_BUFFER at the start of the surface.
To select the lod/slice, we utilize the lod & minimum array
element fields.
We also must disable brw_workaround_depthstencil_alignment for
gen >= 6. Now the hardware will handle alignment when rendering
to additional slices/LODs.
v3:
* Set depth_mt bo RELOC offset to 0, as was done in bf25ee2
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56127
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
(bc1acaa for gen6)
This will be used in 3DSTATE_DEPTH_BUFFER in a later patch.
Note: Cube maps are treated as 2D arrays with 6 times as
many array elements as the cube map array would have.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(171e633 for gen6)
This will be used in 3DSTATE_DEPTH_BUFFER in a later patch.
Note: Cube maps are treated as 2D arrays with 6 times as
many array elements as the cube map array would have.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We will program the gen6 hiz depth state differently to enable layered
rendering on gen6.
v2:
* Remove unneeded gen6_emit_depthbuffer as suggested by Topi
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the gen6 PRM Volume 1 Part 1: Graphics Core, Section
7.18.3.7.1 (Surface Arrays For all surfaces other than separate
stencil buffer):
"[DevSNB] Errata: Sampler MSAA Qpitch will be 4 greater than the
value calculated in the equation above , for every other odd Surface
Height starting from 1 i.e. 1,5,9,13"
Since this Qpitch errata only impacts the sampler, we have to adjust
the input for the rendering surface to achieve the same qpitch. For
the affected heights, we increment the height by 1 for the rendering
surface.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than pointing the surface_state directly at a single
sub-image of the texture for rendering, we now point the
surface_state at the top level of the texture, and configure
the surface_state as needed based on this.
v2:
* Use SET_FIELD as suggested by Topi
* Simplify min_array_element assignment as suggested by Topi
v3:
* Use irb->layer_count for depth instead of rb->Depth
* Make gl_target const
* depth - 1, not depth
v4:
* Merge in dd43900b & b875f39e fixes to prevent 3D texture piglit
regressions
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since this code was branched from brw_wm_surface_state.c, it had
support for gen < 6. We can now remove this.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Layered rendering is part of OpenGL 3.2; GL_ARB_draw_instanced is part
of OpenGL 3.1. As such, all drivers supporting layered rendering
already support gl_InstanceID.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Only assign gl_Layer if we have GL_AMD_vertex_shader_layer. Gen6 doesn't
(currently) have that extension, but it also doesn't support layered
rendering.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Passes blendminmax and blendsquare. glean's more serious blendFunc fails
in simulation due to binner memory overflow (I really need to work around
that), and fbo-blending-formats fails due to Mesa refusing one of the
getter requests, even before it could fail due to the driver not actually
supporting different formats yet.
The hw_mask is the set of primitives you actually support, so this attempt
to provide the set of formats that's unsupported was wrong in two ways (it
was intended to be '~' not '!'). However, we only call this code when
prim isn't one of the actually supported hw_mask bits, so missing out on
the memcpy didn't matter anyway.
We were triggering simulator assertion failures for not consuming these,
and presumably we want to actually make use of them some day (for things
like point/line antialiasing)
Note that this has the qreg index as 0, which is the same index as the
first GL varyings read. This doesn't matter currently, since that number
isn't used for anything except dumping.
At some point I'm going to want to move the information necessary for the
host buffer upload/download into the BO so that it's independent of the
current vc4->framebuffer, but for now this fixes pointless derefs on
non-simulator in vc4_context.c since the dump_fbo() removal
This patch uses the infrastructure put in place by previous patches
to implement fast color clears and replicated color clears in terms of
meta operations.
This works all the way back to gen7 where fast clear was introduced and
adds support for fast clear on gen8. It replaces the blorp path
completely and improves on a few cases. Layered clears are now done
using instanced rendering and multiple render-target clears use a
MRT shader with rep16 writes.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The data port has a SIMD16 'replicate data' message, which lets us write
the same color for all 16 pixels by sending the four floats in the
lower half of a register instead of sending 4 times 16 identical
component values in 8 registers.
The message comes with a lot of restrictions and could be made generally
useful by recognizing when those restriction are satisfied. For now,
this lets us enable the optimization when we know it's safe, but we don't
enable it by default. The optimization works for simple color clear shaders
only, but does recognized and support multiple render targets.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We'll use this in the i965 fast clear implementation.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
GetTexParamterfv() doesnt change texture state, so instead of
_mesa_lock_texture() we can use _mesa_lock_context_textures(),
which doesn't increase the texture stamp. With this change,
_mesa_update_state_locked() is now only called from under
_mesa_lock_context_textures(), which is right thing to do. Right now
it's the same mutex, but if we made texture locking more fine grained
locking one day, just locking one texture here would be wrong.
This all ignores the fact that texture locking seem a bit
flaky and broken, but we're trying to not blatantly make it worse.
This change allows us to reliably unlock the context textures in the
dd::UpdateState callback as is necessary for meta color resolves.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change except for glBegin/glEnd style rendering, where we now
do the resolves at glBegin time instead of FLUSH_VERTICES time. This is also
the reason for this change, so that when we later switch fast clear resolve to
use meta, we won't be doing meta operations in the middle of a begin/end
sequence.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
GEN7+ has the fast clear functionality, which lets us clear the color
buffers using the MCS and a scaled down rectangle. To enable this
we have to set the appropriate bits in the 3DSTATE_PS package.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The clipper doesn't support clipping 3DPRIM_RECTLIST primitives and must
be turned off when we use them.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The brw_draw_prims() function is the draw entry point into the driver,
and takes struct _mesa_prim for input. We want to be able to feed
native primitives into the driver, and to that end we introduce
BRW_PRIM_OFFSET, which lets use describe geometry using the native
GEN primitive types.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us disable the viewport transform, which will be useful for
emitting 3DPRIM_RECTLIST.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, this can only be triggered with a new 'no8' INTEL_DEBUG option
and a new context flag. We'll use the context flag later, but introducing
it now lets us bisect to this commit if it breaks something.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The final step to get GLX_MESA_query_renderer working with gallium
drivers.
v2: Remove __DRI2_RENDERER_PREFERRED_PROFILE handling. It's already
handled in dri/common. Spotted by Marek.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Both report 0xffffffff as both vendor and device id, and the maximum
amount of system memory as video memory.
v2: Use aux helper os_get_total_physical_memory().
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
All the values are are currently hardcoded. One could use
some heuristics to determine the amount of video memory if
a callback to the host is not available.
Do we what to advertise the driver as hardwar accelerated ?
Cc: Brian Paul <brianp@vmware.com>
Cc: José Fonseca <jose.r.fonseca@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Implementation based on the classic driver with the following
changes:
- Use auxiliarry function os_get_total_physical_memory to get the
total amount of memory.
- Move the libdrm_intel specific get_aperture_size to the winsys.
Cc: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Implementation based on the classic driver with the following
changes:
- Use auxiliarry function os_get_total_physical_memory to get the
total amount of memory.
- Move the libdrm_intel specific get_aperture_size to the winsys.
Cc: Stephane Marchesin <stephane.marchesin@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Provide the real vendor and and hardcode the device id as
0xffffffff as the devices currently using freedreno are non-pci.
The device features UMA.
Cc: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Namely vendor/device id, accelerated and UMA, which will be used to describe
the underlying renderer.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2:
- Drop __DRI2_RENDERER_PREFERRED_PROFILE case.
- Cleanup return statements.
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Create radeon{Vendor,GetRenderer}String helpers.
- Drop __DRI2_RENDERER_PREFERRED_PROFILE case.
- Cleanup return statements.
To be used by the upcomming GLX_MESA_query_renderer implementation.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Printing the TCL involves that context is available at the time of
query. The GLX_MESA_query_renderer states that glGetString(GL_RENDERER)
and glXQueryRendererStringMESA(GLX_RENDERER_DEVICE_ID_MESA) will have
the same format, thus removing the context dependenicy will help us
achieve that.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Create nouveau_{vendor,get_renderer}_string helpers.
- Set correct max_gl*version.
- Query the device PCIID via libdrm_nouveau/nouveau_getparam.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Essentially all drivers would like to use to opengl core profile if
available, so avoid duplication by moving the code to a common fallback
within driQueryRendererIntegerCommon.
If a driver uses different approach they can handle it separately.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The extension is used by GLX_MESA_query_renderer, which
can be provided for by hardware and software drivers.
v2: Use designated initializers.
v3: Move drisw_query_renderer_*() to dri2_query_renderer.c
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Generate a GL error and return rather than crashing on a null
ctx->Driver.CopyImageSubData pointer (gallium). This allows apitraces
with glCopyImageSubData() calls to continue rather than crash.
Plus, fix a comment typo.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Similar to the problem described in 2c50212b14, if we copy the clear
value through a regular assignment via a floating point value, then if an
integer clear value is being used that happens to contain a signalling NaN
value then it would get converted to a quiet NaN when stored via the x87
floating-point registers. This would corrupt the integer value. Instead we
should use a memcpy to ensure the exact bit representation is preserved.
This bug can be triggered on 32-bit builds with optimisations by using an
integer clear color with a value like 0x7f817f81.
Reviewed-by: Matt Turner <mattst88@gmail.com>
V2: Set force_writemask_all on ADD; this *is* necessary in the VS case
too.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, assume that the addressed sampler can be in any of the
16-sampler banks. If we preserved range information this far, we
could avoid emitting these instructions if the sampler were known
to be contained within one bank.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to be using this infrastructure to build descriptors in
src1 of non-send instructions, when preparing to do an indirect send.
Don't accidentally clobber the conditionalmod field of those
instructions with SFID bits, which aren't part of the descriptor.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
This provides a reasonable place to enforce the hardware restriction
that indirect descriptors must be in a0.0
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
The pass breaks live ranges of virtual registers by allocating new
registers when it sees an assignment to a virtual GRF it's already seen
written.
total instructions in shared programs: 4337879 -> 4335014 (-0.07%)
instructions in affected programs: 343865 -> 341000 (-0.83%)
GAINED: 46
LOST: 1
[mattst88]: Make pass not break in presence of control flow.
invalidate_live_intervals() only if progress.
Fix up delta_x/delta_y.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Commit c66d928f2c ("i965: Enable INTDIV
in SIMD16 mode.") began using generate_math_gen6 to break SIMD16 INTDIV
into two SIMD8 operations.
generate_math_gen6 takes two registers - for unary operations, we pass
ARF null for the second operand. Prior to Broadwell, real operands were
always GRF. But now they can be IMM as well.
So, check for != ARF instead of == GRF.
+12 piglits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts commit af13cf609f, which
appears to cause huge performance problems on Ivybridge. I'd missed
that the FFTID bits are in the low byte. The documentation doesn't
indicate that the URB write message header actually wants FFTID - it
just labels those bits as "Reserved." But it appears necessary.
This does slightly more than revert the original change: originally,
Broadwell had separate code generation, which used MOV, and this patch
only changed it for Gen4-7. Now that both are unified, reverting this
also makes Broadwell use OR. Which should be fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The extension says GL 4.0 is required. We'll meet the spirit
of that restriction by enabling on just those generations which will
soon support GL 4.0 (Gen7+), although it's technically supportable on
all generations.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The quality level (fine/coarse/dont-care) is plumbed through to the
generator as a constant in src1.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The quadop-based method we currently use on all chipsets already
provides the fine version of the derivatives.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This extension is identical to NV_texture_barrier. Alias
glTextureBarrier to the existing glTextureBarrierNV and use the existing
NV_texture_barrier extension bit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This matches the name of the dd hook. Also convert a couple of nearby
dd implementations to lowercase + underscore as is now the standard.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Right now we decide which kernels to use and the GRF start offsets in
one place and emit the kernel pointers later. The logic of how to map
8, 16 and 32 kernels to kernel start pointers follows the same logic as which
GRF start offsets to use, so lets figure out these two things in one place.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The EGL_EXT_image_dma_buf_import specification was revised (according to
its revision history) on Dec 5th, 2013, for EGL to not take ownership of
the file descriptors.
Do not close the file descriptors passed in to eglCreateImageKHR with
EGL_LINUX_DMA_BUF_EXT target.
It is assumed, that the drivers, which ultimately process the file
descriptors, do not close or modify them in any way either. This avoids
the need to dup(), as it seems we would only need to just close the
dup'd file descriptors right after.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76188
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
gcc 4.6.3 chokes with the following error:
brw_vec4.cpp: In member function 'int brw::vec4_visitor::setup_uniforms(int)':
brw_vec4.cpp:1496:37: error: expected primary-expression before '.' token
Apparently C++ does not do named initializers for unions, except maybe
as a gcc extension, which is not present here.
As .f is the first element of the union, just drop it. Fixes the build
error.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
* --enable-{32,64}-bit is done. Use --build and --host instead.
* Configure does not add "-g -O2" to C{,XX}FLAGS.
* Pkg-config has been mandatory for a while now.
* Avoid using LDFLAGS, refer to pkg-config.
* --with-expat is deprecated. Use pkg-config.
v2:
* Note that CC/CXX will need to be set for multilib builds.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
These two were added ages ago, with an explicit comment "Hacks ..."
They have been insufficient for years and maintainers needed to
explicitly handle the build themselves.
Rather than lying and pretending that it works, just kill this hack and
let maintainers build things the way it should be done for their
distribution.
Document the removal in the release notes.
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts commit 2af28040d6.
The commit was resolving an issue where libtool will not setup the
environment correctly when one explicitly provides --enable-{32,64}-bit
at configure time. It was caused due to the "-m32,64" C{,XX}FLAGS being
set too late relative to LT_INIT.
At the same time this cases the enable_static to be incorrectly set,
amongst others leading to build issues. Rather than being smart and
trying to handle 32/64 bit build ourselves it may be better to delegate
it to the builder/maintainer. The latter should now know better which is
the correct(most appropriate) method.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82536
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82546
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
The brw_stage_prog_data struct previously contained an array of float pointers
to the values of parameters. These were then copied into a batch buffer to
upload the values using a regular assignment. However the float values were
also being overloaded to store integer values for integer uniforms. This can
break if x87 floating-point registers are used to do the assignment because
the fst instruction tries to fix up invalid float values. If an integer
constant happened to look like an invalid float value then it would get
altered when it was copied into the batch buffer.
This patch changes the pointers to be gl_constant_value instead so that the
assignment should end up copying without any alteration. This also makes it
more obvious that the values being stored here are overloaded for multiple
types.
There are some static asserts where the values are uploaded to ensure that the
size of gl_constant_value is the same as a float.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Requires GLSL 1.50 or higher, which we only support in the core profile.
V2: Fix broken alignment
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes the LUMINANCE_ALPHA formats of fbo-clear-formats piglit test.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
All we need to do is decompose this to two SIMD8 instructions, like we
do in many other cases. We even already have code for that.
I apparently just botched this last time I tried, and it was easy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When dual source blending, the visitor already stores a flag in
brw_wm_prog_data (dual_src_blend) for the state upload code to use.
The generator also receives this, so there's no need to pass an
additional flag.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The _mesa_swizzle_and_convert path can't do transfer ops, so we should bail
if they're needed.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This just cherry-pick the extensions into a list for GLES 3.1
I'm not actually sure if this list if complete or correct, maybe someone
else can tell me what I missed, and I'm not 100% sure on multi_draw_indirect.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
==17630== Invalid read of size 4
==17630== at 0x400AE10: memcpy (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==17630== by 0x49024A2: u_upload_data (u_upload_mgr.c:253)
==17630== by 0x49050E1: u_vbuf_draw_vbo (u_vbuf.c:980)
==17630== by 0x487DE29: cso_draw_vbo (cso_context.c:1425)
==17630== by 0x487DEA0: cso_draw_arrays (cso_context.c:1445)
==17630== by 0x48A3B0E: hud_draw_colored_prims.constprop.6 (hud_context.c:123)
==17630== by 0x48A4810: hud_draw (hud_context.c:266)
==17630== by 0x48763F7: dri_flush (dri_drawable.c:483)
==17630== by 0x4057510: dri2Flush.constprop.4 (dri2_glx.c:559)
==17630== by 0x405789E: dri2SwapBuffers (dri2_glx.c:851)
==17630== by 0x402C531: glXSwapBuffers (glxcmds.c:842)
==17630== by 0x8049716: ??? (in /usr/bin/glxgears)
==17630== Address 0x4426b2c is 4 bytes after a block of size 1,008 alloc'd
==17630== at 0x4006B11: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==17630== by 0x48A4CE7: hud_pane_add_graph (hud_context.c:625)
==17630== by 0x48A68F0: hud_pipe_query_install (hud_driver_query.c:175)
==17630== by 0x48A6A30: hud_driver_query_install (hud_driver_query.c:207)
==17630== by 0x48A5835: hud_create (hud_context.c:791)
==17630== by 0x48756CB: dri_create_context (dri_context.c:165)
==17630== by 0x4871CD4: driCreateContextAttribs (dri_util.c:435)
==17630== by 0x4871E06: driCreateNewContext (dri_util.c:464)
==17630== by 0x4056A22: dri2_create_context (dri2_glx.c:223)
==17630== by 0x402CF68: CreateContext (glxcmds.c:299)
==17630== by 0x402D265: glXCreateContext (glxcmds.c:430)
==17630== by 0x804B136: ??? (in /usr/bin/glxgears)
This is due to second vertex element being specified, and the upload
tries to fetch over the end. However the pane rendering only requires
a single vertex element, so specify only one.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This got broken by 3dbf5bf657.
GL_COLOR_INDEX data is still supported (in legacy contexts), but the new
texstore_swizzle path cannot handle it (and didn't detect this).
Unfortunately there's no piglit test trying to specify textures with a
GL_COLOR_INDEX source format, and I don't really understand how all the color
map stuff which is used by this works, but this caused conform failures
(with a reported mesa implementation error when trying to figure out the color
mapping).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
accel_working2 returns 3 if the new firmware is used.
The comment wasn't updated in v3 of commit:
36771dc winsys/radeon: fix nop packet padding for hawaii
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Fixes piglit glean "do-loop with continue and break" on RS690
It's based on Tom Stellard patch and improved to handle CMP instruction.
[v2] handle CMP instruction
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: David Heidelberger <david.heidelberger@ixit.cz>
We were leaking the input buffer used for kernel arguments and since
we were allocating it using si_upload_const_buffer() we were leaking
1 MB per kernel invocation.
CC: "10.2" <mesa-stable@lists.freedesktop.org>
This will decrement the reference count for buffers referenced in the
command stream will prevent us from leaking them.
CC: "10.2" <mesa-stable@lists.freedesktop.org>
There is a hard limit in older kernels of 256 MB for buffer allocations,
so report this value as MAX_MEM_ALLOC_SIZE and adjust MAX_GLOBAL_SIZE
to statisfy requirements of OpenCL.
CC: "10.2" <mesa-stable@lists.freedesktop.org>
Before, when we encountered a situation where we had to optimistically
color a node, we would immediately give up and push all the remaining
nodes on the stack in the order of their index - which is a random, and
potentially not optimal, order. Instead, choose one node to
optimistically color in ra_select(), and then once we've optimistically
colored it, keep on going as normal in the hopes that we've opened up
more avenues for the normal select phase to make progress. In cases with
high register pressure, this helps make the order we push things on the
stack much better, and therefore increase the chance that we can allocate
successfully.
total instructions in shared programs: 4545447 -> 4545401 (-0.00%)
instructions in affected programs: 1353 -> 1307 (-3.40%)
GAINED: 124
LOST: 6
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we would consider any optimistically colored nodes for
spilling. However, spilling any optimistically colored nodes below the
node that we failed to color on the stack wouldn't help us make
progress, since it wouldn't help with allowing us to find a color for
the node currently failing to get colored. Only consider nodes
which were above the failing node on the stack for spilling, which
simplifies the logic, and comment the code better so people know what's
going on here.
No shader-db changes with BRW_MAX_GRF reduced to 90 (or with the normal
number of GRF's).
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can store the q total that pq_test() would've calculated in the node
itself, updating it when we add a node to the stack. This way, we only
have to walk the adjacency list when we push a node on the stack (i.e.
when the p, q test succeeds) instead of every time we do the p, q test.
No difference in shader-db run times, but I'm keeping this in because
the q total that it calculates will also be used in the next few commits.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, there were 3 entrypoints into parts of the actual allocator,
and an API called ra_allocate_no_spills() that called all 3. Nobody
would ever want to call any of the 3 entrypoints by themselves, so
everybody just used ra_allocate_no_spills(). So just make them static
functions, and while we're at it rename ra_allocate_no_spills() to
ra_allocate() since there's no equivalent "with spills," because the
backend is supposed to handle spilling.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This would try to allocate 0-sized bo's when the max level was below the
base level.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The ARB_gpu_shader5 extension is made up of a lot of small sub-parts.
Instead of adding PIPE_CAP's for each of these, just rely on the GLSL
version reported by the pipe driver. The remaining extensions lend
themselves naturally to being checked through a single CAP.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The file contains rules that are executed on incremental builds. This
way one can avoid doing a full clean and ensure that the new object
(library) is correctly build.
Inspired by the work of Chih-Wei Huang, from the Android-x86 project.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will make it easier on us as CleanSpec.mk comes along and improves
consistency across the Android build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will make it easier on us as CleanSpec.mk comes along and improves
consistency across the Android build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Saves us a few lines and brings us closer to the automake build.
Drop DRM_TOP as it's not longer used.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Trying to get rid of the hardcoded dependency of DRM_TOP which
expects that mesa is localted in /external/drm. Will
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The variable i915_C_FILES changed to i915_FILES with commit
34d4216e64 back in mesa 9.1/9.2. Yet we've missed to update the
the android build, essentially creating an dummy/empty driver that
can never work.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
When the compiler is not capable/does not accept -msse4.1 while the target
has the instruction set we'll blow up as _mesa_streaming_load_memcpy is
going to be undefined.
To make sure that never happens, wrap the runtime cpu check+caller in an
ifdef thus do not compile that hunk of the code.
Fix the android build by enabling the optimisation and adding the define
where applicable.
v2: autoconf conditionals end with "fi" rather than endif.
v3: Wrap the definition and call to intel_miptree_{un,}map_movntdqa in
if defined(USE_SSE41). Spotted by Matt.
Cc: Matt Turner <mattst88@gmail.com>
Cc: Adrian Negreanu <adrian.m.negreanu@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Upstream Android (system/core) has dropped these formats with commit
6bac41f1bf9(get rid of HAL pixelformats 5551 and 4444) yet does not
mention why.
These formats never really worked so we're safe to drop them as well.
Identical commit is available in the android-x86 external/mesa repo
commit 06a2d36edc
Author: Chih-Wei Huang <cwhuang@linux.org.tw>
Date: Wed Sep 25 01:16:57 2013 +0800
android: get rid of HAL pixelformats 5551 and 4444
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Recent versions of bionic has picked up support for these functions,
leading to build issues due to the redefition of the symbols.
Note: wrapping things in #ifdef does not cut it :\
Identical patch is available in chromium, android-x86 and perhaps other
projects.
commit 66c1c789ce3407472de9ed620c9f815639058835
Author: rmcilroy@chromium.org
Date: Wed Apr 02 10:59:34 2014 +0000
Porting to x64 Android. Remove redefinitions of log2 and log2f.
BUG=
R=kbr@chromium.org
Review URL: https://codereview.chromium.org/216773005
commit 9cc0a0d2b0
Author: Chih-Wei Huang <cwhuang@linux.org.tw>
Date: Sun Jul 21 23:04:19 2013 +0800
android: remove log2, log2f
The functions are already defined in the latest bionic.
Cc: Chia-I Wu <olvaffe@gmail.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Chia-I Wu <olvaffe@gmail.com>
Android build never really installs the headers, as such we need to
explicitly add their location in the source tree otherwise it will
fail to find them.
v2: Android now installs the headers, so let's use that ;)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Considering the way we've been consolidating things it makes
sense to add the final two (aux and tests) in here.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Yet another makefile less to worry about.
v2: Add state_trackers and targets on a single SUBDIRS line.
Requested by Matt.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
One makefile less, with the potential of further compacting the
automake build.
v2: Rebase on top of vc4 changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Rather than having two separate almost empty and identical makefiles,
compact them thus improving the configure and build time.
Additionally this makes the automake build symmetrical to the scons
and android one.
v2: Rebase on top of vc4, compact drivers + winsys on a single line.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For all the people interested in testing the freedreno driver on
their Android devices. The next commit will hook these up within
the libEGL driver (via the gallium-egl backend).
There may be some rough edges but those can be sorted when a
willing builder/tester comes along.
v2:
- s/freefreno/freedreno/. Spotted by Matt Turner.
- Use the installed libdrm headers.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: freedreno@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- link against libdrm_radeon
- link the r600 driver against libstlport
- linkin the newly added libmesa_pipe_radeon library
required by r600 and radeonsi drivers
v2: Include pipe_radeon after pipe_r600/radeonsi.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
[Emil Velikov] Split up and add commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- include the correct folders
- add a new buildscript for the common radeon folder
v2: Use the installed libdrm headers over the DRM_TOP ones.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
[Emil Velikov] Split up and add commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
For a while the nouveau pipe driver has been a static library
and it has been using STL for even longer.
Correct add the link and cleanup the gallium_DRIVERS.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
nouveau uses STL for a while now thus we need to include
external/stlport/libstlport.mk in order to get the build
at least partially working.
v2: Use the installed libdrm headers over the DRM_TOP ones.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Rather than having the sources list duplicated across all three
build systems, define it once and use it whenever needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A while back we've mandated that gbm requires enable_dri,
thus this check is no longer required.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
To avoid unresolved symbols in the DRI modules with earlier commit we
wrapped the innards of dri_kms_init_screen() in a
DRI_TARGET/GALLIUM_SOFTPIPE ifdef.
At the same time we forgot to adds the defines to the st/dri build
systems, breaking kms_swrast and gnome-continuous.
Drop the DRI_TARGET define, we're already in st/DRI.
Reported-by: Jasper St. Pierre <jstpierre@mecheye.net>
Reported-by: Vadim Rutkovsky <vrutkovs@redhat.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We should assert when either the function or the flag pointer
is null or we'll end up with a null reference a few lines later.
Currently unused by mesa thus it has gone unnoticed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Apps that only want to use core functionality should #include this
header. This version covers everything up to OpenGL 4.5.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't actually do them (or even fake them) currently, but it does get
us a bunch of unrelated glean glsl1 tests passing, which previously would
error out due to glean assuming the minimums on a 3D texture that 2 of the
subtests use.
This can be useful for looking at context init setup and texture format
choices, and there's no reason for the silly retval computation we do if
you're not going to have this code (mostly from freedreno) around.
Everything should be in place to unify code generation between Gen4-7
and Gen8+. We should be able to drop the Gen8 generators at this point.
However, leave them hooked up for a brief moment, for testing and
comparison purposes. Set GEN8=1 to use the old Gen8+ code generator
paths.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kind of a moot point since we're deleting Gen8 code generation, but
this at least helps make it match the Gen4-7 code. It's probably more
reasonable than using float.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This improves performance in Trine 2 at 1280x720 (windowed) on "Very
High" settings by 30% (in the interactive menu) to 45% (in the forest
by the giant frog) on Haswell GT3e.
It also now generates the same assembly on Gen7 as it does on Gen8,
which always used the sampler for both types.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to the documentation, we need to set the source 0 register
type to IMM for flow control instructinos that have both JIP and UIP.
Out of paranoia, just make all flow control instructions use IMM;
there's no benefit to using ARF anyway, and it could trouble that's
difficult to diagnose.
See commit 9584959123, which did the
analogous change in the gen8_generator code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're going to add a Gen8+ case shortly, which would need to duplicate
this code again. Instead, share it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When we combine the Gen4-7 and Gen8+ generators, we'll need to handle
half float packing/unpacking functions somehow. The Gen8+ generator
code today just emulates the behavior of the Gen7 F32TO16/F16TO32
instructions, including the align16 mode bugs.
Rather than messing with fs_generator/vec4_generator, I decided to just
emulate the instructions at the brw_eu_emit.c layer.
v2: Change gen >= 7 asserts to gen == 7 (suggested by Chris Forbes).
Fix regressions on Haswell in VS tests due to type assertions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Broadwell requires the number of vertices written by the geometry shader
to be specified in a separate register, as part of the terminating
message's payload.
This also means GS_OPCODE_THREAD_END needs to increment mlen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
g0.5 has nothing of value to contribute to m0.5. In both the VS and GS
payload, g0.5 contains the scratch space pointer - which is definitely
not of any use. The GS payload also contains FFTID, but the URB write
message header doesn't want FFTID.
The only reason I used OR was because Eric originally requested it.
On Broadwell, I used MOV, and that's worked out fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Haswell, we implement "discard" via predicated SEND messages, using
f0.1 instead of f0.0. To accomplish this, we set inst->flag_subreg to 1
on the FS_OPCODE_FB_WRITE.
Most instructions using fs_inst::flag_subreg expand to a single assembly
instruction. However, FS_OPCODE_FB_WRITE can generate several MOVs for
setting up header information. We don't want to set flag_subreg on
those, so override the default state back to 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously the Meta implementation of glGetTexImage would fall back to
_mesa_get_teximage if the texturing is not using an unsigned normalised
format. However in order to support the half-float formats of BPTC textures we
can make it render to a floating-point renderbuffer instead. This patch makes
decompression_state have two FBOs, one for the GL_RGBA format and one for
GL_RGBA32F. If a floating-point texture is encountered it will try setting up
a floating-point FBO. It will now also check the status of the FBO and fall
back to _mesa_get_teximage if the FBO is not complete.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Enables the BPTC extension on Gen>=7 and adds the necessary format mappings to
get the right surface type value.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Once we add BPTC texture support we will need to generate mipmaps for
compressed floating point textures too. Most of the code seems to already be
there but it just needs a few extra lines to get it to use GL_FLOAT instead of
GL_UNSIGNED_BYTE as the type for the temporary buffers.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds compressors for all four of the BPTC compressed-texture formats. The
compressor is written from scratch and takes a very simple approach. It always
uses a single mode of the BPTC format (4 for unorm and 3 for half-floats) and
picks the two endpoints by dividing the texels into those which have more or
less than the average luminance of the block and then calculating an average
color of the texels within each division.
It's probably not really sensible to try to use BPTC compression at runtime
because for example with the Nvidia offline compression tool it can take in
the order of an hour to compress a full-screen image. With that in mind I
don't think it's worth having a proper compressor in Mesa and this approach
gives reasonable results for a usage that is basically a corner case.
v2: Always use the custom compressor, even for the unorm formats. Fix the
quantization step for the half-float format compressor. Fixed a typo which
was breaking the right-hand edge of half-float textures with a width that
isn't a multiple of four.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Adds functions to fetch from any of the four BPTC-compressed formats.
v2: Set the alpha component to 1.0 when fetching from the half-float formats
instead of leaving it uninitialised. Don't linearize the alpha component
when fetching from sRGB.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the following four Mesa image format enums which correspond to the
four BPTC compressed texture formats:
MESA_FORMAT_BPTC_RGBA_UNORM
MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM
MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT
MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT
It also updates the format information functions to handle these and the
corresponding GL enums.
v2: Also modify _mesa_get_format_color_encoding, _mesa_get_srgb_format_linear
and _mesa_get_uncompressed_format
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC depend
on the mode but as it only has to be an approximation this sets it to 8 for
the two UNORM formats and 16 for the two half-float formats. These represent
the minimum number of bits of variation that can be generated by the
interpolation of the two formats.
This doesn't quite match what we do for S3TC which only returns 4 even though
it can similarly generate 8 bits from the interpolation. However it does match
what we return for ETC2. For reference, NVidia seems to return 8 bits for the
UNORM formats and 32 bits for the half-float formats.
v2: Change the number of bits to 8/8/8/8 for the UNORM formats and 16/16/16
for the half-float formats.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If the name of a compressed texture format has ‘FLOAT’ in it it will now set
the data type of the format to GL_FLOAT. This will be needed for the BPTC
half-float formats.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The signed and unsigned half-float BPTC-compressed formats were being reported
as having a base format of GL_RGBA but they don't store an alpha channel so it
should be GL_RGB.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds a boolean in the gl_extensions struct for
GL_ARB_texture_compression_bptc as well as an entry in extension_table.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The initial firmware for hawaii does not support type3 nop packet.
Detect the new hawaii firmware with query RADEON_INFO_ACCEL_WORKING2.
If the returned value is 3, then the new firmware is used.
This patch uses type2 for the old firmware and type3 for the new firmware.
It fixes the cases when the old firmware is used and the user wants to
manually enable acceleration.
The two possible scenarios are:
- the kernel has no support for the new firmware.
- the kernel has support for the new firmware but only the old firmware
is available.
Additionaly this patch disables GPU acceleration on hawaii if the kernel
returns a value < 2. In this case the kernel hasn't the required fixes
for proper acceleration.
v2:
- Fix indentation
- Use private struct radeon_drm_winsys instead of public struct radeon_info
- Rename r600_accel_working2 to accel_working2
v3:
- Use type2 nop packet for returned value < 3
v4:
- Fail to initialize winsys for returned value < 2
Cc: mesa-stable@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds a limit to the maximum surface size which is
based on the maximum size of a single mob. If this value is not
available, the maximum surface size is by default set to 128 MB.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Replace the plain sampler index with a register reference to a sampler.
We also need to keep track of the sampler array size when there is a
relative reference so that we can mark the whole array used.
To facilitate implementation, we add a separate ADDR register that
exclusively handles the sampler relative address. Other approaches would
be more invasive.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If the array index is not a constant expression, the existing support
will assume a zero offset (giving us the sampler index of the base of
the array).
For dynamically uniform indexing of sampler arrays, we need both that
and the indexing expression.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
V2: Expand comment to explain what dynamically uniform expressions are
about.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This prevents some simulator assertion failures, but it does mean (since
I've dropped the "* 16" padding) that on real hardware you need a kernel
that does overflow memory management (currently, "drm/vc4: Add support for
binner overflow memory allocation." in my kernel tree).
In addition to reducing sim-specific code, it also avoids our local handle
allocation conflicting with the host GEM's handle numbering, which was
causing vc4_gem_hindex() to not distinguish between winsys BOs and the
same-numbered non-winsys bo.
We don't care if things like vertex data get smashed by render target
data, but we do need to make sure that shader code doesn't get rendered
to.
v2: Fix overflowing read of gl_relocs[] that incorrect flagged of some
VBOs as shader code.
The comment conflicted with the support in the code, so I moved the TMU
write validation to where the comment was, and dropped some dead arguments
from the functions while changing their signatures.
This doesn't load/store the Z contents across submits yet. It also
disables early Z, since it's going to require tracking of Z functions
across multiple state updates to track the early Z direction and whether
it can be used.
v2: Move the key setup to before the search for the key.
Otherwise, once we're not flushing at the end of every draw, we'll free
things like gallium resources, and free the backing GEM object, before
we've flushed the rendering using it to the kernel.
render_cl_size/bin_cl_size includes relocations, while the hardware buffer
doesn't. If you don't emit a HALT packet, the command parser continues
until the end register's value. We can't allow executing unvalidated
buffer contents (and it's actually harmful in the render lists Mesa is
emitting, since VC4_PACKET_STORE_MS_TILE_BUFFER_AND_EOF doesn't trigger a
halt).
This required building a shader parser that would walk the program to find
where the texturing-related uniforms are in the uniforms stream.
Note that as of this commit, a new kernel is required for rendering on
actual VC4 hardware (currently that commit is named "drm/vc4: Introduce
shader validation and better command stream validation.", but is likely to
be squashed as part of an eventual merge of the kernel driver).
This ensures that when I'm using the simulator, I get a closer match to
what behavior on real hardware will be. It lets me rapidly iterate on the
kernel validation code (which otherwise has a several-minute turnaround
time), and helps catch buffer overflow bugs in the userspace driver
faster.
Only rgba8888 works, and only a single texture unit, and it's only under
simulation because I haven't built the kernel interface yet.
v2: Rebase on helpers.
v3: Fold in the don't-break-the-arm-build fix.
This limit is fixed in Mesa core and cannot be changed.
It only affects ARB_vertex_program and ARB_fragment_program.
The minimum value for ARB_vertex_program is 1 according to the spec.
The maximum value for ARB_vertex_program is limited to 1 by Mesa core.
The value should be zero for ARB_fragment_program, because it doesn't
support ARL.
Finally, drivers shouldn't mess with these values arbitrarily.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This computes all GL versions before any context is created.
It's a requirement for GLX_MESA_query_renderer.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Setting Const.MaxSamples needed a rework, so that it doesn't call
st_choose_format, which depends on st_context.
Other than that, there is no change in functionality.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I don't know of any hardware which supports it.
With this, GL_OES_compressed_ETC1_RGB8_texture is supported if RGBA8
is supported.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
The formats are emulated by translating them into plain uncompressed
formats, because I don't know of any hardware which supports them.
This is required for GLES 3.0 and ARB_ES3_compatibility (GL 4.3).
This casues problems when converting atomics to use the GRF. Sometimes the atomic operation would get eaten by CSE when it shouldn't.
v2: Roll the has_side_effects check into is_expression
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This, together with the meta path, provides a complete implemetation of
ARB_copy_image.
v2: Add a fallback memcpy path for when the texture is too big for the
blitter
v3: Properly support copying between two places on the same texture in the
memcpy fallback
v4: Properly handle blit between the same two images in the fallback path
v5: Properly handle blit between the same two compressed images in the
fallback path
v6: Fix a typo in a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
This provides an implementation of CopyImageSubData that works if both
textures are uncompressed. This implementation works by using a
combination of texture views and BlitFramebuffer. If one of the textures
is compressed, it returns false and the driver is expected to provide a
fallback.
v2: Don't leak fbo's
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
v3: Change glGen/DeleteTextures to _mesa_Gen/DeleteTextures
This adds the API entrypoint, error checking logic, and a driver hook for
the ARB_copy_image extension.
v2: Fix a typo in ARB_copy_image.xml and add it to the makefile
v3: Put ARB_copy_image.xml in the right place alphebetically in the
makefile and properly prefix the commit message
v4: Fixed some line wrapping and added a check for null
v5: Check for incomplete renderbuffers
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
v6: Update dispatch_sanity for the addition of CopyImageSubData
There's no need to copy the array of DrawBuffer enums to a temp array.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Fixes failed assertion when _mesa_update_draw_buffers() was called
with GL_DRAW_BUFFER == GL_FRONT_AND_BACK. The piglit gl30basic hit
this.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No need to return a value. Remove unused ctx parameter. Remove
_mesa_ prefix since it's static.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Silences MinGW warnings:
warning: unknown conversion type character ‘l’ in format [-Wformat]
warning: too many arguments for format [-Wformat-extra-args]
v2: use signed types/formats
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously the blorp blitter wouldn't be used if the source and destination
buffer had a different format other than swizzling between RGB and BGR and
adding or removing a dummy alpha channel. However there's no reason why the
blorp code path can't be used to do almost all format conversions so this
patch just removes the checks. However it does explicitly disable converting
to/from MESA_FORMAT_Z24_UNORM_X8_UINT because there is a similar check
brw_blorp_copytexsubimage.
This doesn't cause any Piglit test regressions at least on Ivybridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Broadwell measures jump distances in bytes, so we need to scale by 16.
v2: Update the function in brw_eu.h, not in brw_eu_emit.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Different generations of hardware measure jump distances in different
units. Previously, every function that needed to set a jump target open
coded this scaling, or made a hardcoded assumption (i.e. just used 2).
Most functions start with the number of instructions to jump, and scale
up to the hardware-specific value. So, I made the function match that.
Others start with a byte offset, and divide by a constant (8) to obtain
the jump distance. This is actually 16 / 2 (the jump scale for Gen5-7).
v2: Make the helper a static inline defined in brw_eu.h, instead of
an actual function in brw_eu_emit.c (as suggested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It has Gen6+ knowledge baked in, and indeed is only called for Gen6+,
but it wasn't immediately obvious that this was the case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Until now, it's been off implicitly: we never call the compactor
function. When we merge the generators, we'll start calling it, so we
should make it do nothing.
Matt will enable instruction compaction properly later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Broadwell is going to use the brw_eu_emit.c code soon. We want to get
the fake MRF handling and URB HWord channel mask handling.
We don't need the CMP thread switch workaround, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we no longer use ctx->DrawBuffer->_Xmin and related fields to
program the screen-space viewport extents, we don't depend on any
scissoring state. So we can drop the +_NEW_SCISSOR dependency.
On GEN8, a change in scissor state does not effect anything for the
clipper/sf hardware state. The hardware will always do the right thing
once the viewport extents are programmed. We can therefore remove the
unecessary state emission.
Ken originally spotted this.
v2: Reword the commit message. Remove spurious hunk.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The goal of guardband clipping is to try to avoid 3d clipping because it
is an expensive operation. When guardband clipping is disabled, all
geometry that intersects the viewport is sent to the FF 3d clipper.
Objects which are entirely enclosed within the viewport are said to be
"trivially accepted" while those entirely outside of the viewport are,
"trivially rejected".
When guardband clipping is turned on the above behavior is changed such
that if the geometry is within the guardband, and intersects the
viewport, it skips the 3d clipper. Prior to GEN8, this was problematic
if the viewport was smaller than the screen as it could allow for
rendering to occur outside of the viewport. That could be mitigated if
the programmer specified a scissor region which was less than or equal
to the viewport - but this is not required for correctness in OpenGL. In
theory you could be clever with the guardband so as not to invoke this
problem. We do not do this, and have no data that suggests we should
bother (nor the converse data).
With viewport extents in place on GEN8, it should be safe to turn on
guardband clipping for all cases
While here, add a comment to the code which confused me thoroughly.
v2: Update grammar in commit message. Reword comments based on Ken's
suggestion.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Viewport extents are a 3rd rectangle that defines which pixels get
discarded as part of the rasterization process. The actual pixels drawn
to the screen are an intersection of the drawing rectangle, the viewport
extents, and the scissor rectangle. It permits the use of guardband
clipping in all cases (see later patch). The actual pixels drawn to the
screen are an intersection of the drawing rectangle, the viewport
extents, and the scissor rectangle.
Scissor rectangle is not super important for this discussion as it should
always help do the right thing provided the programmer uses it.
switch (viewport dimensions, drawrect dimension) {
case viewport > drawing rectangle: no effects; break;
case viewport == drawing rectangle: no effects; break;
case viewport < drawing rectangle:
Pixels (after the viewport transformation but before expensive
rastersizing and shading operations) which are outside of the
viewport are discarded.
}
I am unable to find a test case where this improves performance, but in
all my testing it doesn't hurt performance, and intuitively, it should
not ever hurt performance. It also permits us to use the guardband more
freely (see upcoming patch).
v2: Updating commit message.
v3: Commit message updates requested by Ken
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While working in this part of the code I had a great deal of trouble
understanding what it was trying to do, and matching it with the spec.
(mostly due bad wording in the PRM). To help future people, I've cleaned
up the wording and provided some ascii art.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This lets us call dump_instructions() after register allocation without
failing an assertion.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Without this patch I get the following during DMA transfers:
[drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
radeon 0000:01:00.0: CP DMA dst buffer too small (21475829792 4096)
This is a fixup for e878e154cd.
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tahiti has 12 tile pipes, but P8 pipe config.
It looks like there is no way to get the pipe config except for reading
GB_TILE_MODE. The TILING_CONFIG ioctl doesn't return more than 8 pipes,
so we can't use that for Hawaii.
This fixes a regression caused by 9b046474c9
on Tahiti.
v2: add an assertion and print an error on failure
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The code is rewritten to take known constraints into account, while always
using 0 by default.
This should improve performance for multi-SE parts in theory.
A debug option is also added for easier debugging. (If there are hangs,
use the option. If the hangs go away, you have found the problem.)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: fix a typo, set max_se for evergreen GPUs according to the kernel driver
This is a bug which was probably uncovered recently by Jason's commits
and broke this.
The problem is _mesa_base_tex_format(GL_STENCIL_INDEX) returns -1.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
These values are supposed to be the minimum/maximum index values used to
read from the vertex buffers. This code either copies index values out of
the old IB (so, same min/max as the original draw call), or generates a
new IB (using index values between the start and the start + count of the
old array draw info, which just happens to be what min/max_index are set
to by st_draw.c).
We were incorrectly setting the max_index in the
converting-from-glDrawArrays case to the start vertex plus the number of
vertices generated in the new IB, which broke QUADS primitive conversion
on VC4 (where max_index really has to be correct, or the kernel might
reject your draw call due to buffer overflow).
Reviewed-by: Rob Clark <robclark@freedesktop.org> (from verbal description
of the patch)
Some tests start working (useprogram-flushverts, for example) due to
getitng the right vertices now. Some that used to pass start failing with
memory overflow during binning, which is weird (glsl-fs-texture2drect).
And a couple stop rendering correctly (glsl-fs-bug25902).
v2: Move the attribute format setup in the key from after search time to
before the search.
v3: Fix reading of attributes other than position (I forgot to respect
attr and stored everything in inputs 0-3, i.e. position).
We could get undefined sources in real programs from the wild, so we'll
need to turn off this debug eventually. But for now, using undefined
sources is typically me just mistyping something.
We put in a bunch of extra MOVs for program outputs, and this can clean
those up. We should do uniforms, too, though.
v2: Fix missing flagging of progress when we actually optimize. Caught by
Aaron Watry.
This cleans up a bunch of noise in the compiled coordinate shaders (since
we don't need the varying outputs), and also from writemasked instructions
with negated src operands.
This took a couple of tries, and this is the squash of those attempts.
v2: Fix register file conflicts on the args in the
destination-is-accumulator case.
v3: Rebase on helper change and qir_inst4 change.
We will want to occasionally disable this again when we do clear support.
v2: Squash with the previous commit (I accidentally committed at two
stages of writing the change)
This introduces an IR (QIR, for QPU IR) to do optimization on. It's a
scalar, SSA IR in general. It looks like optimization is pretty easy this
way, though I haven't figured out if it's going to be good for our weird
register allocation or not (or if I want to reduce to basically QPU
instructions first), and I've got some problems with it having some
multi-QPU-instruction opcodes (SEQ and CMP, for example) which I probably
want to break down.
Of course, this commit mostly doesn't work, since many other things are
still hardwired, like the VBO data.
v2: Rewrite to use a bunch of helpers (qir_OPCODE) for emitting QIR
instructions into temporary values, and make qir_inst4 take the 4 args
separately instead of an array (all later callers wanted individual
args).
Note: This is the cutoff point where I switched from developing primarily
on the Pi to developing o the simulator. As a result, from this point on
the code is untested on the Pi (the kernel code I have currently wasn't
rendering anything at this commit, though the simulator renders
successfully, suggesting kernel bugs).
This mostly just takes every draw call and turns it into a sequence of
commands that clear the FBO and draw a single shaded triangle to it,
regardless of the actual input vertices or shaders. I copied the initial
driver skeleton mostly from freedreno, and I've preserved Rob Clark's
copyright for those. I also based my initial hardcoded shaders and
command lists on Scott Mansell (phire)'s "hackdriver" project, though the
bit patterns of the shaders emitted end up being different.
v2: Rebase on gallium megadrivers changes.
v3: Rebase on PIPE_SHADER_CAP_MAX_CONSTS change.
v4: Rely on simpenrose actually being installed when building for
simulation.
v5: Add more header duplicate-include guards.
v6: Apply Emil's review (protection against vc4 sim and ilo at the same
time, and dropping the dricommon drm bits) and fix a copyright header
(thanks, Roland)
The non-llvm path made sure that both clip and pre_clip_pos point to the data
output by position, not clipvertex, if user based clipping is disabled.
However, the llvm path did not, which apparently led to failures if
gl_ClipVertex was written but user plane clipping not enabled (bug 80183).
Why I have no idea really, but just make it match the non-llvm behavior...
Reviewed-by: Brian Paul <brianp@vmware.com>
Use both macros as in some cases using AC_CHECK_FUNCS alone may fail.
Thus HAVE_DLADDR will not be defined, and as a result most of the code
in megadriver_stub.c will not be compiled. Breaking the backwards
compatibility between older libGL/xserver(s) and DRI megadrivers.
Cc: Jon TURNEY <jon.turney@dronecode.org.uk>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
[Emil Velikov] Commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The tests in an empty stub, which we're currently building twice.
If anyone is interested in expanding it (adding actual tests) they
can always bring it back.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This support is preliminary due to the fact that MSAA is not
actually implemented.
However, this patch does fix the piglit test:
spec/!OpenGL 3.2/glsl-resource-not-bound 2DMS (bug #79740).
(v2 RS: don't emit 4th coord as explicit lod)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The distinction between system values and ordinary inputs is not very
obvious in gallium - further fueled by the fact that they use the same
semantic names.
Still, if there's any value which imho really is a system value, it's the
primitive id input into the gs (while earlier (tessleation) stages could read
it, it is _always_ generated by the system). For some odd reason though (which
I'd classify as a bug but seems too complicated to fix) the glsl compiler in
mesa treats this as an ordinary varying, and everything else after that
(including the state tracker and other drivers) just go along with that.
But input fetching in gs for llvm based draw was definitely limited to the
ordinary (2-dimensional) inputs so only worked with other state trackers,
the code was also additionally relying on tgsi_scan_shader filling
uses_primid correctly which did not happen neither (would set it only for
all stages if it was a system value, but only set it for the fragment shader
if it was an input value).
This fixes piglit glsl-1.50-geometry-primitive-id-restart and primitive-id-in
in llvmpipe.
Reviewed-by: Brian Paul <brianp@vmware.com>
These values are always uints, casting them to floats does no good.
Fixes piglit glsl-1.50-geometry-primitive-id-restart tests for softpipe.
Reviewed-by: Brian Paul <brianp@vmware.com>
OpenCL 1.2 CL_MAP_WRITE_INVALIDATE_REGION sounds a lot like
PIPE_TRANSFER_DISCARD_RANGE:
From OpenCL 1.2 spec:
The contents of the region being mapped are to be discarded.
From p_defines.h:
Discards the memory within the mapped region.
v2: Move the code for validating flags to the front-end as
suggested by Francisco Jerez
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The PRMs no longer have a single table for format capabilities. Multiple
tables take up less space, and are easier to maintain.
Encode typed write information while at it.
We have a CPU-side implementation of conditional rendering; it really
should be done on the GPU. It's not necessarily that hard, but nobody
has gotten to fixing it yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, we explicitly set the execution size to BRW_EXECUTE_8 and
disabled compression for loop instructions. I can't imagine how this
could be correct in SIMD16 mode.
Looking at the history, it appears that this code has used BRW_EXECUTE_8
since 2007, when we had a SIMD8 backend that supported control flow and
a separate SIMD16 backend that didn't. Presumably, when we added SIMD16
support for shaders with control flow, we simply neglected to update it.
Note that Gen4-5 don't support SIMD16 on shaders with control flow.
This might be a candidate for stable, but would need to be rewritten
completely due to the brw_inst API changes in master.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We shouldn't need to set them, then set them differently.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
format_srgb.c is generated by format_srgb.py python script, having
format_srgb.c in git ignore list will silence git complaints about
untracked file.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also delete the comment before that function. Everything in that
comment was either stale, wrong, or captured elsewhere.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies all the callers, and it enables the removal of one of
the function parameters.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With two tests both numbered 118, there was a confusing off-by-two difference
between the last test number and the total number of tests (as reported by
glcpp-test).
With this rename, there's only an off-by-one difference left, (which is easy
to understand given the zero-based test numbering).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Here is some additional stress testing of nested macros where the expansion
of macros involves commas, (and whether those commas are interpreted as
argument separators or not in subsequent function-like macro calls).
Credit to the GCC documentation that directed my attention toward this issue:
https://gcc.gnu.org/onlinedocs/gcc-3.2/cpp/Argument-Prescan.html
Fixing the bug required only removing code from glcpp. When first testing the
details of expansions involving commas, I had come to the mistaken conclusion
that an expanded comma should never be treated as an argument separator, (so
had introduced the rather ugly COMMA_FINAL token to represent this).
In fact, an expanded comma should be treated as a separator, (as tested here),
and this treatment can be avoided by judicious use of parentheses (as also
tested here).
With this simple removal of the COMMA_FINAL token, the behavior of glcpp
matches that of gcc's preprocessor for all of these hairy cases.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Beyond just listing this in the TESTS variable in Makefile.am, only minor
changes were needed to make this work. The primary issue is that the build
system runs the test script from a different directory than the script
itself. So we have to use the $srcdir variable to find the test input files.
Using $srcdir in this way also ensures that this test works when using an
out-of-tree build.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The (optional) test-specific command-line arguments to be passed to glcpp are
embedded within the source files of some tests, and glcpp-test uses grep to
extract them.
Of course, grep is line-based and looks for the native line-separator to
determine line boundaries. So, for files using non-native line separators,
grep was getting quite confused and passing bogus arguments to glcpp.
Fix this by canonical-izing the line separators in the source file prior to
using grep.
With this commit, the glcpp-test-cr-lf tests pass entirely:
\r: 143/143 tests pass
\r\n: 143/143 tests pass
\n\r: 143/143 tests pass
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Sometimes the newline separator is a single character, and sometimes it is two
characters. Before we can fold away and line-continuation backslashes, we
identify the flavor of line separator that is in use.
With this identified, we then correctly search for backslashes followed
immediately by the first character of the line separator.
Also, when re-inserting newlines to replace collapsed newlines, we carefully
insert newlines of the same flavor.
With this commit, almost all remaining test are fixed as tested by
glcpp-test-cr-lf:
\r: 142/143 tests pass
\r\n: 142/143 tests pass
\n\r: 143/143 tests pass
(The only remaining failures have nothing to do with the actual pre-processor
code, but are due to a bug in the way the test suite uses grep to try to
extract test-specific command-line options from the source files.)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some tests were failing because the message printed by #error was including a
'\r' character from the source file in its output.
This is easily avoided by fixing the regular expression for #error to never
include any of the possible newline characters, (neither '\r' nor '\n').
With this commit 2 tests are fixed for each of the '\r' and '\r\n' cases.
Current results after the commit are:
\r: 137/143 tests pass
\r\n 142/143 tests pass
\n\r: 139/143 tests pass
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLSL specification says that either carriage-return, line-feed, or both
together can be used to terminate lines. Further, it says that when used
together, the pair of terminators shall be interpreted as a single line.
This final requirement has not been respected by glcpp up until now, (it has
been emitting two newlines for every CR+LF pair).
Here, we fix the lexer by using a regular expression for NEWLINE that eats
up both "\r\n" (or even "\n\r") if possible before also considering a single
'\n' or a single '\r' as a line terminator.
Before this commit, the test results are as follows:
\r: 135/143 tests pass
\r\n: 4/143 tests pass
\n\r: 4/143 tests pass
After this commit, the test results are as follows:
\r: 135/143 tests pass
\r\n: 140/143 tests pass
\n\r: 139/143 tests pass
So, obviously, a dramatic improvement.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLSL specification has a very broad definition of what is a
newline. Namely, it can be the carriage-return character, '\r', the newline
character, '\n', or any combination of the two, (though in combination, the
two are treated as a single newline).
Here, we add a new test-runner, glcpp-test-cr-lf, that, for each possible
line-termination combination, runs through the existing test suite with all
source files modified to use those line-termination characters. Instead of
using the .expected files for this, this script assumes that the regular test
suite has been run already and expects the output to match the .out
files. This avoids getting 4 test failures for any one bug, and instead will
hopefully only report bugs actually related to the line-termination
characters.
The new testing is not yet integrated into "make check". For that, some
munging of the testdir option will be necessary, (to support "make check" with
out-of-tree builds). For now, the scripts can just be run directly by hand.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to this commit, the following snippet would trigger an error in glcpp:
#define FOO defined BAR
#if FOO
#endif
The problem was that support for the "defined" operator was implemented within
the grammar, (where the parser was parsing the tokens of the condition
itself). But what is required is to interpret the "defined" operator that
results after macro expansion is performed.
I could not find any fix for this case by modifying the grammar alone. The
difficulty is that outside of the grammar we already have a recursive function
that performs macro expansion (_glcpp_parser_expand_token_list) and that
function itself must be augmented to be made aware of the semantics of the
"defined" operator.
The reason we can't simply handle "defined" outside of the recursive expansion
function is that not only must we scan for any "defined" operators in the
original condition (before any macro expansion occurs); but at each level of
the recursive expansion, we must again scan the list of tokens resulting from
expansion and handle "defined" before entering the next level of recursion to
further expand macros.
And of course, all of this is context dependent. The evaluation of "defined"
operators must only happen when we are handling preprocessor conditionals,
(#if and #elif) and not when performing any other expansion, (such as in the
main body).
To implement this, we add a new "mode" parameter to all of the expansion
functions to specify whether resulting DEFINED tokens should be evaluated or
ignored.
One side benefit of this change is that an ugly wart in the grammar is
removed. We previously had "conditional_token" and "conditional_tokens"
productions that were basically copies of "pp_token" and "pp_tokens" but with
added productions for the various forms of DEFINED operators. With the new
code here, those ugly copy-and-paste productions are eliminated from the
grammar.
A new "make check" test is added to stress-test the code here.
This commit fixes the following Khronos GLES3 CTS tests:
conditional_inclusion.basic_2_vertex
conditional_inclusion.basic_2_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we were passing these through, just like any other pragma. But the
downstream compiler was tripping up on them. It seems easier to swallow these
in the preprocessor and not pass them on at all rather than fixing the
downstream compiler.
This fixes the following Khronos GLES3 CTS tests:
preprocessor.pragmas.pragma_vertex
preprocessor.pragmas.pragma_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the #pragma directive was swallowing an entire line, (including
the final newline). At that time it was appropriate for it to increment the
line count.
More recently, our handling of #pragma changed to not include the newline. But
the code to increment yylineno stuck around. This was causing __LINE__ to be
increased by one more than desired for every #pragma.
Remove the bogus, extra increment, and add a test for this case.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This new "make check" test stresses out the support from the last two commits,
(to esnure that '#' is correctly interpreted as the null directives,
regardless of any whitespace or comments on the same line).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is the fix for the following line:
# // comment to ignore here
According to the translation-phase rules, the comment should be removed before
the preprocessor looks to interpret the null directive.
So in our implementation we must explicitly look for single-line comments in
the <HASH> start condition as well.
This commit fixes the following Khronos GLES3 CTS tests:
null_directive_vertex
null_directive_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This simply tests the previous commit, (that #define followed by a comment
will still generate the expected "#define without macro name" error message).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were already correctly supporting single-line comments in case like:
#define FOO bar // comment here...
The new support added here is simply for the none-too-useful:
#define // comment instead of macro name
With this commit, this line will now give the expected "#define without
macro name" error message instead of the lexer just going off into the
weeds.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This ensures that the previous commit indeed generates the expected error
message when a "#define" directive is not followed by anything except for a
newline.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, glcpp would emit an error like this if <EOF> happened to occur
immediately after the "#define", but in general would just get confused,
(leading to un-helpful error messages).
To fix things to generate a clean error message, we do a few things:
1. Don't require horizontal whitespace immediately after #define
2. Add a production for the error case, (DEFINE_TOKEN followed
immediately by a NEWLINE token).
3. Make the lexer reset to the <INITIAL> state after every NEWLINE.
This 3rd point prevents the lexer from getting so confused and generating
further spurious errors in the file because it was stuck in the <DEFINE> start
condition.
We also drop the similar error message from the <EOF> rule since the
newly-added rule will have already printed the error message.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Listing the GLSL version as an individual component of a GL version,
separate from the extensions isn't really right. The GLSL changes are
(almost?) entirely comprised of changes listed in the extensions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I think OpenVMS was the only platform that Mesa ran on that used a
non-IEEE representation for floats. We removed OpenVMS support a while
back, and this should alleviate the need to continue updating the
this-platform-uses-IEEE list.
The one bit of this patch that needs review is the IS_INF_OR_NAN,
because I'm not sure if MSVC supports isfinite.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82268
Reviewed-by: Brian Paul <brianp@vmware.com>
Future patches will rearrange the values in gl_system_value, and I want
to catch errors. Designated initializers would make all of this
unnecessary.
v2: Don't use STATIC_ASSERT. Not only does it not work, but GCC doesn't
tell you that it's not going to work. Thanks for nothing!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Due to the destination register width of 1 or 2, these instructions get
ExecSize 1 or 2. But dir and offset (used as src0) are both registers
of width 4, violating the execsize >= width assertion.
I honestly don't think this could have ever worked.
Fixes Piglit's polygon-offset and polygon-mode-offset tests on Gen4-5.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70441
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If the vertex shader has no position but the gs has, the clipvertex output
was -1 (because it's the same as vs position in this case if there's no
explicit clipvertex output). This caused crashes (or assertion failures) in
clipping since in the end position (which came from gs) was different from
cv (-1) and we then tried to use the bogus cv input.
Rather than just test for -1 cv value in clipping, make it explicitly return
the position output of the gs instead which seems cleaner (since we really
don't want to use the clipvertex value from the vs (it could be a valid value
in the (unsupported) case of vs writing clipvertex but still using a gs).
This fixes piglit shader_runner clip-distance-out-values.shader_test.
Reviewed-by: Zack Rusin <zackr@vmware.com>
The clip stage may crash if there's no position output, for this reason
code was added to avoid running the pipeline stages in this case
(c7c7186045). However, this failed to actually
work when there was a geometry shader, since unlike the vertex shader it did
not initialize the position output to -1, hence the code trying to detect
this didn't trigger. So simply initialize the position output to -1 just like
the vs does.
This fixes piglit glsl-1.50-transform-feedback-type-and-size (segfault->pass).
clip-distance-out-values.shader_test goes from segfault to assertion failure,
suggesting more fixes are needed, no other piglit changes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This patch fixes this build error on Mac OS X.
./xmlconfig.h:61:5: error: unknown type name 'uint'; did you mean 'int'?
uint nRanges; /**< \brief Number of ranges */
^~~~
int
./xmlconfig.h:79:5: error: unknown type name 'uint'; did you mean 'int'?
uint tableSize;
^~~~
int
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Again, we delete a lot of functions that aren't really doing anything
interesting anymore.
v2: Comment the texstore_rgba_integer function
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This commit also removes a bunch of functions which aren't doing anything
more interesting than the general path does.
v2: Better comment the texstore_via_float function
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This should be both faster and more accurate than our general slow-path of
converting everything to float.
v2: Add a comment to top of the texstore_swizzle function
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This commit splits the texture storage into three functions:
texstore_depth_stencil, texstore_compressed, and texstore_rgba. Right now
this split seems artificial since we just have one function pointer per
format and there is no difference between these three categories. However,
this split makes it much easier to write a more general function upload
path for one of these categories than the current function pointers.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This commits adds the _mesa_format_to_array function that determines if the
given format can be represented as an array format and computes the array
format parameters. This is a direct helper function for using
_mesa_swizzle_and_convert
v2: Better documentation and commit message
v3: Fixed a potential segfault from an invalid endianness swizzle
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Most format conversion operations required by GL can be performed by
converting one channel at a time, shuffling the channels around, and
optionally filling missing channels with zeros and ones. This adds a
function to do just that in a general, yet efficient, way.
v2:
* Add better comments including full docs for functions
* Don't use __typeof__
* Use inline helpers instead of writing out conversions by hand,
* Force full loop unrolling for better performance
v3: Add another set of parens around the MAX_INT macro
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa hasn't supported color-indexed textures for some time. This is 0 for
all texture formats, so we don't need to store it.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of a having all of the format metadata in a gigantic hard-to-edit
array of type struct format_info, we now have a human-readable CSV file.
The CSV file also contains more format information than the format_info
struct contained so we can potentially make format_info more detailed later.
The python to generate the format information was added the previous
commit. This commit turns it on in both automake and scons builds.
v2: Split into two commits and stuff to generate format_info.c from scons
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds a python script called format_info.py that is used to generate a
single format_info.c file that contains the filled-out format_info array.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The basic concept for the format parser was taken from the format CSV
parser in gallium/auxilliary/util. However, this one has been altered in a
number of ways:
* Removed big endian vs. little endian stuff (mesa doesn't need it)
* Better documentation: Almost every method has a full docstring
* An actual Swizzle class with methods for composition and inverses
* Over-all cleaner (in my opinion) implementation and class interactions
* A few bug fixes
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Specify the quad's Z position in clip coordinate space, not
normalized Z space. Use viewport scale, translation = 0.5, 0.5.
Before, we were specifying the quad's Z position in [0,1] and using
viewport scale=1.0, translate=0.0. That works fine, unless your
driver needs to work in clip coordinate space and needs to
reconstruct viewport near/far values from the scale/translation
factors. The VMware svga driver falls into that category.
When we did that reconstruction we wound up with near=-1 and far=1
which are outside the limits of [0,1]. In some cases, this caused
the quad to be drawn at the wrong depth. In other cases it was
clipped away.
Fixes some scissored depth clears with VMware driver. This should
have no effect on other drivers. We're already using these values
for the glBitmap and glDraw/CopyPixels code.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Compute the bitmask of supported array types once instead of every
time we call a GL vertex array function.
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
According to the GL spec the only fragment operations that should affect
glBlitFramebuffer are “the pixel ownership test, the scissor test, and sRGB
conversion”. That implies that dithering should not be performed so we need to
disable it when implementing the blit with a render.
Before commit 05b52efbc9 the dithering state would be left as whatever the
application picks (the default being GL_TRUE) and after that commit it was
explicitly enabled. Neither of these were correct.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81828
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Add documentation for TEX2/TXL2/TXB2 tgsi opcodes. Also, the texture opcode
documentation wasn't very accurate so fix this up a bit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In particular need to handle TEX2/TXB2/TXL2 opcodes.
cube map shadow with bias already used TXB2 which didn't work before
at all, despite that there's by default no piglit change (but using
no_quad_lod and no_rho_opt indeed passes some more tex-miplevel-selection
tests).
The actual sampling code still won't handle cube map arrays.
Reviewed-by: Brian Paul <brianp@vmware.com>
This just covers the resource side of things, not the actual sampling.
Here things are trivial as cube map arrays are identical to 2d arrays in
all respects.
Reviewed-by: Brian Paul <brianp@vmware.com>
We would generate EGL_BAD_CONFIG because _eglGetContextAPIBit
returns zero for the combination of EGL_OPENGL_ES_API and a major
version > 3. By just returning zero, the caller can't tell the
difference between a bad version (which should generate
EGL_BAD_MATCH) and a bad API (which should generate
EGL_BAD_CONFIG). This patch causes us to filter out major
versions > 3 at a point where we can generate the correct error.
Fixes gles3 Khronos CTS test:
egl_create_context.egl_create_context
V2: Fix commit message as suggested by Ian.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Changes in the patch will cause datatype to be computed
correctly for 8 and 16 bit integer formats. For example:
GL_RG8I, GL_RG16I etc.
Fixes many failures in gles3 Khronos CTS test:
copy_tex_image_conversions_required
copy_tex_image_conversions_forbidden
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We currently get red bits from ctx->DrawBuffer->Visual.redBits
by making a false assumption that the texture we're writing to
(in glCopyTexImage2D()) is used as a DrawBuffer.
Fixes many failures in gles3 Khronos CTS test:
copy_tex_image_conversions_required
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
GL_TEXTURE_CUBE_MAP is an allowed texture target in glTexStorage2D()
and is allowed to be used (like GL_TEXTURE_2D) with compressed internal
formats.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously we had to keep unreachable global symbols in the symbol table
because the symbol table is used during linking. Having the symbol
table retain pointers to freed memory... what could possibly go wrong?
At the same time, this meant that we kept live references to tons of
memory that was no longer needed.
New strategy: destroy the old symbol table, and make a new one from the
reachable symbols.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 59 40,642,425,451 76,337,968 69,720,886 6,617,082 0
After (32-bit): 46 40,661,487,174 75,116,800 68,854,065 6,262,735 0
Before (64-bit): 79 37,179,441,771 106,986,512 98,112,095 8,874,417 0
After (64-bit): 64 37,200,329,700 104,872,672 96,514,546 8,358,126 0
A real savings of 846KiB on 32-bit and 1.5MiB on 64-bit.
v2: (by Kenneth Graunke) Just add the ir_function from the IR stream,
rather than looking it up in the symbol table; they're now
identical.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Piglit's spec/glsl-1.10/linker/override-builtin-{const,uniform}-05 tests
do the following:
1. Call abs(float) - a built-in function.
2. Create a user-defined replacement for abs(float).
3. Call abs(float) again - now the user function.
At step 1, we created an ir_function which included the built-in
signature, added it to the symbol table, and emitted it into the IR
stream.
Then, when processing the function definition at step 2, we'd see that
there was already an ir_function. But, since there were no user-defined
functions, we skipped over a bunch of code, and ended up creating a
second one. This new ir_function shadowed the original in the symbol
table, but both ended up in the IR stream.
This results in an awkward situation where searching for an ir_function
via the symbol table, a forward linked list walk, and a reverse linked
list walk may return different ir_functions. This seems undesirable.
This patch instead re-uses the existing ir_function, putting both
built-in and user-defined signatures in the same one. The previous
patch's additional filtering ensures everything continues working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Historically, we've implemented the rules for overriding built-in
functions by creating multiple ir_functions and relying on the symbol
table to hide the one containing built-in functions. That works, but
has a few drawbacks, so the next patch will change it.
Instead, we'll have a single ir_function for a particular name, which
will contain both built-in and user-defined signatures. Passing an
extra parameter to matching_signature makes it easy to ignore built-ins
when they're supposed to be hidden.
I didn't add the parameter to exact_matching_signature since it wasn't
necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On Haswell, this cuts 1-3 instructions from 183 vertex shaders in
"Shadowrun Returns", "Shatter", and "Trine 2." It adds 2 instructions
to a single fragment shader in "Closure."
total instructions in shared programs: 278803 -> 278546 (-0.09%)
instructions in affected programs: 41930 -> 41673 (-0.61%)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This code was attemping to align the base of the structure to the required
alignment of the structure. However, it had two problems:
1. It was aligning the target structure member, not the base of the
structure.
2. It was calculating the alignment based on the members previous to the
target member instead of all the members of the structure.
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.nested_structs.6
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays_instance_arrays.2
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays_instance_arrays.6
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.5
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.19
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.0
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.2
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.6
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.12
v2: Fix rebase failure noticed by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Use the data that is stored in the ir_variable and the glsl_type to
determine whether or not a UBO member is row-major.
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat2x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat2x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat3x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat3x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat4x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.shared.row_major_mat4x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat2x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat2x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat3x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat3x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat4x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.packed.row_major_mat4x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat2x3
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat2x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat3x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat3x4
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat4x2
ES3-CTS.shaders.uniform_block.instance_array_basic_type.std140.row_major_mat4x3
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.2
ES3-CTS.shaders.uniform_block.random.nested_structs_instance_arrays.5
ES3-CTS.shaders.uniform_block.random.nested_structs_instance_arrays.9
Causes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.basic_types.8
ES3-CTS.shaders.uniform_block.random.basic_arrays.3
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.0
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.2
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.13
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.18
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.4
These failures will be fixed shortly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays_instance_arrays.3
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.13
Causes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.9
This failure will be fixed shortly.
v2: Use without_array() instead of older predicates.
v3: s/GLSL_MATRIX_LAYOUT_DEFAULT/GLSL_MATRIX_LAYOUT_INHERITED/g
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
v2: Rename GLSL_MATRIX_LAYOUT_DEFAULT to GLSL_MATRIX_LAYOUT_INHERITED.
Add comments in glsl_types.h explaining the layouts. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This causes the thing following the structure to be vec4-aligned.
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.random.nested_structs.2
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.5
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I also considered renaming visit_field(const glsl_struct_field *) to
entry_record and adding an exit_record method. This would be more
similar to the hierarchical visitor.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 32f32292 (glsl: Allow elimination of uniform block members)
enabled elimination of unused uniform block members to fix a gles3
conformance test failure. This went too far the other way.
Section 2.11.6 (Uniform Variables) of the OpenGL ES 3.0.3 spec says:
"All members of a named uniform block declared with a shared or
std140 layout qualifier are considered active, even if they are not
referenced in any shader in the program. The uniform block itself is
also considered active, even if no member of the block is
referenced."
Fixes gles3conform failures in:
ES3-CTS.shaders.uniform_block.single_nested_struct.per_block_buffer_shared
ES3-CTS.shaders.uniform_block.single_nested_struct.per_block_buffer_std140
ES3-CTS.shaders.uniform_block.single_nested_struct_array.per_block_buffer_shared
ES3-CTS.shaders.uniform_block.single_nested_struct_array.per_block_buffer_std140
ES3-CTS.shaders.uniform_block.random.scalar_types.2
ES3-CTS.shaders.uniform_block.random.scalar_types.9
ES3-CTS.shaders.uniform_block.random.vector_types.1
ES3-CTS.shaders.uniform_block.random.vector_types.3
ES3-CTS.shaders.uniform_block.random.vector_types.7
ES3-CTS.shaders.uniform_block.random.vector_types.9
ES3-CTS.shaders.uniform_block.random.basic_types.5
ES3-CTS.shaders.uniform_block.random.basic_types.6
ES3-CTS.shaders.uniform_block.random.basic_arrays.0
ES3-CTS.shaders.uniform_block.random.basic_arrays.2
ES3-CTS.shaders.uniform_block.random.basic_arrays.5
ES3-CTS.shaders.uniform_block.random.basic_arrays.8
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.0
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.4
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.5
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.6
ES3-CTS.shaders.uniform_block.random.basic_instance_arrays.9
ES3-CTS.shaders.uniform_block.random.nested_structs.0
ES3-CTS.shaders.uniform_block.random.nested_structs.1
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays.4
ES3-CTS.shaders.uniform_block.random.nested_structs_instance_arrays.8
ES3-CTS.shaders.uniform_block.random.nested_structs_arrays_instance_arrays.7
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.3
ES3-CTS.shaders.uniform_block.random.all_per_block_buffers.6
ES3-CTS.shaders.uniform_block.random.all_shared_buffer.18
v2: Whitespace and other minor fixes suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just a few lines earlier we may have wrapped the index expression with
ir_unop_i2u expression. Whenever that happens, as_constant will return
NULL, and that almost always happens.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since the ralloc test in util/tests needs gtest, we need to make sure that
the gtest subdir is loaded first. This fixes bug #82148.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
With this patch, the SVGA_3D_CMD_BIND_GB_SHADER functionality will reserve
two relocations, one for the shader ID and the second for the MOB ID.
Verified with the WDDM winsys path that the number of relocations and patch
locations required is two.
Fixes Bug 1277406
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This gathers macros that have been included across components into util so
that the include chain can be more vertical. In particular, this makes
util stand on its own without any dependence whatsoever on the rest of
mesa.
Signed-off-by: "Jason Ekstrand" <jason.ekstrand@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This hash table is used in core Mesa, the GLSL compiler, and the i965
driver, which makes it a good candidate for the new src/util module.
It's much faster than program/hash_table.[ch] (see commit 6991c2922f
for data), and José's u_hash_table.c has a comment saying Gallium should
probably consider switching to a linear probing hash table at some point.
So this seems like the best candidate for a shared data structure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Jason Ekstrand): Pick up another hash_table use and patch up scons
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For a long time, we've wanted a place to put utility code which isn't
directly tied to Mesa or Gallium internals. This patch creates a new
src/util directory for exactly that purpose, and builds the contents as
libmesautil.la.
ralloc seemed like a good first candidate. These days, it's directly
used by mesa/main, i965, i915, and r300g, so keeping it in src/glsl
didn't make much sense.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Jason Ekstrand): More realloc uses and some scons fixes
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With earlier commit we've conditionally enabled/added the kms_dri target
for automake builds. Unfortunately the we forgot to add the appropriate
define in the scons build, resulting in a broken library due to the
undefined symbol 'kms_swrast_create_screen'.
Reported-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Roland Scheidegger <sroland@vmware.com>
If building hardware drivers only, then kms_swrast_create_screen
won't be defined in inline_drm_helper.h and hardware drivers will
fail to dlopen as a result.
Copy the #if guards from inline_drm_helper.h to dri_kms_init_screen
to make the definition/use of the function match.
Fixes radeonsi_dri.so dlopen with the following configure:
./configure --with-dri-drivers= --with-dri-driverdir=/usr/local/lib/dri/ \
--enable-gbm --enable-gallium-gbm --enable-debug --enable-opencl \
--enable-opencl-icd --with-gallium-drivers=radeonsi \
--with-egl-platforms=drm --enable-glx-tls --enable-texture-float \
--enable-omx
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Native integers imply a somewhat different handling of booleans. Instead
of being 1.0/0.0 floats, they are 0 (true) / -1 (false) integers. As such
the original optimization no longer applies.
Reported-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
We don't support this type of X acceleration and we never did.
Other drivers might want to do the same thing.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In commit 16060c5adc, Eric changed the
code to not relayout just for baselevel changes - only if the range of
miplevels actually increases. So this comment is now wrong.
Notably, the i915 version of the code actually does what the comment
says.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We've moved to using bitshifts (like we did for surface state); nothing
uses the structures anymore.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
These are the last users of struct gen7_sampler_state.
v2: Use a local sampler_state_size variable, to help distinguish the
various 16s (suggested by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is the last user of the structure.
v2: Use a local variable with a sensible name so people know what 16 is.
(Suggested by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This simplifies the code, removes use of the old structures, and also
allows us to combine the Gen6 and Gen7+ code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Although the Gen4-6 and Gen7+ variants used different structure types,
they didn't use any of the fields - only the size, which is identical.
So both decoders did exactly the same thing.
Someday we should implement useful decoders for SAMPLER_STATE.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Now that gen7_sampler_state.c is gone, everything is once again in a
single file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The code in brw_sampler_state.c now handles all generations; we don't
need the extra Gen7+ only code anymore.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This was the only actual difference between Gen4-6 and Gen7+ in terms of
the values we program. The rest was just mechanical structure
rearrangement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Instead of stuffing bits directly into the brw_sampler_state structure,
we now store them in local variables, then use brw_emit_sampler_state()
to assemble the packet. This separates the decision about what values
to use from the actual packet emission, which makes the code more
reusable across generations.
v2: Put const on a bunch of local variables and move declarations,
as suggested by Topi Pohjolainen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This simply assembles all the SAMPLER_STATE fields into their proper bit
locations. Making it work on all generations was easy enough; some of
the fields are even in the same place.
Not used by anything yet, but will be soon. I made it non-static so
BLORP can use it too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It doesn't edit the value, and this lets us use const in more places.
Needed to implement Topi's review comments for the next patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We'll use these to replace the existing structures.
I've adopted the convention that "BRW" applies to all hardware, and
"GENX" applies starting with generation X, but might be replaced by some
later generation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This makes it easy to tell that they're grouped together, and also
improves gdb printing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
brw_upload_sampler_state_table now handles all generations, so we don't
need the vtable mechanism either.
There's still a lot of code duplication; the next patches will address
that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This copies a few changes from gen7_upload_sampler_state_table; the next
patch will delete that function.
Gen7+ has per-stage sampler state pointer update packets, so we emit
them as soon as we emit a new table for a stage. On Gen6 and earlier,
we have a single packet, so we delay until we've changed everything
that's going to be changed.
v2: Split 3DSTATE_SAMPLER_STATE_POINTERS_XS packet emission into a
helper function (suggested by Topi Pohjolainen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The Gen4-6 and Gen7+ code is virtually identical, but both use different
structure types. Switching to use a uint32_t pointer and operate on the
number of DWords will make it possible to share code.
It turns out that SURFACE_STATE is the same number of DWords on every
platform currently; it will be easy to handle a change there, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Other than this, brw_update_sampler_state only deals with a single
SAMPLER_STATE structure, and doesn't need to know which position it is
in the table. The caller takes care of dealing with multiple surface
states.
Pushing this up a level allows us to drop the ss_index parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
sdc_offset is produced and consumed in the same function, so there's no
need to store it in the context, nor pass pointers to it through various
call chains.
Saves 128 bytes per brw_stage_state structure, and makes the code
clearer as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It's just an array of four floats, and we have an array of four floats,
so this is literally just a memcpy...but with custom structs and strange
macros to give the appearance of doing something more.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
When the driver was originally written, it only supported texturing in
the pixel shader backend; vertex and geometry shader texturing came much
later. Originally, the pixel shader was referred to as "WM" (the
Windowizer/Masker unit). So, this code happened to only be relevant for
the WM stage, at the time.
However, sampler state really applies to all stages, so putting "wm" in
the filename doesn't make sense. I dropped it in gen7_sampler_state.c;
at this point the asymmetry just trips people up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The "Min/Mag State Not Equal" bit is supposed to be set when the min/mag
filters or address rounding modes differ. BLORP uses identical min/mag
settings, so the bit should be unset.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Define the macro GL_OES_standard_derivatives as 1 if the extension
GL_OES_standard_derivatives is supported.
V2 [Chris]: Correct trailing whitespace
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This could be recalculated, though it turns out the only use of it after
resource allocation is for calculating whole resource size (for scene size
accounting though that isn't quite ideal neither). Thus, instead just store
the whole resource size and drop it (saving a couple bytes of storage per
resource). It makes things simpler too. Note that for the accounting winsys
resources always come back with size 0 but this is unchanged (we don't actually
know the size in any case).
Also reformat llvmpipe_texture_layout (drop unneded indentation).
v2: adapt to previous changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Seems pointless to just duplicate some of the calculations (the calculation
of actual memory used compared to what was predicted in llvmpipe_texture_layout
actually could have differed slightly in some cases due to different alignment
rules used though this should have been of no consequence).
v2: keep the previous mip alignment of MAX2(64, cacheline). This was added for
ARB_map_buffer_alignment - I'm not convinced it's needed for textures, but
it was supposed to be cleanup without functional change. Also replace div
with 64bit mul / comparison.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
1D array miptrees were being laid out as a 2D texture with 1 slice.
This happened due to the mesa core storing the 1D array slice count in
the height field. On Intel hardware, we want to create a 2D array with
a height of 1 for the 1D array case.
Fixes assertion failure in piglit (gen6, gen8):
spec/glsl-1.30/execution/tex-miplevel-selection textureOffset 1DArrayShadow
In release builds of Mesa, this test was observed to cause a GPU hang
on gen8.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81450
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Adds 0-3 textureGather component selection and non-constant offsets
Caveat: 0 and 1 texture swizzles only work if textureGather component
select is 3 or a component that does not exist in the sampler texture
format. This is a hardware limitation, any other value returns
128/255=0.501961 for both 0 and 1.
Passes all textureGather piglit tests on radeon 6670, except for those
using 0/1 texture swizzles due to aforementioned reason.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Map TGSI_SEMANTIC_SAMPLEMASK to register/component.
Enable face register when sample mask is needed by shader.
Requires Evergreen/Cayman
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We don't want to log every single error (such as all the ones where the file
wasn't even present in our list of search paths), but if you didn't find any
driver, then seeing at least one error is useful (since the common case as a
developer is a single DEFAULT_DRIVER_DIR or GBM_DRIVERS_PATH entry).
v2: Rebase on swrast changes.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
I found myself often wanting this when I'm printing out a uint32_t mapping
of some GPU data, and I want to put in an interpretation of that value as
a float.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
execinfo.h is not available on DragonFly.
Fixes this build error.
CC glapi_gentable.lo
glapi_gentable.c:44:22: fatal error: execinfo.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
When using (d3d10) conformant out-of-bound behavior for texel fetching
(currently always enabled) the level still needs to be set to a safe value
even though the offset in the end won't get used because the level is used
to look up the mip offset itself and the actual strides, which might otherwise
crash.
For simplicity, we'll use level 0 in this case (this ought to be safe, llvmpipe
does not actually fill in level 0 information if first_level is larger, but
some random strides / offsets shouldn't hurt as ultimately we always use
offset 0 in this case).
Fixes a crash in some in-house test where random huge levels appear in
lp_build_fetch_texel() (the test actually uses level 0 always but if the
fetching happens in a block with a execution mask random values may appear).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The kms-dri swrast driver cannot share buffers using the GEM,
so it must tell the loader to disable extensions relying on
that, without disabling the image DRI extension altogether
(which would prevent the loader from working at all).
This requires a new gallium capability (which is queried on
the pipe_screen and for swrast drivers it's forwarded to the
winsys), and requires a new version of the DRI image extension.
[Emil Velikov]
- Rebased on top of gallium-dri megadrivers.
- Drop PIPE_CAP_BUFFER_SHARE and sw_winsys::get_param hook.
The can_share_buffer cap is set at InitScreen. We use a different
InitScreen (and thus value for the cap) function for kms_dri, due to
deeper differences originating from dri megadrivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add a new winsys and target that can be used with a dri2 state tracker
and loader instead of drisw. This allows to use gbm as a dri2/image
loader and avoid the extra copy from the backbuffer to the shadow
frontbuffer.
The new driver is called "kms_swrast", and is loaded by gbm as a
fallback, because it is only useful with the gbm platform (as no buffer
sharing is possible)
To force select the driver set the environment variable
GBM_ALWAYS_SOFTWARE
[Emil Velikov]
- Rebase on top of gallium megadriver.
- s/text/test/ in configure.ac (Spotted by Andreas Pokorny).
- Add scons support for winsys/sw/kms-dri and fix the build.
- Provide separate DriverAPI, due to different InitScreen hook.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Turn GBM into a swrast loader (providing putimage/getimage backed
by a dumb KMS buffer). This allows to run KMS+DRM GL applications
(such as weston or mutter-wayland) unmodified on cards that don't
have any client side HW acceleration component but that can do
modeset (examples include simpledrm and qxl)
[Emil Velikov]
- Fix make check.
- Split dri_open_driver() from dri_load_driver().
- Don't try to bind the swrast extensions when using dri.
- Handle swrast->CreateNewScreen() failure.
- strdup the driver_name, as it's free'd at destruction.
- s/LIBGL_ALWAYS_SOFTWARE/GBM_ALWAYS_SOFTWARE/
- Move gbm_dri_bo_map/unmap to gbm_driiint.h.
- Correct swrast fallback logic.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Whenever dd_create_screen/pipe_loader_* fails, gdrm->dev may be NULL.
Thus peeking inside the struct will lead to a crash.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
... on static targets. Otherwise we'll crash badly as gdrm->dev is
NULL when we try to copy the string driver_name.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
ERROR is a #define in the MSVC WinGDI.h header file.
Add the _TOKEN suffix as we do for a few other lexer tokens.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Principle of least surprise: --enable-debug should enable debugging.
Ages ago, Mesa's build system only added -g in dri-debug builds (yay for
the static Makefiles). If you forgot to change it (or wrap the build
with custom scripts), you would often be disappointed when trying to gdb
Mesa bugs. New developers, that may not yet have custom scripts, will
have this same issue.
I think we should enable experienced developers to do what they want,
and make things easier for new developers. I already pass '-ggdb3 -O1'
or '-ggdb3 -Og' for CFLAGS, and I don't want configure to change them
for me.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've had bugs in the past where we have been inadvertently matching the
default rule.
Just as we did in the pre-processor in the previous commit, we can use:
%option warn nodefault
in the compiler to instruct flex to not generate the default rule, and
further to warn if our set of rules could let any characters go unmatched.
With this warning active, flex actually warns that the catch-all rule we
recently added to the compiler could never be matched. Since that is all
safely determined at compile time now, we can safely drop this run-time
compiler error message, (as we do in this commit).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We've had multiple bugs in the past where we have been inadvertently matching
the default rule, (which we never want to do). We recently added a catch-all
rule to avoid this, (and made this rule robust for future start conditions).
Kristian pointed out that flex allows us to go one step better. This syntax:
%option warn nodefault
instructs flex to not generate the default rule at all. Further, flex will
generate a warning at compile time if the set of rules we provide are
inadequate, (such that it would be possible for the default rule to be
matched).
With this warning in place, I found that the catch-all rule was in fact
missing something. The catch-all rule uses a pattern of "." which doesn't
match newlines. So here we extend the newline-matching rule to all start
conditions. That is enough to convince flex that it really doesn't need
any default rule.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Using a single rule here means that we can use the <*> syntax to match
all start conditions. This makes the catch-all rule more robust against
the addition of future start conditions, (no need to maintain an ever-
growing list of start conditions for this rul).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
There is no behavioral change here. It's just easier to verify that lists
of start conditions include all expected conditions when they appear in a
consistent order.
The <INITIAL> state is special, so it appears first in all lists. All others
appear in alphabetical order.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In some of the recent glcpp bug-fixing, we found that glcpp was emitting
unrecognized characters from the input source file to stdout, and dropping
them from the source passed onto the compiler proper.
This was obviously confusing, and totally undesired.
The bogus behavior comes from an implicit default rule in flex, which is
that any unmatched character is implicitly matched and printed to stdout.
To avoid this implicit matching and printing, here we add an explicit
catch-all rule. If this rule ever matches it prints an internal compiler
error. The correct response for any such error is fixing glcpp to handle
the unexpected character in the correct way.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, the '\r' character was not explicitly matched by any lexer
rule. This means that glcpp would have been using the default flex rule to
match '\r' characters, (where they would have been printed to stdout rather
than actually correctly handled).
With this commit, we treat '\r' as equivalent to '\n'. This is clearly an
improvement the bogus printing to stdout. The resulting behavior is compliant
with the GLSL specification for any source file that uses exclusively '\r' or
'\n' to separate lines.
For shaders that use a multiple-character line separator, (such as "\r\n"),
glcpp won't be precisely compliant with the specification, (treating these as
two newline characters rather than one), but this should not introduce any
semantic changes to the shader programs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This test is written to exercise a bug which I recently wrote, (but
fortunately caught and fixed before ever committing it).
For the curious:
The bug happened when the NEWLINE_CATCHUP code didn't actually return the
NEWLINE token (due to the skipping). This resulted in the lexer continuing
on through all the subsequent rules while still in the NEWLINE_CATCHUP start
condition, (which then triggered the internal-compiler-error catch-all
rule).
What is intended is for the return of the NEWLINE token to start a new
iteration of the lexer loop, at which time the NEWLINE_CATCHUP-handling code
will reset from the <NEWLINE_CATCHUP> to the <INITIAL> start condition.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
At one point while rewriting the lexing rule for pre-processing numbers, I
made it a bit too aggressive and within a replacement list sucked up a
parameter name that appeared immediately after a period. This caused the
parameter name to be unreplaced when the macro was expanded.
It was in some piglit tests that I originally found this issue. Here, I'm
adding a test to "make check" to ensure that this behavior remains correct.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These operators aren't defined for preprocessor expressions, so we never
implemented them. This led them to be misinterpreted as strings of unary
'+' or '-' operators.
In fact, what is actually desired is to generate an error if these operators
appear in any preprocessor condition.
So this commit looks like it is strictly adding support for these
operators. And it is supporting them as far as passing them through to the
subsequent compiler, (which was already happening anyway).
What's less apparent in the commit is that with these tokens now being lexed,
but with no change to the grammar for preprocessor expressions, these
operators will now trigger errors there.
A new "make check" test is added to verify the desired behavior.
This commit fixes the following Khronos GLES3 CTS test:
invalid_op_1_vertex
invalid_op_1_fragment
invalid_op_2_vertex
invalid_op_2_fragment
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This will emit an error for something like:
#define FOO(x,x) ...
Obviously, it's not a legal thing to do, and it's easy to check.
Add a "make check" test for this as well.
This fixes the following Khronos GLES3 CTS tests:
invalid_function_definitions.unique_param_name_vertex
invalid_function_definitions.unique_param_name_fragment
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Just reading the code, it looked like a bug that _define_object_macro had this
check, but _define_function_macro did not. Upon further reading, that's
because the check is to allow for our builtins to be defined, (and there are
no builtin function-like macros).
Add my new understanding as a comment to help the next reader.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, we had a single token for "#if" but now that we have two separate
tokens, it looks much better to see:
HASH_TOKEN IF
than:
HASH_TOKEN HASH_IF
(Note, that for the same reason we use HASH_TOKEN instead of HASH, we also use
DEFINE_TOKEN instead of DEFINE to avoid a conflict with the <DEFINE> start
condition in the lexer.)
There should be no behavioral change from this commit.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Without this, in the <PP> state, we would hit Flex's default rule, which
prints tokens to stdout, rather than returning them as tokens. (Or, after the
previous commit, we would hit the new catch-all rule and generate an internal
compiler error.)
With this commit in place, we generate the desired syntax error.
This manifested as a weird bug where shaders with semicolons after
extension directives, such as:
#extension GL_foo_bar : enable;
would print semicolons to the screen, but otherwise compile just fine
(even though this is illegal).
Fixes Piglit's extension-semicolon.frag test.
This also fixes the following Khronos GLES3 conformance tests, (and for real
this time):
invalid_char_in_name_vertex
invalid_char_in_name_fragment
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is to avoid the default, silent flex rule which simply prints the
character to stdout.
For the following Khronos GLES3 conformance tests:
invalid_char_in_name_vertex
invalid_char_in_name_fragment
With this commit, these tests now report Pass where they previously reported
Fail, but Mesa isn't behaving correctly yet. It's now reporting the internal
error where what is really desired is a syntax error.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It's legal (though highly bizarre) for a pre-processor directive to look like
this:
# /* why? */ define FOO bar
This behavior comes about since the specification defines separate logical
phases in a precise order, and comment-removal occurs in a phase before the
identification of directives.
Our implementation does not use an actual separate phase for comment removal,
so some extra care is necessary to correctly parse this. What we want is for
'#' to introduce a directive iff it is the first token on a line, (ignoring
whitespace and comments). Previously, we had a lexical rule that worked only
for whitespace (not comments) with the following regular expression to find a
directive-introducing '#' at the beginning of a line:
HASH ^{HSPACE}*#{HSPACE}*
In this commit, we switch to instead use a simple literal match of '#' to
return a HASH_TOKEN token and add a new <HASH> start condition for whenever
the HASH_TOKEN is the first non-space token of a line. This requires the
addition of the new bit of state: first_non_space_token_this_line.
This approach has a couple of implications on the glcpp parser:
1. The parser now sees two separate tokens, (such as HASH_TOKEN and
HASH_DEFINE) where it previously saw one token (HASH_DEFINE) for
the sequence "#define". This is a straightforward change throughout
the grammar.
2. The parser may now see a SPACE token before the HASH_TOKEN token of
a directive. Previously the lexical regular expression for {HASH}
would eat up the space and there would be no SPACE token.
This second implication is a bit of a nuisance for the parser. It causes a
SPACE token to appear in a production of the grammar with the following two
definitions of a control_line:
control_line
SPACE control_line
This is really ugly, since normally a space would simply be a token
separator, so it wouldn't appear in the tokens of a production. This leads to
a further problem with interleaved spaces and comments:
/* ... */ /* ... */ #define /* ..*/
For this, we must not return several consecutive SPACE tokens, or else we would need an arbitrary number of new productions:
SPACE SPACE control_line
SPACE SPACE SPACE control_line
ad nauseam
To avoid this problem, in this commit we also change the lexer to emit only a
single SPACE token for any series of consecutive spaces, (whether from actual
whitespace or comments). For this compression, we add a new bit of parser
state: last_token_was_space. And we also update the expected results of all
necessary test cases for the new compression of space tokens.
Fortunately, the compression of spaces should not lead to any semantic changes
in terms of what the eventual GLSL compiler sees.
So there's a lot happening in this commit, (particularly for such a tiny
feature). But fortunately, the lexer itself is looking cleaner than ever. The
only ugly bit is all the state updating, but it is at least isolated to a
single shared function.
Of course, a new "make check" test is added for the new feature, (directives
with comments and whitespace interleaved in many combinations).
And this commit fixes the following Khronos GLES3 CTS tests:
function_definition_with_comments_vertex
function_definition_with_comments_fragment
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is in preparation for the planned addition of a new <HASH> start
condition to the lexer. Both start conditions and token types are, of course,
in the same default C namespace, so a start condition and a token type with
the same name will collide. (And unfortunately, they are both apparently
implemented as equivalent numeric types so the collision is undetected at
compile time and simply leads to unpredictable behavior at run time.)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This commit does not cause any behavioral change for any valid program. Prior
to entering the <DEFINE> start condition, the only valid start condition is
<INITIAL>, so whether pushing/popping <DEFINE> onto the stack or explicit
returning to <INITIAL> is equivalent.
The reason for this change is that we are planning to soon add a start
condition for <HASH> with the following semantics:
<HASH>: We just saw a directive-introducing '#'
<DEFINE>: We just saw "#define" starting a directive
With these two start conditions in place, the only correct behavior is to
leave <DEFINE> by returning to <INITIAL>. But the old push/pop code would have
returned to the <HASH> start condition which would then cause an error when
the next directive-introducing '#' would be encountered.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The verbose debug output from the parser is quite useful when debugging, and
having this available as a command-line option is much more convenient than
manually forcing this into the code when needed, (which is what I had been
doing for too long previously).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For the first line we were initializing the column to 1, but for all
subsequent lines we were initializing the column to 0. The column number is
advanced for each token read before any error message is printed. So the 0
value is the correct initialization, (so that the first column is reported as
column 1).
With this extremely minor change, many of the .expected files are updated such
that error messages for the first line now have the correct column number in
them.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Here, "skipping" refers to the lexer not emitting any tokens for portions of
the file within an #if condition (or similar) that evaluates to false.
Previously, the lexer had a special <SKIP> start condition used to control
this skipping. This start condition was not handled like a normal start
condition. Instead, there was a particularly ugly block of code set to be
included at the top of the generated lexing loop that would change from
<INITIAL> to <SKIP> or from <SKIP> to <INITIAL> depending on various pieces of
parser state, (such as parser->skip_state and parser->lexing_directive).
Not only was that an ugly approach, but the <SKIP> start condition was
complicating several glcpp bug fixes I attempted recently that want to use
start conditions for other purposes, (such as a new <HASH> start condition).
The recently added RETURN_TOKEN macro gives us a convenient way to implement
skipping without using a lexer start condition. Now, at the top of the
generated lexer, we examine all the necessary parser state and set a new
parser->skipping bit. Then, in RETURN_TOKEN, we examine parser->skipping to
determine whether to actually emit the token or not.
Besides this, there are only a couple of other places where we need to examine
the skipping bit (other than when returning a token):
* To avoid emitting an error for #error if skipped.
* To avoid entering the <DEFINE> start condition for a #define that is
skipped.
With all of this in place in the present commit, there are hopefully no
behavioral changes with this patch, ("make check" still passes all of the
glcpp tests at least).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Now that we have a common macro for returning tokens, it makes sense to
perform some of the common work there, (such as copying string values).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The glcpp parser is line-based, so it needs to see a NEWLINE token at the end
of each line. This causes a trick for files that end without a final newline.
Previously, the lexer for glcpp punted in this case by unconditionally
returning a NEWLINE token at end-of-file, (causing most files to have an extra
blank line at the end). Here, we refine this by lexing end-of-file as a
NEWLINE token only if the immediately preceding token was not a NEWLINE token.
The patch is a minor change that only looks huge for two reasons:
1. Almost all glcpp test result ".expected" files are updated to drop
the extra newline.
2. All return statements from the lexer are adjusted to use a new
RETURN_TOKEN macro that tracks the last-token-was-a-newline state.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The glcpp implementation has long had code to support a file that ends without
a final newline. But we didn't have a "make check" test for this.
Additionally, the <EOF> action was restricted only to the <INITIAL> state so
it would fail to get invoked if the EOF was encountered in the <COMMENT> or
the <DEFINE> case. Neither of these was a bug, per se, since EOF in either
of these cases is an error anyway, (either "unterminated comment" or
"missing macro name for #define").
But with the new explicit support for these cases, we not generate clean error
messages in these cases, (rather than "unexpected $end" from before).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The NEWLINE_CATCHUP code is only intended to be invoked after we lex an actual
newline character ('\n'). The two extra calls here were apparently added
accidentally because the pattern happened to contain a (negated) '\n',
(see commit 6005e9cb28).
I don't think either case could have caused any actual bug. (In the first
case, the pattern matched right up to the next newline, so the NEWLINE_CATCHUP
code was just about to be called. In the second case, I don't think it's
possible to actually enter the <SKIP> start condition after commented newlines
without any intervening newline.)
But, if nothing else, the code is cleaner without these extra calls.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The recent adddition of an error for "#define followed by a non-identifier"
was a bit to aggressive since it used a regular expression in the lexer to
flag any character that's not legal as the first character of an identifier.
But we need to allow comments to appear here, (since we aren't removing
comments in a preliminary pass). So we refine the error here to only flag
characters that could not be an identifier, nor a comment, nor whitespace.
We also augment the existing comment support to be active in the <DEFINE>
state as well.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, if the preprocessor encountered a #define with a non-identifier,
such as:
#define 123 456
The lexer had no explicit rules to match non-identifiers in the <DEFINE> start
state. Because of this, flex's default rule was being invoked, (printing
characters to stdout), and all text was being discarded by the compiler until
the next identifier. As one can imagine, this led to all sorts of interesting
and surprising results.
Fix this by adding an explicit rule complementing the existing
identifier-based rules that should catch all non-identifiers after #define and
reliably give a well-formatted error message.
A new test is added to "make check" to ensure this bug stays fixed.
This commit also fixes the following Khronos GLES3 CTS test:
define_non_identifier_vertex
(The "fragment" variant was passing earlier only because the preprocessor was
behaving so randomly and causing the compilation to fail. It's lucky, in fact,
that the "vertex" version succesfully compiled so we could find and fix this
bug.)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The glcpp lexer and parser use the space_tokens state bit to avoid emitting
tokens for spaces while parsing a directive. Previously, this bit was only
being set again by the first non-space token following a directive.
This led to a bug where a space, (or a comment that should emit a space),
immediately following a directive, (optionally searated by newlines), would be
omitted from the output.
Here we fix the bug by also setting the space_tokens bit whenever we lex a
newline in the standard start conditions.
mesa/mesa/src/mesa/drivers/dri/common/xmlconfig.c:104:10: warning: #warning "Per application configuration won't work with your OS version." [-Wcpp]
# warning "Per application configuration won't work with your OS version."
Signed-off-by: Yaakov Selkowitz <yselkowitz@users.sourceforge.net>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
We can create 3D texture views. Avoids an assertion in piglit
fbo-generatemipmap-3d test and allows it to pass.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
While running https://github.com/nvMcJohn/apitest with apitrace I noticed that Mesa was producing bogus results:
wglChoosePixelFormatARB(hdc, piAttribIList = {...}, pfAttribFList = &0, nMaxFormats = 1, piFormats = {19, 65576, 37, 198656, 131075, 0, 402653184, 0, 0, 0, 0, -573575710}, nNumFormats = &12) = TRUE
However https://www.opengl.org/registry/specs/ARB/wgl_pixel_format.txt states
<nNumFormats> returns the number of matching formats. The returned
value is guaranteed to be no larger than <nMaxFormats>.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes 3D texture support in all these cases, because array_size is 1
with 3D textures and depth0 actually contains the "array size".
util_max_layer is universal and returns the last layer index for any texture
target.
A lot of the cases below can't actually be hit with 3D textures, but let's
be consistent.
This fixes a failure in:
piglit layered-rendering/clear-color-all-types 3d single_level
for r600g and radeonsi, which was caused by an incorrect CMASK size
calculation.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I actually couldn't reproduce this one, but internal docs recommend this
workaround. Better safe than sorry.
Also, the number of dwords for the sync packets is increased by 4 instead
of 2, because it wasn't bumped last time when a new packet was added there.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes the checkerboard pattern in glxgears and anything that triggers
fast color clear.
num_channels is always <= 8, but Hawaii has 16 pipes.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This new name isn't so confusing.
I also changed the gallivm limit, because it looked wrong.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: use sizeof(float[4])
Most image functions are required to return a CL_INVALID_OPERATION
error when used on devices without image support.
v2:
- Simplified the code
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When mapping a busy resource with PIPE_TRANSFER_DISCARD_RANGE or
PIPE_TRANSFER_FLUSH_EXPLICIT, we can avoid blocking by allocating and mapping
a staging bo, and emit pipelined copies at proper places. Since the staging
bo is never bound to GPU, we give it packed layout to save space.
Add xfer_map() to replace map_bo_for_transfer(). Add xfer_unmap() and
xfer_alloc_staging_sys() to simplify texture and buffer mapping/unmapping, and
enable more code sharing between them.
Add a bunch of helper functions and a big comment for
choose_transfer_method(). This also fixes handling of
PIPE_TRANSFER_MAP_DIRECTLY to not ignore tiling.
With MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader, this fixes piglit:
built-in-constants tests/spec/arb_compute_shader/minimum-maximums.txt
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With MESA_EXTENSION_OVERRIDE=GL_ARB_compute_shader, this fixes piglit:
* arb_compute_shader-minmax
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Instead of falling back to just the block name (which we won't find),
look for the first element of the block array. We'll deal with the rest
in the backend by arranging for the blocks to be laid out contiguously.
V2: Squashed together patches 3, 5 of V1, plus a naming tweak.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously this was a block index with special semantics for -1.
With ARB_gpu_shader5, this need not be a compile-time constant, so
allow any rvalue here and convert the -1 to a NULL pointer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without doing a lot more work, we have no idea which indices may
be used at runtime, so just mark them all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us two things: we now need less item copies when we have
to defrag+grow the pool (to just one copy per item) and, even in the
case where we don't need to defrag the pool, we reduce the data copied
to just the useful data that the items use.
Note: The fallback path is a bit ugly now, but hopefully we won't need
it much.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Now, before moving everything to host memory, we try to create a
new resource to use as a pool. I we succeed we just use this resource
and delete the previous one. If we fail we fallback to using the
shadow.
This should make growing the pool faster, and we can also save
64KB of memory that were allocated for the 'shadow', even if they
weren't used.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Move the bits we want to share between generations from fd3_program to
ir3_shader. So overall structure is:
fdN_shader_stateobj -> ir3_shader -> ir3_shader_variant -> ir3
|- ...
\- ir3_shader_variant -> ir3
So the ir3_shader becomes the topmost generation neutral object, which
manages the set of variants each of which generates, compiles, and
assembles it's own ir.
There is a bit of additional renaming to s/fd3_compiler/ir3_compiler/,
etc.
Keep the split between the gallium level stateobj and the shader helper
object because it might be a good idea to pre-compute some generation
specific register values (ie. anything that is independent of linking).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
First step of reoganization split out compiler (so it can be shared
between a3xx and a4xx). Rename ir3_shader -> ir3 (since we'll want
the name ir3_shader for a higher level object).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The scheduler also needs to be aware of predicate register (p0) in
addition to address register (a0).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It seems like for the most part, different behaviors, workarounds, etc,
should be conditional on GPU patch revision (ie. a320.0 vs a320.2)
rather than GPU id (a320 vs a330).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The fixed size heap is a remnant of the fdre-a3xx assembler. Yet it is
convenient for being able to free the entire data structure in one shot
without worrying about leaking nodes.
Change it to dynamically grow the heap size (adding chunks) as needed so
we don't have an artificial upper limit on shader size (other than hw
limits) and don't always have to allocate worst-case size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The GBM_DRIVERS_PATH environment variable is not documented, and only
used to set the location of gbm drivers, while LIBGL_DRIVERS_PATH is
used for everything else, and is documented.
Generally this split leads to confusion as to why gbm doesn't work.
This patch will read LIBGL_DRIVERS_PATH as a fallback if
GBM_DRIVERS_PATH is not set.
The comments clearly indicate that using LIBGL_DRIVERS_PATH is
preferred over GBM_DRIVERS_PATH.
v2: - Use GBM_DRIVERS_PATH as a fallback
v3: [jordan.l.justen@intel.com] - Make LIBGL_DRIVERS_PATH the fallback
Signed-off-by: Dylan Baker <baker.dylan.c@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
According to a quick micro-benchmark, this new version is 20% faster on my
Haswell laptop.
v2: Removed the XXX note about x86_64 from the comment
v3: Use an intrinsic instead of an __asm__ block. This should give us MSVC
support for free.
v4: Enable it for all x86_64 builds, not just with USE_X86_64_ASM
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
... to eliminate an ELSE instruction followed immediately by an ENDIF.
instructions in affected programs: 704 -> 700 (-0.57%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a situation where double-register values are used, the phi nodes can
still end up being u32 values. They all get merged into one RA node
though. When fixing up the merge (which comes after the phi node), the
phi node's def would get fixed, but not its sources which would remain
at the low register value.
This maintains the invariant that a phi node's defs and sources are
allocated the same register.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since intel is always going to be little-endian,
GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and
BGRA textures, so the same acceleration code will work. We might as well
use it.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend
test, key->nr_color_regions == 2, but the dual source blend FB write has
ir->target set to 0. So we failed to set "Last Render Target Select" on
any FB write message.
We only emit one FB write per render target, so my comment about setting
LastRT on every FB write directed at the last color region is a bit...
misinformed. According to the documentation, depth buffer writes and
scoreboard updates happen on the FB write with LastRT set, so I believe
we want to set it only once.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which
needs to know whether it's currently processing a VS or GS. It isn't
worth adding virtual methods for this case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dropping this helps most lines fit in an 80 column terminal. The
absence of WE_normal also helps call attention to WE_all, where
something unusual is going on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Don't assert (debug builds) or assign random uninitialized value for
predicate register (p0).. that screws up kill, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Accuracy of some operations was recently improved in the R600 backend,
at the cost of slower code. This is required for compute shaders,
but not for graphics shaders. Add unsafe-fp-math hint to make LLVM
generate faster but possibly less accurate code.
Piglit didn't indicate any regressions.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Now that we know that the pool is defragmented, we positively know
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.
This will allow us to just add the new items to the end of the
item_list without the need of looking for a place to the new item.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This way we can avoid defragmenting the pool, even if it is needed
to defragment it, and looping again through the list of unallocated
items.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.
The pool will be considered fragmented if: An item that isn't
the last is freed or demoted.
This 'strategy' has a problem, although it shouldn't cause any bug.
If for example we have two items, A and B. We choose to free A first,
now the pool will have the 'fragmented' status. If we now free B,
the pool will retain its 'fragmented' status even if it isn't
fragmented.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.
For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This function will be used in the future by compute_memory_defrag
to move items forward in the pool.
It does so by first checking for overlaping ranges, if the ranges
don't overlap it will copy the contents directly. If they overlap
it will try first to make a temporary buffer, if this buffer fails
to allocate, it will finally fall back to a mapping.
Note that it will only be needed to move items forward, it only
checks for overlapping ranges in that case. If needed, it can
easily be added by changing the first if.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Actually what we currently handle is just the SCALED versions, and not
the int versions. The difference probably matters more when we actually
support integer in the compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Teach new compiler scheduling and register assignment how to deal with
relative addressing. This gets us what we need to avoid falling back to
old compiler for CONST[ADDR[0].x+n]. It is also a prerequisite for temp
file relative addressing, although that is going to also need some
cleverness in register assignment to keep arrays grouped together.
NOTE: doing address calculation in full precision and then narrowing to
s16 in the mov to addr reg seems to sometimes cause lockups (and
sometimes work?!). It seems more reliable to do the address calculation
in s16, like the blob does. Which means teaching RA how to deal with
mixed half and full precision allocation. Fortunately that didn't turn
out to be too hard, so that is a nice bonus which we could probably take
better advantage of elsewhere.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Technically we should not need these. CP_LOAD_STATE can be pipelined.
But removing them broke a few piglit tests, like fbo-depth-
GL_DEPTH_COMPONENT24-readpixels. I expect these are just masking a
problem elsewhere, or perhaps they are only needed under some more
specific circumstances. But until that is understood properly, give
back a bit of the perf boost we got from c63450e8.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adds an implementation of the ClearTexSubImage driver entry point that tries
to set up an FBO to render to the texture and then calls glClearBuffer with a
scissor to perform the actual clear. If an FBO can't be created for the
texture then it will fall back to using _mesa_store_ClearTexSubImage.
When used in combination with _mesa_store_ClearTexSubImage this should provide
an implementation that works for all DRI-based drivers. However as this has
only been tested with the i965 driver it is currently only enabled there.
v2: Only enable the extension for the i965 driver instead of all DRI drivers.
Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add
some more comments.
v3: Use glClearBuffer* to avoid having to modify glClearColor and friends.
Handle sRGB textures. Explicitly disable dithering.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
The Meta implementation of glClearTexSubImage is going to want to ensure that
dithering is disabled so that it can get a consistent color across the whole
texture when clearing. This adds a state flag to easily save it and set it to
the default value when performing meta operations.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Adds an implmentation of the ClearTexSubImage driver entry point that just
maps the texture and writes the values in. The extension is not yet enabled by
default because it doesn't work with multisample textures as they don't have a
simple linear layout.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This adds the driver entry point for glClearTexSubImage and fills in the
_mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it.
v2: Don't clear some of the images if only one of them makes an error
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In texture_error_check() there was a snippet of code to check whether the
given format and internal format are basically compatible. This has been split
out into its own static helper function so that it can be used by an
implementation of glClearTexImage too.
Should reduce overhead because the caching buffer manager doesn't need to
consider buffers of the wrong type.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
since some bits are done in tree, but nobody is working on it anymore.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
because float depth texture data needs clamping to [0.0, 1.0]. Let the
_mesa_texstore() fallback to slower path.
Fixes Khronos GLES3 CTS tests:
shadow_execution_vert
shadow_execution_frag
V2: Move the check to _mesa_texstore_can_use_memcpy() function.
Add check for floating point data types.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We might be able to do this without an extra program key field, but this
is non-invasive and fixes the bug, for now.
This fixes the following Piglit tests on Broadwell:
- ARB_sample_shading/builtin-gl-sample-id 2
- ARB_sample_shading/builtin-gl-sample-position 2
- EXT_framebuffer_multisample/multisample-blit 2 color
- EXT_framebuffer_multisample/multisample-blit 2 color linear
- EXT_framebuffer_multisample/multisample-blit 2 depth
- EXT_framebuffer_multisample/no-color 2 depth combined
- EXT_framebuffer_multisample/no-color 2 depth separate
- EXT_framebuffer_multisample/no-color 2 depth single
- EXT_framebuffer_multisample/no-color 2 depth-computed combined
- EXT_framebuffer_multisample/no-color 2 depth-computed separate
- EXT_framebuffer_multisample/no-color 2 depth-computed single
- EXT_framebuffer_multisample/unaligned-blit 2 color msaa
- EXT_framebuffer_multisample/unaligned-blit 2 depth msaa
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Otherwise, the performance warning for shader recompiles will just say
"something else".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It doesn't exist, so attempting to read it will trigger generation
assertions in the brw_inst API.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Printing the hex offsets makes it basically impossible to diff assembly:
if you add even a single instruction, the entire shader shows up as a
difference. So, every time I want to compare assembly, I have to strip
this out.
The hex offsets might be useful when debugging compaction, or when
inspecting the program cache buffer. Since it's occasionally useful,
but uncommon, this patch disables it by default, but makes it easy to
re-enable it temporarily when the need arises.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Avoids regenerating it unnecessarily.
Every program in shader-db improved, none by an amount less than a 1/3
reduction. One Dota2 shader decreased from 62 -> 24.
cfg calculations: 429492 -> 193197 (-55.02%)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The is used for programs that have arrays of constants that
are accessed using dynamic indices. The shader will compute
the base address of the constants and then access them using
SMRD instructions.
The parameter is an int16_t, and we're check that it's value will fit in
16-bits. Yes, the value that is stored in 16-bits will surely fit in
16-bits.
brw_inst.h: In function 'brw_inst_set_gen6_jump_count':
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_inst.h: In function 'brw_inst_set_src1_vstride':
brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Before it was only storing one of the color components due to truncation.
With this patch it now properly stores all of them.
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
This removes the intermediate storage (pm4 state) and generates descriptors
directly in a staging buffer.
It also reduces the number of flushes, because the descriptors no longer
take CS space.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Sampler descriptors are now represented by si_descriptors.
This also adds support for fine-grained sampler state updates and
the border color update is now isolated in a separate function.
Border colors have been broken if texturing from multiple shader stages is
used. This patch doesn't change that.
BTW, blitting already makes use of fine-grained state updates.
u_blitter uses 2 textures at most, so we only have to save 2.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The draw indirect packets cannot set VGT_INDX_OFFSET, they can only set user
data SGPRs. This is the only way to support start/index_bias with indirect
drawing.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Most (all?) Unigine shaders fail to compile without this if sample shading
is advertised. This is, of course, Unigine developers' fault.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is needed to make Unigine Heaven 4.0 and Unigine Valley 1.0 work
with sample shading.
Also, if this is disabled, the error message at least makes sense now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Only supported on evergreen and later. Currently limited
to single component textures as the hardware GATHER4
instruction ignores texture swizzles.
Piglit quick run passes on radeon 6670 with all
applicable textureGather tests, no regressions.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL
as base internal format and non-zero x, y offsets. Currently x, y
offsets are ignored while updating the texture image.
Fixes Khronos GLES3 CTS tests:
npot_tex_sub_image_2d
npot_tex_sub_image_3d
npot_pbo_tex_sub_image_2d
npot_pbo_tex_sub_image_2d
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This reverts commit bbefb15e01.
Fixes the 11 regressions caused in framebuffer_blit tests in
Khronos GLES3 CTS tests:
Original patch reduced the instruction count but had no performance
benefits. So, it's safe to revert it without causing any performance
regressions.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This commit "mesa: fix packing of float texels to GL_SHORT/GL_BYTE" replaced *_TO_BYTE to *_TO_BYTE_TEX because *_TO_FLOAT_TEX are used to unpack the texels to floats.
In this case *_TO_FLOATZ in function extract_float_rgba also should be replaced to *_TO_FLOAT_TEX. Underline that these macros automatically preserve zero when converting.
The regression was observed on 3 oglconform tests:
snorm-textures basic.getTexImage
snorm-textures advanced.mipmap.manual.getTex
snorm-textures advanced.mipmap.upload.getTex
Signed-off-by: Pavel Popov <pavel.e.popov@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This fixes following tests in es3conform:
shaders.switch.default_not_last_dynamic_vertex
shaders.switch.default_not_last_dynamic_fragment
and makes following tests in Piglit pass:
glsl-1.30/execution/switch/fs-default-notlast-fallthrough
glsl-1.30/execution/switch/fs-default_notlast
No Piglit regressions.
v2: take away unnecessary ir_if, just use conditional assignment
v3: use foreach_in_list instead of foreach_list
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v2)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
components() includes matrix columns, so if this code encountered a
matrix, it would ask for something like a vec9 or vec16. This is
clearly not what you want.
Earlier code now prevents this from seeing matrices, but we should still
use vector_elements, for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
It doesn't handle things like (vector * matrix) correctly, and
apparently Matt's intention was to bail.
Fixes shader compilation in Natural Selection 2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 3178d2474a.
This caused GPU hangs on Ivybridge for some users and huge (80%)
performance regressions across the board on multiple platforms.
We need to find a better solution. I've made several attempts, but none
of them have worked yet. In the meantime, we should revert this.
Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but
that's okay, since we don't expose GL_ARB_gpu_shader5 yet.
Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated
test case on Haswell.
This code should execute without regard to the currently executing
channels. Asking for gl_SampleID inside control flow might break in
strange ways. It appears to break even at the top of the program in
SIMD16 mode occasionally as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
gen8_fs_generator uses these to decide whether to set the execution size
to 8 or 16, so we incorrectly made both of these MOVs the full width in
SIMD16 shaders. (It happened to work out on Gen4-7.)
Setting them should also help inform optimization passes what's really
going on, which could help avoid bugs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
Both inst->force_uncompressed and inst->force_sechalf mean that the
generated instruction should be uncompressed and have an execution size
of 8. We don't require the visitor to set both flags - setting
inst->force_sechalf by itself is supposed to be enough.
On Gen4-7, guess_execution_size() demoted instructions to 8-wide based
on the default compression state. On Gen8+, we instead set a default
execution size, which worked great...except that we forgot to check
inst->force_sechalf when deciding whether to use 8 or 16.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: mesa-stable@lists.freedesktop.org
nouveau_fence_update does real work unconditionally. Avoid doing that if
the fence we're checking on has already been signalled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This complements the existing append function. It's implemented in a
rather simple way right now; it could be changed if performance is a
concern.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
There are no queries for GL_TEXTURE_LUMINANCE_SIZE,
GL_TEXTURE_INTENSITY_SIZE, GL_TEXTURE_LUMINANCE_TYPE, or
GL_TEXTURE_INTENSITY_TYPE in any version of OpenGL ES or desktop OpenGL
core profile.
NOTE: Without changes to piglit, this regresses
required-sized-texture-formats.
v2: Rebase on different initial change.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.2 <mesa-stable@lists.freedesktop.org>
There are no texture borders in any version of OpenGL ES or desktop
OpenGL core profile.
Fixes piglit's gl-3.2-texture-border-deprecated.
v2: Rebase on different initial change.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.2 <mesa-stable@lists.freedesktop.org>
Instead of catching the special case early, handle it by constructing a
fake gl_texture_image that will cause the values required by the OpenGL
4.0 spec to be returned.
Previously, calling
glGenTextures(1, &t);
glBindTexture(GL_TEXTURE_2D, t);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, 0xDEADBEEF, &value);
would not generate an error.
Anuj: Can you verify this does not regress proxy_textures_invalid_size?
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Suggested-by: Brian Paul <brianp@vmware.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
A similar attempt was made in commit 5ff1e446 and was reverted in commit
a39428cf after causing a regression in an ES 3 conformance test. The
test still passes after this commit.
total instructions in shared programs: 1994827 -> 1992858 (-0.10%)
instructions in affected programs: 128247 -> 126278 (-1.54%)
GAINED: 0
LOST: 1
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
If we saw a tree that looked like
vec3
/ \
vec3 float
/ \
vec3 float
/ \
vec3 float
We would see that all of the expression types were vec3, and then
rebalance to
vec3
/ \
vec3 vec3 <-- should be float
/ \ / \
vec3 float float float
This patch adds code to visit the rebalanced tree and update the
expression types from the bottom up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80880
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
0cbefc1bea added a source argument to
EMIT/ENDPRIM, but it did not update tgsi_ureg accordingly, causing all
users of ureg_EMIT/ENDPRIM to fail at runtime with an assertion failure.
Trivial.
This patch fixes this MinGW build error.
glapi_gentable.c: In function '_glapi_create_table_from_handle':
glapi_gentable.c:123:9: error: implicit declaration of function 'dlsym' [-Werror=implicit-function-declaration]
*procp = dlsym(handle, symboln);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Brian Paul <brianp@vmware.com>
Vectors are falling in to the ir_dereference_array() path.
Without this change, the following glsl aborts the debug driver,
or gets the wrong answer in release:
mat2x2 a = mat2( vec2( 1.0, vertex.x ), vec2( 0.0, 1.0 ) );
Also submitting piglit tests, will reference in bug.
v2: Rebase on Mesa master.
v3: Remove unneeded check for arrays, which are covered by
process_array_constructor(), recommended by Timothy Arceri.
Signed-off-by: Cody Northrop <cody@lunarg.com>
Reviewed-by: Courtney Goeltzenleuchter <courtney@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79373
On Cygwin and MinGW, linking a shared library also generates an import library
Use a wildcard which also matches the name of the megadriver import lib,
mesa_dri_drivers.dll.a, so that is also removed after megadriver symlinks are
created
(This then matches src/gallium/targets/dri/Makefile.am, which already does
things this way)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
SIMD8-only for now.
V5: - Fix style complaints
- Move prototype to be with other oddball emit functions
- Use unreachable() instead of assert() where possible
V6: - Describe what is happening with the clamping
- Add reg_width to make some expressions clearer
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The backend will have to do a message send, so we want to keep these in
one piece, just like texture ops.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V5: - Split into separate opcodes
- Pass message data in src1 immediate
- Put noperspective bit in fs_inst rather than adding any junk to
backend_instruction
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- Don't try to disassemble send's src1 as a descriptor if it's not an
immediate.
- In the same case, show src1 as an operand (makes it easier to see
bogus register regions, etc -- the hardware is very fussy)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We'd otherwise go looking into virtual_grf_sizes for things that aren't
in there at all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
V2: - Don't assume everyone wants interpolateAtSample() lowered to
interpolateAtOffset. It turns out this isn't what we want most
of the time for i965. Lowering can be added later in an ir pass
which drivers opt into, rather than bolting it straight into the
builtin definition.
- Only expose the interpolateAt* builtins in the fragment language.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Will be used to implement interpolateAt*() from ARB_gpu_shader5
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The new interpolateAt* builtins have strange restrictions on the
<interpolant> parameter.
- It must be a shader input, or an element of a shader input array.
- It must not include a swizzle.
V2: Don't abuse ir_var_mode_shader_in for this; make a new flag.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of using intr_name in lp_build_tgsi_action, this selects the names
with a switch statement in the emit function. This allows emitting
llvm.SI.sample for instructions without offsets and llvm.SI.image.sample.*.o
otherwise.
This depends on my LLVM changes.
When LLVM 3.5 is released, I'll switch all texture instructions to the new
intrinsics.
This happens when glGetMultisamplefv (or any other non-draw function) is
called, which doesn't invoke the VBO module to update _DrawArrays and
the pointer is invalid at that point.
However st/mesa still dereferences it to setup vertex buffers ==> crash.
Reviewed-by: Brian Paul <brianp@vmware.com>
Adjust definition of _XOPEN_SOURCE appropriately for use of strndup() added with
commit da3a47d6
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Before, we were checking the level against view->u.tex.last_level but
level is not valid for buffers. Plus, the aliasing of the view->u.tex
view->u.buf members (a union) caused the level checking arithmetic to
be totally wrong. The net effect is we always returned early for
PIPE_BUFFER size queries.
This fixes the piglit "textureSize 140 fs samplerBuffer" test.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 54e91e7420 introduced a function declaration that uses
brw_context. While brw_context tends to get included in most files, it
is not when compiling intel_asm_annotation.c resulting in the following
warning:
In file included from brw_shader.h:25:0,
from brw_cfg.h:32,
from intel_asm_annotation.c:24:
brw_reg.h:122:39: warning: 'struct brw_context' declared inside
parameter list [enabled by default]
brw_reg.h:122:39: warning: its scope is only this definition or
declaration, which is probably not what you want [enabled by default]
Add a forward-declaration for struct brw_context to avoid the issue.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the current logic, it's very likely that s/r indirect sources are
right after the "regular" ones. Unset them before moving the texture
arguments over rather than after, as one of those arguments would
likely have assumed one of the s/r positions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Convert the final dri target to the single DRI (megadriver) library.
Cleanup all the automake leftovers from the conversion stage and
update the scons build.
v2: Link in llvmpipe, when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Move the driver_name to dri2/drisw and remove all the SPLIT_TAGETS
mayhem. In the next step we'll unify the dri and dri-swrast targets,
completing the gallium DRI megadriver.
v2: Remove leftover st/dri Makefiles from CONFIG_FILES. Spotted by
Thomas Helland.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Export the approapriate new symbol, and keep backwards compat
via the megadriver_stub helper library.
Our next step would be to unify dri/drm and dri/sw, leading to
a complete megadrivers solution, and having a single library
that provides dri across all targets.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Rather than building two identical ones for dri-vmwgfx and dri-swrast
build a single library, and drop some duplication in the build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
... and use libmegadriver_stub as their provider.
Teach scons how to build the library archive and use it.
v2: scons: fix build on a drm-less system.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
With all the users converted to __driGetExtensions_* we can
have only a single inclusion of the required header + define.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
The symbol is introduced by the mesa megadrivers, and
adding gallium support for it will allow us to merge
st/dri/drm and st/dri/sw. Resulting in a single dri library
across all of gallium.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
The symbol is introduced by the mesa megadrivers, and adding
gallium support for it will allow us to merge st/dri/drm and
st/dri/sw. Resulting in a single dri library across gallium.
v2: Rebase on top of gallium dri3.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
This enables a gallium driver not to care about the semantics of
ARB_sample_shading vs ARB_gpu_shader5 sample attributes. When
ARB_sample_shading-style sample shading is enabled, all of the fp inputs
are marked for interpolation at the sample location.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The new location field can be either center, centroid, or sample, which
indicates the location that the shader should interpolate at.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
i965 precompiles shaders at link time, and prints a disassembly if
INTEL_DEBUG=vs,gs,fs, including the shader name. However, blit shaders
were showing up as "unnamed" since we hadn't set a name prior to
linking.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Originally, we didn't have direct accessors for all of the GLSL types,
so the only way to get at them was to use the symbol table. Now, we
can just get at them directly, which is simpler and faster.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The lexer was insisting that there be at least one character after "#pragma"
and before the end of the line. This caused an error for a line consisting
only of "#pragma" which volates at least the following sentence from the GLSL
ES Specification 3.00.4:
The scope as well as the effect of the optimize and debug pragmas is
implementation-dependent except that their use must not generate an
error. [Page 12 (Page 28 of PDF)]
and likely the following sentence from that specification and also in
GLSLangSpec 4.30.6:
If an implementation does not recognize the tokens following #pragma,
then it will ignore that pragma.
Add a "make check" test to ensure no future regressions.
This change fixes at least part of the following Khronos GLES3 CTS test:
preprocessor.pragmas.pragma_vertex
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We've always warned about this case, but a recent confromance test expects
this to be an error that causes compilation to fail. Make it so.
Also add a "make check" test to ensure these errors are generated.
This fixes the following Khronos GLES3 conformance tests:
invalid_conditionals.tokens_after_ifdef_vertex
invalid_conditionals.tokens_after_ifdef_fragment
invalid_conditionals.tokens_after_ifndef_vertex
invalid_conditionals.tokens_after_ifndef_fragment
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While writing the previous commit message, I just felt bad documenting the
shortcoming of the change, (that undefined macro names would not be reported
in error messages).
Fix this by preserving the first-encounterd undefined macro name and reporting
that in any resulting error message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GLSL ES Specification 3.00.4 says:
#if, #ifdef, #ifndef, #else, #elif, and #endif are defined to operate
as for C++ except for the following:
...
• Undefined identifiers not consumed by the defined operator do not
default to '0'. Use of such identifiers causes an error.
[Page 11 (page 127 of the PDF file)]
as well as:
The semantics of applying operators in the preprocessor match those
standard in the C++ preprocessor with the following exceptions:
• The 2nd operand in a logical and ('&&') operation is evaluated if
and only if the 1st operand evaluates to non-zero.
• The 2nd operand in a logical or ('||') operation is evaluated if
and only if the 1st operand evaluates to zero.
If an operand is not evaluated, the presence of undefined identifiers
in the operand will not cause an error.
(Note that neither of these deviations from C++ preprocessor behavior apply to
non-ES GLSL, at least as of specfication version 4.30.6).
The first portion of this, (generating an error for an undefined macro in an
(short-circuiting to squelch errors), was not implemented previously, but is
implemented in this commit.
A test is added for "make check" to ensure this behavior.
Note: The change as implemented does make the error message a bit less
precise, (it just states that an undefined macro was encountered, but not the
name of the macro).
This commit fixes the following Khronos GLES3 conformance test:
undefined_identifiers.valid_undefined_identifier_1_vertex
undefined_identifiers.valid_undefined_identifier_1_fragment
undefined_identifiers.valid_undefined_identifier_2_vertex
undefined_identifiers.valid_undefined_identifier_2_fragment
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The preprocessor defines a notions of a "preprocessing number" that
starts with either a digit or a decimal point, and continues with zero
or more of digits, decimal points, identifier characters, or the sign
symbols, ('-' and '+').
Prior to this change, preprocessing numbers were lexed as some
combination of OTHER and IDENTIFIER tokens. This had the problem of
causing undesired macro expansion in some cases.
We add tests to ensure that the undesired macro expansion does not
happen in cases such as:
#define e +1
#define xyz -2
int n = 1e;
int p = 1xyz;
In either case these macro definitions have no effect after this
change, so that the numeric literals, (whether valid or not), will be
passed on as-is from the preprocessor to the compiler proper.
This fixes the following Khronos GLES3 CTS tests:
preprocessor.basic.correct_phases_vertex
preprocessor.basic.correct_phases_fragment
v2. Thanks to Anuj Phogat for improving the original regular expression,
(which accepted a '+' or '-', where these are only allowed after one of
[eEpP]. I also expanded the test to exercise this.
v3. Also fixed regular expression to require at least one digit at the
beginning (after an optional period). Otherwise, a string such as ".xyz" was
getting sucked up as a preprocessing number, (where obviously this should be a
field access). Again, I expanded the test to exercise this.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, a line such as:
#else garbage
would flag an error if it followed "#if 0", but not if it followed "#if 1".
We fix this by setting a new bit of state (lexing_else) that allows the lexer
to defer switching to the <SKIP> start state until after the NEWLINE following
the #else directive.
A new test case is added for:
#if 1
#else garbage
#endif
which was untested before, (and did not generate the desired error).
This fixes the following Khronos GLES3 CTS tests:
tokens_after_else_vertex
tokens_after_else_fragment
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, the test suite was expecting the compiler to allow a redefintion
of a macro with whitespace added, but gcc is more strict and allows only for
changes in the amounts of whitespace, (but insists that whitespace exist or
not in exactly the same places).
See: https://gcc.gnu.org/onlinedocs/cpp/Undefining-and-Redefining-Macros.html:
These definitions are effectively the same:
#define FOUR (2 + 2)
#define FOUR (2 + 2)
#define FOUR (2 /* two */ + 2)
but these are not:
#define FOUR (2 + 2)
#define FOUR ( 2+2 )
#define FOUR (2 * 2)
#define FOUR(score,and,seven,years,ago) (2 + 2)
This change adjusts the existing "redefine-macro-legitimate" test to work with
the more strict understanding, and adds a new "redefine-whitespace" test to
verify that changes in the position of whitespace are flagged as errors.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch specifically fixes redefinition condition for white space
changes. #define and #undef functionality in GLSL follows the standard
for C++ preprocessors for macro definitions.
From https://gcc.gnu.org/onlinedocs/cpp/Undefining-and-Redefining-Macros.html:
These definitions are effectively the same:
#define FOUR (2 + 2)
#define FOUR (2 + 2)
#define FOUR (2 /* two */ + 2)
but these are not:
#define FOUR (2 + 2)
#define FOUR ( 2+2 )
#define FOUR (2 * 2)
#define FOUR(score,and,seven,years,ago) (2 + 2)
Fixes Khronos GLES3 CTS tests;
invalid_object_whitespace_vertex
invalid_object_whitespace_fragment
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
On nvc0, a counter can have up to 6 sources instead of only one
for nve4+. This fixes a crash when a counter uses more than
one source.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The set of variable uses does not need to be ordered in any way, and
removing/adding elements is a fairly common operation in various
optimization passes.
This shortens runtime of piglit test fp-long-alu to ~22s from ~4h
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
BRW_PREDICATE_ALIGN1_ANY16H was incorrectly being disassembled as
"all16h", and ALL16H would probably print as "(null)".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We clearly don't want to start at the head and walk backwards; we want
to start at the last real element before the tail sentinel. If the list
is empty, tail_pred will be the head sentinel, and we'll stop.
Nothing uses this function, so I guess nobody noticed it was broken.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This doesn't fix any known issue. In fact, radeon drivers ignore all
the discard flags for textures and implicitly do "discard range"
for any write transfer.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously instruction scheduling tracked dependencies on a per-register
basis. This meant that there was an artificial dependency between
interpolation instructions writing into the same virtual register.
Instruction scheduling would insert a number of instructions between the
two instructions in this example, when they are actually independent.
linterp vgrf8+0.0:F, hw_reg2:F, hw_reg3:F, hw_reg6:F
linterp vgrf8+1.0:F, hw_reg2:F, hw_reg3:F, hw_reg6+16:F
This lead to cases where the first texture coordinate is interpolated at
the beginning of the shader, but the second is done immediately before
the texture operation that uses it as a source.
After this change, the artificial dependency is removed and the
interpolation instructions are scheduled together.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Revert "build: Build on Cygwin with gnu99 instead of c99." and define
_XOPEN_SOURCE appropriately.
This reverts commit 53e36d333c.
Since Cygwin 1.7.18 (April 2013), it's headers correctly prototype strtoll()
when using -std=c99, and correctly prototype strdup() when _XOPEN_SOURCE is
defined appropriately, so this workaround is no longer needed.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Cc: Vinson Lee <vlee@freedesktop.org>
Apparently TXD wants its offset differently than TEX, accepting it in
the upper bits of the layer index. Unclear what happens when this is
combined with indirect sampler indexing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Something about how we're implementing offsets for TXD is wrong, just
flip to the generic quadop-based implementation in that case.
This is the minimal fix appropriate for backporting.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Unfortunately there's no good way to do this on the nv50 shader isa.
Dropping the bias seems preferable to doing the compare post-filtering.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Although the HSW PRM shows it, the BSpec lists this workaround as being
for Ivybridge only.
total instructions in shared programs: 1994951 -> 1993675 (-0.06%)
instructions in affected programs: 27325 -> 26049 (-4.67%)
Port of commit b16b3c87 to the vec4 code.
No shader-db improvements, but might as well. The fs backend saw an
improvement because it's scalar and multiple identical CMP instructions
were generated by the SEL peepholes.
[mattst88]: Modified to perform CSE on instructions with
the same writemask. Offered no improvement before.
total instructions in shared programs: 1995633 -> 1995185 (-0.02%)
instructions in affected programs: 14410 -> 13962 (-3.11%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
With a hack to place an exec_node in the struct in C to be at the same
location as the inherited exec_node in C++.
Acked-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
If another layout qualifier appeared to the left of `invocations` in the
GS input layout declaration, the invocation count would be dropped on
the floor.
Fixes the piglit tests:
spec/ARB_transform_feedback3/arb_transform_feedback3-ext_interleaved_two_bufs_gs_max
spec/ARB_gpu_shader5/arb_gpu_shader5-invocation-id
spec/ARB_gpu_shader5/compiler/correct-multiple-layout-qualifier-invocations.geom
spec/ARB_gpu_shader5/execution/invocations-conflicting
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The hardware allows multiple simultaneous renders with the same
memory-backed constbufs but with each invocation having different
values. However in order for that to work, the data has to be streamed
in via the right constbuf slot. We weren't doing that for UBOs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.1" <mesa-stable@lists.freedesktop.org>
Now that this cap is used to determine the availability of both, adjust
its name to reflect the new reality.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The assumption is that any driver capable of emitting layer from the
vertex shader and supporting viewports should be able to also handle
emitting viewport index from the vertex shader.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Tobias Droste <tdroste@gmx.de>
The Linux winsys can no longer relocate shader code, so avoid
reemitting BindGBShader commands. They are costly.
v2: Correctly handle errors from SVGA3D_BindGBShader()
Reported-by: Michael Banack <banackm@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Previously, we were assuming that kernel metadata nodes only had 1 operand.
Kernels which have attributes can have more than 1, e.g.:
!0 = metadata !{void (i32 addrspace(1)*)* @testKernel, metadata !1}
!1 = metadata !{metadata !"work_group_size_hint", i32 4, i32 1, i32 1}
Attempting to get the kernel without the correct number of attributes led
to memory corruption and luxrays crashing out.
Fixes the cl/program/execute/attributes.cl piglit test.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76223
CC: "10.2" <mesa-stable@lists.freedesktop.org>
If XA fails to initialize with pipe_loader enabled, the pipe_loader's
cleanup function will close the drm file descriptor. That's pretty bad
because the file descriptor will probably be the X server driver's only
connection to drm. Temporarily solve this by dup()'ing the file descriptor
before handing it over to the pipe loader.
This fixes freedesktop.org bugzilla bug #80645.
v2: Fix CC addresses.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
If multiple viewports are supported, that implies the presence of a GS
and layered rendering, so we can enable ARB_fragment_layer_viewport as
well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I wanted to access this value from stage-generic code, so stop storing it
under two different names.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we had some NOS affecting VS compilation that resulted in optimization
changing the set of constants to be uploaded, we might not have reuploaded
the constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At this point, the extra copy of the data and memcmp are as expensive as
just re-uploading.
Note: now that we'll always upload, and brw_constant_buffer watches
BRW_NEW_BATCH anyway, we don't need to explicitly unref the old curbe_bo
at batch reset time.
No significant performance difference on glamor copywinwin10 (n=55),
despite that test having a 98% hit rate on the cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to support ARB_fragment_layer_viewport, we need to explicitly
send these along to the pixel shader, since it has no other way to
retrieve them.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
With the inclusion of xmlconfig in the loader we're providing dri* symbols
which are already available in libdricommon.la. This leads to a build
break due to the multiple definitions.
Temporary allow multiple definitions, until we come with a better solution.
Reported-by: Laurent Carlier <lordheavym@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With all the hw drivers converted, we can go back to having
a single libdridrm provider.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
v2:
- Drop inclusion of the winsys wrapper and softpipe/llvmpipe.
- Remove old Makefile.am, target.c.
- Correctly append i915 to the megadrivers list.
Cc: Stephane Marchesin <stephane.marchesin@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Similiar to other targets, we'd like to convert all the separate
targets into a single one, thus we'll minimize the duplication and
overall size of mesa. The conversion per API basis, with the drivers
available either statically or shared. Currently the former is the
default.
v2: Correctly append the version script to the linker flags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Will be used to create the single dri target library, on our
way to convert all the dri targets during the conversion to
to static/shared pipe-drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
With this commit we add a couple of DEFINES making the ST code
conditional, in a way that we can use it to gradually convert
the dri-targets from separate libraries into a single one.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Some of the features are completely implemented by core, while others
have hardware dependencies. Create a list of drivers supporting each
sub-feature that must have hw support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Because the layout is always linear this didn't really do much any longer -
at some point this triggered per-tile swizzled->linear conversion. The x/y
coords were ignored too.
Apart from triggering conversion, this also invoked alloc_image_data(), which
could only actually trigger mapping of display target resources. So, instead
just call resource_map in the callers (which also gives the ability to unmap
again). Note that mapping/unmapping of display target resources still isn't
really all that clean (map/unmap may be unmatched, and all such mappings use
the same pointer thus usage flags are a lie).
Reviewed-by: Brian Paul <brianp@vmware.com>
The only caller left used it only for non display target textures,
hence it was really the same as llvmpipe_get_texture_image_address - it
also had a usage flag but this was ignored anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
Once used for invoking swizzled->linear conversion for all needed images.
But we now have a single allocation for all images in a resource, thus looping
through all slices is rather pointless, conversion doesn't happen neither.
Also simplify the sampling setup code to use the mip_offsets array in the
resource directly - if the (non display target) resource exists its memory
will already be allocated as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
The deferred allocation doesn't really make much sense anymore, since we no
longer allocate swizzled/linear memory in chunks and not per level / slice
neither.
This means we could fail resource creation a bit more (could already fail in
theory anyway) but should not fail maps later (right now, callers can't deal
with neither really).
Reviewed-by: Brian Paul <brianp@vmware.com>
it looks since ce1a137228 they are now included
in more places, in particular even for things buildable with msvc, and hence
those break the build.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Drop stdbool, due to the X server being a pain and having
struct members called bool, although I've sent a patch to fix
that we should retain stupidity here. Use unsigned char
which is what GLboolean is anyways.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit 17c7ead7 exposed a bug in how uniform loading happens in the
presence of discard. It manifested itself in an application as
randomly incorrect pixels on the borders of conditional areas.
This is due to how discards jump to the end of the shader incorrectly
for some channels. The current implementation checks each 2x2
subspan to preserve derivatives. When uniform loading via samplers
was turned on, it uses a full execution mask, as stated in
lower_uniform_pull_constant_loads(), and only populates four channels
of the destination (see generate_uniform_pull_constant_load_gen7()).
It happens incorrectly when the first subspan has been jumped over.
The series that implemented this optimization was done before the
changes to use samplers for uniform loads. Uniform sampler loads
use special execution masks and only populate four channels, so we
can't jump over those or corruption ensues.
This fix only jumps to the end of the shader if all relevant channels
are disabled, i.e. all 8 or 16, depending on dispatch. This
preserves the original GLbenchmark 2.7 speedup noted in commit
beafced2.
It changes the shader assembly accordingly:
before : (-f0.1.any4h) halt(8) 17 2 null { align1 WE_all 1Q };
after(8) : (-f0.1.any8h) halt(8) 17 2 null { align1 WE_all 1Q };
after(16): (-f0.1.any16h) halt(16) 17 2 null { align1 WE_all 1H };
v2: Cleaned up comments and conditional ordering.
v3: Fix typo.
Signed-off-by: Cody Northrop <cody@lunarg.com>
Reviewed-by: Mike Stroyan <mike@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79948
cfg, for instance, is a pointer to a local variable in
calculate_live_intervals, certainly not valid after that function has
returned.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We want to have the dri common files compiled to define USE_DRICONF.
We need to check both NEED_OPENGL_COMMON and HAVE_DRICOMMON
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Tested-by: Brian Paul <brianp@vmware.com>
UniformBufferSize is in bytes so we need to divide by 16 to get the
number of constant buffer slots. Also, the ureg_DECL_constant2D()
function takes first..last parameters so we need to subtract one
for the last value.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Before, we were always using the address register and indirect addressing
to index into a UBO constant buffer. With this change we only do that
when necessary.
Using the piglit bin/arb_uniform_buffer_object-rendering test as an
example:
Shader code:
uniform ub_rot {float rotation; };
...
m[1][1] = cos(rotation);
Before:
IMM[1] INT32 {0, 1, 0, 0}
1: UARL ADDR[0].x, IMM[1].xxxx
2: MOV TEMP[0].x, CONST[3][ADDR[0].x].xxxx
3: COS TEMP[1].x, TEMP[0].xxxx
After:
0: COS TEMP[0].x, CONST[3][0].xxxx
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Otherwise, if we were creating a const buffer src register for a UBO
the index into the UBO was always zero.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To implement the unlit_centroid_workaround, previously we emitted
(+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 1Q };
(-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 1Q };
where the flag register contains the channel enable bits from g0.
Since the predicates are complementary, the pair of pln instructions
write to non-overlapping components of the destination, which is the
case that the dependency control hints are designed for.
Typically setting dependency control hints on predicated instructions
isn't safe (if an instruction doesn't execute due to the predicate, it
won't update the scoreboard, leaving it in a bad state) but since we
must have at least one channel executing (i.e., +f0 is true for some
channel) by virtue of the fact that the thread is running, we can put
the +f0 pln instruction last and set the hints:
(-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 NoDDClr 1Q };
(+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 NoDDChk 1Q };
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This sequence (where both x and w are used afterwards) wasn't handled.
mul.sat x, y, z
...
mov.sat w, x
We assumed that if x was used after the mov.sat, that we couldn't
propagate the saturate modifier, but in fact x was already saturated.
So ignore the live range check if the producing instruction already
saturates its result. Cuts one instruction from hundreds of TF2 shaders.
total instructions in shared programs: 1995631 -> 1994951 (-0.03%)
instructions in affected programs: 155248 -> 154568 (-0.44%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
text data bss dec hex filename
4231165 123200 39648 4394013 430c1d i965_dri.so
4186277 123200 39648 4349125 425cc5 i965_dri.so
Cuts 43k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
text data bss dec hex filename
4244821 123200 39648 4407669 434175 i965_dri.so
4231165 123200 39648 4394013 430c1d i965_dri.so
Cuts 13k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
text data bss dec hex filename
4270747 123200 39648 4433595 43a6bb i965_dri.so
4244821 123200 39648 4407669 434175 i965_dri.so
Cuts 25k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This improves GLX DRI3 GPU offloading significantly on CPU
bound benchmarks particularly.
No performance impact for DRI2 GPU offloading.
v2: Add missing tests
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Marek Olšák<marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The differences with DRI2 GPU offloading are:
a) There's no logic for GPU offloading needed in the Xserver
b) for DRI2, the card would render to a back buffer, and
the content would be copied to the front buffer (the same buffers
everytime). Here we can potentially use several back buffers and copy
to buffers with no tiling to share with X. We send them with the
Present extension.
That means than the DRI2 solution is forced to have tearings with GPU
offloading. In the ideal scenario, this DRI3 solution doesn't have this
problem.
However without dma-buf fences, a race can appear (if the card is slow
and the rendering hasn't finished before the server card reads the buffer),
and then old content is displayed. If a user hits this, he should probably
revert to the DRI2 solution (LIBGL_DRI3_DISABLE). Users with cards fast
enough seem to not hit this in practice (I have an Amd hd 7730m, and I
don't hit this, except if I force a low dpm mode)
c) for non-fullscreen apps, the DRI2 GPU offloading solution requires
compositing. This DRI3 solution doesn't have this requirement. Rendering
to a pixmap also works.
d) There is no need to have a DDX loaded for the secondary card.
V4: Fixes some piglit tests
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Dave Airlie <airlied@redhat.com>
DRI_PRIME is not very handy, because you have to launch the executable
with it set, which is not always easy to do.
By using drirc, the user specifies the target executable
and the device to use. After that the program will be launched everytime
on the target device.
For example if .drirc contains:
<driconf>
<device driver="loader">
<application name="Glmark2" executable="glmark2">
<option name="device_id" value="pci-0000_01_00_0" />
</application>
</device>
</driconf>
Then glmark2 will use if possible the render-node of
ID_PATH_TAG pci-0000_01_00_0.
v2: Fix compilation issue
v3: Add "-lm" and rebase.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
v2: Fix the leak of device_name
v3: Rebased
It enables to use the DRI_PRIME env var to specify
which gpu to use.
Two syntax are supported:
If DRI_PRIME is 1 it means: take any other gpu than the default one.
If DRI_PRIME is the ID_PATH_TAG of a device: choose this device if
possible.
The ID_PATH_TAG is a tag filled by udev.
You can check it with 'udevadm info' on the device node.
For example it can be "pci-0000_01_00_0".
Render-nodes need to be enabled to choose another gpu,
and they need to have the ID_PATH_TAG advertised.
It is possible for not very recent udev that the tag
is not advertised for render-nodes, then
ones need to add a file containing:
SUBSYSTEM=="drm", IMPORT{builtin}="path_id"
in /etc/udev/rules.d/
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This in theory changes ABI for the boolean->bool I think,
but nothing in the tree uses configQueryb AFAICS.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just drops all the GL types from the xmlconfig and use
std C types from stdint and stdbool.
v2: drop further double and header include.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Update all three build systems, and add freedreno to the android
build. Pending future work on the ST we can convert egl-static
to provide either static or dynamic access to the pipe-drivers.
There is no functional change with this patch.
v2: Don't add freedreno to android build, drop the wrapper winsys.
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Move the gbm "target" code to the state-tracker, similar
to other - dri, omx, vdpau... ST.
v2: Drop inclusion of the wrapper winsys and softpipe/llvmpipe.
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Now we can build the xa target (libxatracker) with either static
pipe-drivers or shared ones. Currently we default to static.
- Remove the unused CFLAGS/CPPFLAGS.
- Use GALLIUM_TARGET_CFLAGS where applicable.
v2: Update the printout messages at configure.
v3: Drop inclusion of the wrapper winsys and softpipe/llvmpipe.
Cc: Jakob Bornecrantz <jakob@vmware.com>
Cc: Rob Clark <robclark@freedesktop.org>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
At this point, brw_disassemble can do everything gen8_disassemble can
do - and, thanks to the new brw_inst API, it supports all generations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Previously, we decoded render target write messages as:
render ( RT write, 0, 16, 12, 0) mlen 8 rlen 0
which made you remember (or look up) what the numbers meant:
1. The binding table index
2. The raw message control, undecoded:
- Last Render Target Select
- Slot Group Select
- Message Type (SIMD8, normal SIMD16, SIMD16 replicate data, ...)
3. The dataport message type, again (already decoded as "RT write")
4. The write commit bit (0 or 1)
Needless to say, having to decipher that yourself is annoying. Now, we
do:
render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0
with optional "Hi" and "WriteCommit" for slot group/write commit.
Thanks to the new brw_inst API, we can also stop duplicating code on a
per-generation basis.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We haven't used the name "message target" in a while - there are a lot
of things called "target", and it gets confusing. SFID ("Shared
Function ID") is the term commonly used in the modern documentation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The name of this message is "Render Target UNORM Write" (Sandybridge
PRM, Volume 4 Part 1, Page 210). Drop the bogus 'c'.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Most developers will recognize the Gen6+ SFID names more quickly than
the Gen4-5 ones. Given that they're the same values, just use the new
names.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We should print something properly, but I'm not sure how to properly
print an HF, and we don't have any DFs today to test with.
This is at least better than the current Gen8 disassembler, which would
simply assert fail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This backports the atomic message disassembly support from
gen8_disasm.c, which additionally offers support for decoding atomic
surface read/write messages, and showing SIMD modes and other details.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
I never bothered implementing the disassembler for Gen7+ URB opcodes, so
we were just disassembling them as Ironlake/Sandybridge ones. This
looked pretty bad when running Paul's GS EndPrimitive tests, as the
"write OWord" message was decoded at ff_sync, which doesn't exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
While we're adding things, use symbolic constants rather than magic
numbers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
These have existed since Ivybridge. We don't use them today, but the
Gen8+ disassembler supports them, and I'd like to use symbolic names
rather than magic numbers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This makes brw_disasm.c able to disassemble ELSE instructions correctly
on Broadwell. (gen8_disasm.c already handles this correctly.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Previously, our dissasembly for flow control instructions looked like:
0x00000040: else(8) ip 65540D { align16 switch };
It didn't print InstCount properly for ELSE/ENDIF, and didn't even
attempt to disassemble PopCount.
Now it looks like:
0x00000040: else(8) Jump: 4 Pop: 1 { align16 switch };
which is much more readable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Previously, flow control instructions generated output like:
(+f0) if(8) 12 8 null 0x000c0008UD { align16 WE_normal 1Q };
which included a dissasembly of the register fields, even though those
are meaningless for flow control instructions---those bits are reused
for another purpose.
It also wasn't immediately obvious which number was UIP and which was
JIP.
With this patch, we instead output:
(+f0) if(8) JIP: 8 UIP: 12 { align16 WE_normal 1Q };
which is much clearer.
The patch also introduces has_uip/has_jip helper functions which clear
up a some generation/opcode checking mess.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
While we're at it, use proper names rather than magic numbers for the
existing fields.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
brw_disasm.c basically wasn't following the Mesa coding style at all.
It used 4-space indent instead of 3-space, didn't cuddle braces, didn't
put function return types on a separate line, put extra spaces in
function calls (between the name and parenthesis), and a number of other
things.
This made it fairly obnoxious to work on, since my editor is configured
to follow Mesa style in the Mesa source repository. Fixing it to follow
a consistent style now should save time dealing with it later.
These modifications were originally generated by:
$ indent -br -i3 -npcs -ce -cs -l80 --no-tabs
with some manual changes afterwards to fit our style better.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
opcode is just a pointer to opcode_descs; we may as well use that
directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
As far as I can tell, the Intel mesa driver is the only driver in the world
still supporting this legacy extension. If someone wants to do bump
mapping, they can use shaders.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
Reviewed-by: Ian Romanick <idr@freedesktop.org> [v3]
On i965, enabling and disabling the GS is not free: you have to do a
full pipeline stall, reconfigure the URB and push constant space, and
emit a bunch of state. Most clears aren't layered, so the GS isn't
needed in the common case. But we turned it on universally.
Using AMD_vertex_shader_layer allows us to skip setting up the GS
altogether, while achieving the same effect.
According to Ilia, current nVidia GPUs can't do AMD_vertex_shader_layer.
However, since nouveau is Gallium-based, they're unlikely to ever care
about this path. Intel and AMD GPUs both support the extension.
Since i965 is the only driver using this path which does layered
rendering, we may as well target it at that.
v2: Improve commit message. No code changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It should be possible to query the number of primitives written to each
individual stream by a geometry shader in a single draw call. For that
we need to have up to MAX_VERTEX_STREAM separate query objects.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
So far we have been using CL_INVOCATION_COUNT to resolve this query but this
is no good with streams, as only stream 0 reaches the clipping stage. Instead
we will use SO_PRIM_STORAGE_NEEDED which can keep track of the primitives sent
to each individual stream.
Since SO_PRIM_STORAGE_NEEDED is related to the SOL stage and according to
ARB_transform_feedback3 we need to be able to query primitives generated in
each stream whether transform feedback is active or not what we do is to
enable the SOL unit even if transform feedback is not active but disable all
output buffers in that case. This effectively disables transform feedback
but permits activation of statistics enabling SO_PRIM_STORAGE_NEEDED even
when transform feedback is not active.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Check if non-zero streams are used. Fail to link if emitting to unsupported
streams or emitting to non-zero streams with output type other than GL_POINTS.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be necessary to implement EndStreamPrimitive().
EndPrimitive() will produce an ir_end_primitive with the default stream 0.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be necessary to implement EmitStreamVertex().
EmitVertex() will produce an ir_emit_vertex with the default stream 0.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If the geometry shader is indeed using streams then we need 2 control data
bits per vertex for the StreamID. If the shader is not using streams then
we don't need control data bits.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On Intel hardware when a geometry shader outputs GL_POINTS primitives we
only need to emit vertex control bits if it emits vertices to non-zero
streams, so use a flag to track this.
This flag will be set to TRUE when a geometry shader calls EmitStreamVertex()
or EndStreamPrimitive() with a non-zero stream parameter in a later patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Outputs that are linked to inputs in the next stage must be output to stream 0,
otherwise we should fail to link.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Configure hardware to read vertex data for all streams and have all streams
write their varyings to the corresponsing output buffers.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This implements parsing requirements for multi-stream support in
geometry shaders as defined in ARB_gpu_shader5.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes commit a001ca98e15(st/omx: keep the name,
(name|role)_specific strings dynamically allocated) in which we
dynamically allocated the buffers for name and (name|role)_specific
yet forgot to copy the encoder strings into them.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80614
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
All of the bits appear to already be in place to support this in the
sampler (which the original AMD version didn't allow).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This doesn't change anything to the intel DRI3 implementation,
but enables the gallium implementation to use dri2.stamp instead
of relying on the stamp shared with the st backend.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Move the image extension setup in with all the others in
bind_extensions, and improve the check to both version
and function pointer.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Dave Airlie <airlied@redhat.com>
While the official INTEL_swap_event specification says that the drawable
field should contain the GLXDrawable, not the Drawable, the existing
DRI2 code in dri2.c that translates from DRI2_BufferSwapComplete sends out
GLX_BufferSwapComplete with the Drawable's ID, so existing codebases
like Clutter/Cogl rely on getting the Drawable.
Match DRI2's error here and stuff the event with the X Drawable, not
the GLX drawable.
This fixes apps seeing wrong drawables through an indirect GLX context
or with DRI3, which uses the GLX_BufferSwapComplete event directly on
the wire instead of translates Present in mesa.
At the same time, also modify the structure for the event to make sure
that clients don't make the same mistake. This is not an API or ABI
break, as GLXDrawable and Drawable are both typedefs for XID.
Signed-off-by: Jasper St. Pierre <jstpierre@mecheye.net>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
MCS buffers are never allocated on Broadwell, so this does nothing for
now, but puts the infrastructure in place for when they do exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
According to the documentation, we don't need this SINT workaround on
Broadwell. (Or at least, it doesn't mention that we need it.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Broadwell generalizes the MCS fields to allow for multiple kinds of
auxiliary surfaces. This patch adds the plumbing to set those values,
but doesn't yet hook any up.
v2: (by Jordan Justen) Use mt for qpitch; pitch is tiles - 1.
v3: Don't forget to subtract 1 from aux_mt->pitch.
v4: Drop unnecessary aux_mt->offset (caught by Jordan Justen).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Prior to the new brw_inst API, the brw_instruction structure split off
bits 4 and 5 of msg_control for specific fields, and we failed to
disassemble them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
They don't have a UIP. We used UIP in an array dereference, which never
caused problems on Gen < 8, since UIP was a small integer (number of
instructions). On Gen 8 UIP is in bytes, so it's large enough that it
caused us to read out of bounds of the array.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now nothing uses this, but we can incrementally convert.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use this an an opportunity to clean up the formatting of some old code
(brw_ADD, for instance).
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: (by Kenneth Graunke)
- Fix disassembly of Gen4-5 SEND messages to print base MRF correctly.
- Only print URB opcode on Gen5+, to match previous output (besides,
there is only one opcode AFAICT.)
- Only print the low 3 bits of msg_control, to match previous output.
(We probably should decode all the fields, but hadn't previously due
to the brw_instruction structure definition splitting out bits 4/5
for last_render_target and slot_group_select.)
- Fix 3-source MRF/GRF file decoding on Sandybridge.
- Fix compression code to use qtr_control rather than cmpt_control
(which is compaction, not compression).
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v2]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use brw_inst_bits rather than pulling out individual fields and
reassembling them.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're going to use fs_generator/vec4_generator for Gen8+ code soon,
thanks to the new brw_instruction API. When we do, we'll generally
want to take the Haswell paths on Gen8+ as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2:
- Fix IF -> ELSE patching on Sandybridge.
- Don't set base_mrf on Gen6+ in OWord Block Read functions. (Although
- the old code did this universally, it shouldn't have - the field
- doesn't exist on Gen6+ and just got overwritten by the SFID anyway.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Don't set flag_reg_nr prior to Gen7 (as it doesn't exist).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is similar to gen8_instruction, and will eventually replace it.
For now nothing uses this, but we can incrementally convert.
The new API takes the existing brw_instruction pointers to ease
conversion; when done, we can simply drop the old structure and rename
struct brw_instruction -> brw_inst.
v2: (by Matt Turner) Make JIP/UIP functions take a signed argument.
v3: (by Kenneth Graunke)
- Make Gen4-6 jump target functions take a signed argument.
- Fix indirect align1 AddrImm bits on Gen4-7.
- Fix SFID on Sandybridge to use bits 27:24.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v1, v3+]
Signed-off-by: Matt Turner <mattst88@gmail.com> [v2]
Reviewed-by: Matt Turner <mattst88@gmail.com>
The new brw_inst API is going to require a brw pointer in order
to access fields (so it can do generation checks). Plumb it in now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to the workarounds list, these stalls aren't needed on
production Baytrail systems. Piglit confirms that as well.
These cause a small slowdown when we are sending a large number of small
batches to the GPU. Removing these improves performance by up to 5% on
some CPU bound SynMark tests (Batch[4-7], DrvState1, HdrBloom,
Multithread, ShMapPcf).
Signed-off-by: Gregory Hunt <greg.hunt@mobica.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Intel would like us to include the marketing names. Developers
additionally want "Broadwell GT1/2/3" because it makes it easier
to identify what hardware users have when they request assistance
or report issues.
Including both makes it easy for everyone to map between the names.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The flags are not specific to the video targets plus
we can reuse them for targets/xa and targets/gbm.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Required for the conversion stage of all VL targets to
a single library per API (static/shared pipe-drivers).
No longer required as per last commit.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The radeonsi counterpart of previous commit - now libomx-radeonsi is
built into the libomx-mesa library. Providing a single library per API.
v2: Include the radeon winsys only when there is a user for it.
v3: Correcly include the winsys. Now with extra brown bag :\
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio
This patch concludes the unification. Now libomx-mesa will be used
for all hardware - r600, radeonsi and nouveau.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The r600 counterpart of previous commit - now the libomx-r600 is
built into the libomx-mesa library. Providing a single library per API.
v2: Include the radeon winsys only when there is a user for it.
v3: Correcly include the winsys. Now with extra brown bag :\
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio
If you have more than one omx library (libomx-radeonsi, libomx-r600),
make sure to temporary move the unused one. By the end of the series
there will be only one library that will be used for all hardware -
r600, radeonsi and nouveau.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Similar to the vdpau/xvmc targets, we're going to convert the
multiple target libraries into a single one.
The library can be built with the relevant pipe-drivers
statically linked in, or loaded as shared modules.
Currently we default to static.
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio
If you have more than one omx library (libomx-radeonsi, libomx-r600),
make sure to temporary move the unused one. By the end of the series
there will be only one library that will be used for all hardware -
r600, radeonsi and nouveau.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Strictly speaking we should not have done this in the
first place, as all of the above should be static across
the system.
Currently this may cause some minor issues, which will be
resolved in the following patches, by providing a single
library for the OMX api.
Cleanup a few unneeded strcpy cases while we're around.
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio
If you have more than one omx library (libomx-radeonsi, libomx-r600),
make sure to temporary move the unused one. By the end of the series
there will be only one library that will be used for all hardware -
r600, radeonsi and nouveau.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The number of components and their names/roles should
be kept constant as all of that information cached.
Note: Make sure to rebuild the .omxregister file, by executing
$ omxregister-bellagio.
Cc: Leo Liu <leo.liu@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Check calloc return value and report on error, also later skip
results handling if there was no memory to store results to.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This only makes any sense on the GS input or output layout declaration,
nowhere else.
Fixes the piglit tests:
* spec/glsl-1.50/compiler/incorrect-in-layout-qualifiers-with-variable-declarations.geom
* spec/glsl-1.50/compiler/incorrect-out-layout-qualifiers-with-variable-declarations.geom
* spec/glsl-1.50/compiler/layout-fs-no-output.frag
* spec/glsl-1.50/compiler/layout-vs-no-input.vert
* spec/glsl-1.50/compiler/layout-vs-no-output.vert
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we disallowed any combination of layout with interpolation,
invariant, or precise qualifiers. There is very little spec guidance on
exactly which combinations should be allowed, but with ARB_sso it's
useful to allow these qualifiers with rendezvous-by-location.
Since it's unclear exactly where the layout qualifier should appear when
combined with other qualifiers, we will allow it anywhere before the
auxiliary storage qualifier.
This allows enough flexibility for all examples I've seen, while keeping
the auxiliary-storage-qualifier / storage-qualifier pair together (as
they are a single qualifier in the spec prior to
ARB_shading_language_420pack)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa has an optimization that converts expressions like "v.x + v.y + v.z
+ v.w" into dot(v, 1.0). And therein lies the rub: the other operand to
the dot-product is always a float... even if the vector is an ivec or
uvec. This results in an assertion failure in ir_builder.
If the base type of the operand is not float, don't try the
optimization. Dot-product is not valid on integer data.
Fixes piglit vs-integer-reduction.shader_test and OpenGL ES conformance
test ES2-CTS.gtf.GL2Tests.glGetUniform.glGetUniform.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
The emit_math?_gen? functions serve to implement workarounds for the
math instruction, none of which exist on Gen8+.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For the first use of a buffer, we will only need the temporary
resource in the case that a user wants to write/map to this buffer.
But in the cases where the user creates a buffer to act as an
output of a kernel, then we were creating an unneeded resource,
because it will contain garbage, and would be copied to the pool,
and destroyed when promoting.
This patch avoids the creation and copies of resources in
this case.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The important part is the change of the condition to <= 0. Otherwise the loop
gets stuck never actually growing the pool.
The change in the aux-need calculation guarantees max 2 iterations, and
avoids wasting memory in case a smaller item can't fit into a relatively larger
pool.
Reviewed-by: Bruno Jiménez <brunojimen@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
The dst pointer needs to be initialized after any calls to
compute_memory_grow_pool, as the function might change the pool->vbo pointer.
This fixes crashes and assertion failures in two gegl tests.
Reviewed-by: Bruno Jiménez <brunojimen@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
nouveau screens are reused for the same device node. However in the
scenario where we create screen 1, screen 2, and then delete screen 1,
the surrounding code might also close the original device node. To
protect against this, dup the fd and use the dup'd fd in the
nouveau_device. Also tell the nouveau_device that it is the owner of the
fd so that it will be closed on destruction.
Also make sure to free the nouveau_device in case of any failure.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79823
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@ubuntu.com>
Previously, if we had something like:
gl_ViewportIndex = idx;
for(int i = 0; i < gl_in.length(); i++) {
gl_Position = gl_in[i].gl_Position;
EmitVertex();
}
EndPrimitive();
The right viewport index would not be set on the primitive because the
last vertex is the provoking one. However blob drivers appear to move
the gl_ViewportIndex write into the for loop, allowing the application
to be ignorant of this detail.
While the application is technically wrong here, because the blob does
it and other drivers appear to implicitly work this way as well, we add
a buffer register that viewport index writes go into, which is then
exported before every EmitVertex() call.
This fixes the remaining piglit tests in ARB_viewport_array for nv50/nvc0.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The old logic would let all negative values go through unclamped, with
potentially disastrous results (probably trying to fetch viewport values
from random memory locations). GL has undefined rendering for vp indices
outside valid range but that's a bit too undefined...
(The logic is now the same as in llvmpipe.)
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
As far as I can tell, Broadwell doesn't need any of the SURFACE_STATE
workarounds for textureGather() bugs, so there's no need to emit
a second set of identical copies.
To keep things simple, just point the gather surface index base to the
same place as the texture surface index base.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
With commit 11e46a32ae and f9ebb1ea77 we resolved the symlink
generation required by the versioning of the library.
Although they incorrectly changed the way hardlinks are created by
linking to the ones from the build tree. If the device used for
building differs from the one set as destination linking will fail.
Reported-by: Andy Furniss <adf.lists@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously the blorp blitter would only be used if the format is identical or
there is only a difference between whether there is an alpha component or not.
This patch makes it also allow the blorp blitter if the only difference is the
ordering of the RGB components (ie, RGB or BGR).
This is particularly useful since commit 61e264f4fc because Mesa now
prefers RGB ordering for textures but the window system buffers are still
created as BGR. That means that the blorp blitter won't be used for the
(probably) common case of blitting from a texture to the window system buffer.
This doesn't cause any regressions in the FBO piglit tests on Haswell. On
Sandybridge it causes the fbo-blit-stretch test to fail but that is only
because it was failing anyway before the above commit and that commit hid the
problem.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68365
Reviewed-by: Matt Turner <mattst88@gmail.com>
In file included from ../../src/glsl/builtin_functions.cpp:61:0:
../../src/glsl/glsl_parser_extras.h:154:9: warning: unused parameter 'var' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fix an off by one in the texture unit walk during texblend
setup on gen2. This caused the last enabled texunit to be
skipped resulting in totally messed up texturing.
This is a regression introduced here:
commit 1ad443ecdd
Author: Eric Anholt <eric@anholt.net>
Date: Wed Apr 23 15:35:27 2014 -0700
i915: Redo texture unit walking on i830.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
The r600 equivalent of previous commit.
v2: Correctly include the radeon winsys/radeon_common.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Similar to vdpau targets, we're going to convert the individual
target libraries into a single one.
The library can be built with the relevant pipe-drivers
statically linked in, or loaded as shared modules.
Currently we default to static.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Similar to previous commits, this allows us to minimise some
of the duplication by compacting all vdpau targets into a
single library.
v2: Include the radeon winsys only when there is a user for it.
v3: Correcly include the winsys. Now with extra brown bag :\
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Similar to previous commit, this allows us to minimise some
of the duplication by compacting all vdpau targets into a
single library.
v2: Include the radeon winsys only when there is a user for it.
v3: Correcly include the winsys. Now with extra brown bag :\
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Create a single library (for the vdpau api) thus reducing
the overall size of mesa. Current commit converts
vdpau-nouveau, with upcomming commits handling the rest.
The library can be built with the relevant pipe-drivers
statically linked in, or loaded as shared modules.
Currently we default to static.
Add SPLIT_TARGETS to guard the other VL targets.
Note: symlink handling is rather ugly and will need an
update to work with BSD and other non-linux platforms.
v2: Split the conversion into per-target basis.
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Thomas Helland <thomashelland90 at gmail.com>
Blob driver seems to need WFI in some cases after CP_EVENT_WRITE,
implying that this is asynchronous and should reset needs_wfi.
Also, CP_INVALIDATE_STATE seems to need WFI. But CP_LOAD_STATE
does not.
The blob driver also puts WFIs before writing GRAS_CL_VPORT registers.
The latter may be a work-around, as these registers should be banked/
context registers. I haven't yet found a lockup that this averts, but
I expect viewport to change infrequently so out of paranoia I will
keep these for now.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The spec doesn't actually mention adding this, but this is the usual
pattern so I'm assuming it's a spec bug.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This extension is purely GLSL -- there are no new GL API elements.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When the last context in a share group is destroyed, the hash table
containing all of the shader programs (ctx->Shared->ShaderObjects) is
destroyed, throwing away all of the shader programs.
Using a static variable to store program IDs ends up holding on to them
after this, so we think we still have a compiled program, when it
actually got destroyed. _mesa_UseProgram then hits GL errors, since no
program by that ID exists.
Instead, store the program IDs in the context, so we know to recompile
if our context gets destroyed and the application creates another one.
Fixes es3conform tests when run without -minfmt (where it creates
separate contexts for testing each visual).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77865
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Prior to GLX 1.3 there was the glxMakeCurrent() function that took a
single drawable handle. The Drawable could be either a bare XID for a
Window or an XID for a glxpixmap.
GLX 1.3 added glxMakeContextCurrent that takes 2 handles: one for
reading, one for writing. Nowadays the old glxMakeCurrent call is
implemented as a call to glxMakeContextCurrent with the single handle
duplicated.
Because of this it is allowed to use a plain-old Window ID as an
argument to glxMakeContextCurrent, although nobody really documents this
sort of thing. The manpage for the NEW call specifies the arguments as
GLXPixmaps, but the actual code accepts Window XIDs too, and handles
them correctly.
Similarly, the glxSelectEvents function can also take a bare Window XID.
The "piglit" tests all use GLXWindows and/or GLXPixmaps. You never
tested swap events with a bare Window XID. That is what my app was
doing.
The swap_events code worked with Window XIDs in mesa 7.x.y. The new code
added in versions 8, 9, and 10 assumes that all buffer swap events have
a GLXPixmap associated with them. Because of the historical quirks
above, this is not true. Swap events for bare Window XIDs do NOT have a
glxpixmap resulting in a segfault.
Any app that uses the old school glxMakeCurrent call with a Window XID
while trying to use swap_events will crash when the libs try to lookup
the nonexistent GLXPixmap associated with the incoming swap event.
I believe that the people who wrote the spec overlooked this, because
the "sbc" field comes from the OML_sync extension that is defined in
terms of glxpixmaps only.
v2 (idr): Formatting changes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54372
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
With this we can assure that mapped buffers will never change
its position when relocating the pool.
This patch should finally solve the mapping bug.
v2: Use the new is_item_in_pool util function,
as suggested by Tom Stellard
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This function will be used when we want to map an item
that it's already in the pool.
v2: Use temporary variables to avoid so many castings in functions,
as suggested by Tom Stellard
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Acording to the OpenCL spec, it is possible to have a buffer mapped
for reading and at read from it using commands or buffers.
With this we can keep the mapping (that exists against the
temporary item) and read with a kernel (from the item we have
just added to the pool) without problems.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Now we will have a list with the items that are in the pool
(item_list) and the items that are outside it (unallocated_list)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
These statuses will help track whether the items are mapped
or if they should be promoted to or demoted from the pool
v2: Use the new is_item_in_pool util function,
as suggested by Tom Stellard
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This patch changes completely the way buffers are added to the
compute_memory_pool. Before this, whenever we were going to
map a buffer or write to or read from it, it would get placed
into the pool. Now, every unallocated buffer has its own
r600_resource until it is allocated in the pool.
NOTE: This patch also increase the GPU memory usage at the moment
of putting every buffer in it's place. More or less, the memory
usage is ~2x(sum of every buffer size)
v2: Cleanup
v3: Use temporary variables to avoid so many castings in functions,
as suggested by Tom Stellard
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The intention of this pass was to give us better instruction scheduling
opportunities, but it unexpectedly reduced some instruction counts as
well:
total instructions in shared programs: 1666639 -> 1666073 (-0.03%)
instructions in affected programs: 54612 -> 54046 (-1.04%)
(and trades 4 SIMD16 programs in SS3)
Previously llvm detected cpu features automatically when the execution engine
was created (based on host cpu). This is no longer the case, which meant llvm
was then not able to emit some of the intrinsics we used as we didn't specify
any sse attributes (only on avx supporting systems this was not a problem since
despite at least some llvm versions enabling it anyway we always set this
manually). So, instead of trying to figure out which MAttrs to set just set
MCPU.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=77493.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
An LLVMContext should only be accessed by a single and using the global
context was causing crashes in multi-threaded environments. Now we use
a separate context for each compile.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Will be used to control the linking mode of pipe-drivers
in gallium targets.
Keep this hardcoded to static, as the pipe-drivers bare
an unstable interface which we do not want to expose to
the normal user.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- gallium_pipe_loader_winsys_libs
Will be used in upcomming commits to reduce duplication
in the build.
v2: Drop the megadriver/static_target variables.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add a couple of helpers to be used by the dri targets when
built with static pipe-drivers. Both functions provide
functionality required by the dri state-tracker.
With this patch ilo, nouveau and r300 gain support for
throttle dri configuration.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will be used by gallium targets that statically link the
pipe-drivers in the final library. Provides identical
functionality to device_descriptor.create_screan.
v2:
- Don't sw_screen_wrap the i915/svga screen.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If memory serves me right, at least one debug wrapper does
not return the base screen on failure.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Required for the dri state-tracker. Will be used to retrieve
driver specific configuration parameters:
- share_fd (dmabuf) capability
- throttle
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This is just a hack, it should be possible to create a temporary zeta
surface and render to that instead. However that's more complicated and
this avoids the render being entirely broken and errors being reported
by the card.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Commit ad4dc772 fixed an issue with the viewport not being restored
correctly. However it's rather hackish and confusing. Instead just mark
the viewport dirty and let the viewport validation take care of it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The extension is always supported if GLSL 1.30 is supported.
Softpipe and llvmpipe support is also added (trivial).
Radeon and nouveau support is already done.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It was set to pipe_resource::last_level and _MaxLevel was embedded in max_lod,
that's why it worked for ordinary texturing. However, min_lod doesn't have
any effect on texelFetch and textureQueryLevels, so we must still set
last_level correctly.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Such conversions (which are most likely rather pointless in practice) were
resulting in shifts with negative shift counts and shifts with counts the same
as the bit width. This was always undefined in llvm, the code generated was
rather horrendous but happened to work.
So make sure such shifts are filtered out and replaced with something that
works (the generated code is still just as horrendous as before).
This fixes lp_test_format, https://bugs.freedesktop.org/show_bug.cgi?id=73846.
v2: prettify by using build context shift helpers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
A drawable size of 0x0 means that we don't have buffers for a drawable yet,
not that we have a zero-sized buffer. Core mesa shouldn't be optimizing out
drawing based on buffer size, since the draw call could be what triggers
the driver to go and get buffers. As discussed in the referenced bug report,
the optimization was added as part of a scatter-shot attempt to fix a
different problem. There's no other example in mesa core of using the
buffer size in this way.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74005
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Type mismatch caused random memory to be copied when casted
memory area was smaller than expected type.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
because no matching fbConfigs or visuals could be found.
Nearly all the error cases in *createScreen() issue an error message to diagnose
the failure to initialize before branching to handle_error. The few remaining
error cases which don't should probably do the same.
(At the moment, it seems this can be triggered in drisw with an X server which
reports definite values for MAX_PBUFFFER_(WIDTH|HEIGHT|SIZE), because those
attributes are checked for an exact match against 0.)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They have to be marked as structs for C code elsewhere. bblock_t is
already defined as a struct, and all of backend_instruction's fields are
public anyway.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since CSE creates instructions, if we let CSE generate things register
coalescing can't remove, bad things will happen. Only let CSE combine
non-copy load_payloads.
E.g., allow CSE to handle this
load_payload vgrf4+0, vgrf5, vgrf6
but not this
load_payload vgrf4+0, vgrf5+0, vgrf5+1
Patch adds a type check between switch init-expression and case label
and performs a implicit signed->unsigned type conversion when possible.
v2: add GLSL spec reference, do implicit conversion if possible (Matt)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79724
Reviewed-by: Matt Turner <mattst88@gmail.com>
Like on Haswell, we need to use 8x4 aligned rectangle primitives for
hierarchical depth buffer resolves and depth clears. See the comments
in brw_blorp.cpp's brw_hiz_op_params() constructor. (The Broadwell
documentation confirms that this is still necessary.)
This patch makes the Broadwell code follow the same behavior as Chad and
Jordan's Gen7 BLORP code. Based on a patch by Topi Pohjolainen.
This fixes es3conform's framebuffer_blit_functionality_scissor_blit
test, with no Piglit regressions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
We only enable HiZ for miplevels which are aligned on 8x4 blocks. When
debugging HiZ failures, it's useful to know whether a particular
miplevel is using HiZ or not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
flags.q.local_size has 3 bits. One each for x, y and z.
Fixes piglit's:
* spec/ARB_compute_shader/linker/mismatched_local_work_sizes
* spec/ARB_compute_shader/compiler/default_local_size.comp
* spec/ARB_compute_shader/compiler/work_group_size_too_large
* spec/ARB_compute_shader/compiler/gl_WorkGroupSize_matches_layout.comp
This was regressed in 738c9c3c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, we would parse MESA_EXTENSION_OVERRIDE each time a context
was created. Now we will save the results of that parsing and use it
during context initialization.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will allow us to utilize the early MESA_EXTENSION_OVERRIDE
parsing at the later extension string initialization step.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will allow us to utilize the early MESA_EXTENSION_OVERRIDE
parsing at the later extension string initialization step.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In 25268b93, we added a new environment variable
(INTEL_COMPUTE_SHADER) to allow some constant values to be upgraded
for the ARB_compute_shader extension.
Now, we can look to see if the extension was enabled via the
MESA_EXTENSION_OVERRIDE environment variable.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
During the early one_time_init phase of context creation, we
initialize two global gl_extensions structures.
We read the MESA_EXTENSION_OVERRIDE environment variable, and store
positive and negative overrides in two structures:
* struct gl_extensions _mesa_extension_override_enables
* struct gl_extensions _mesa_extension_override_disables
These are filled before the driver initializes extensions and
constants, therefore the driver can make adjustments based on the
desired overrides.
This can be useful during development of a new extension where the
extension is only partially ready. The driver can't actually advertise
support for the extension, but if it sees that the override is set for
the extension, then it can expose more supported parts of the
extension, such as upgrading context constants.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We will add new gl_extensions structures that capture the environment
variable extension overrides and are available early in context
creation.
This will allow a driver to take actions during its initialization
based on the extension overrides.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously setting:
MESA_EXTENSION_OVERRIDE=-GL_MESA_ham_sandwich
Would cause Mesa to advertise support for the GL_MESA_ham_sandwich
extension, even though the override specifically asked for it to be
disabled.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Patch adds a preprocessor define for the extension and stores explicit
location data for uniforms during AST->HIR conversion. It also sets
layout token to be available when having the extension in place.
v2: change parser check to require GLSL 330 or enabling
GL_ARB_explicit_attrib_location (Ian)
v3: fix the check and comment in AST->HIR (Petri)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Support inactive uniforms that have explicit location set in
glUniform* functions.
v2: remove unnecessary extension check, use new define (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch refactors the existing uniform processing so explicit locations
are taken in to account during variable processing. These locations
are temporarily stored in gl_uniform_storage before actual locations
are set.
UNMAPPED_UNIFORM_LOC marks unset location so that we can use 0 as a
valid explicit location.
When locations are set, UniformRemapTable is first populated with
uniforms that have explicit location set (inactive and active ones),
rest are put after explicit location slots.
v2: introduce define for locations that have not been set yet (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch initializes the UniformRemapTable for explicit locations. This
needs to happen before optimizations to make sure all inactive uniforms
get their explicit locations correctly.
v2: fix initialization bug, introduce define for inactive uniforms (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This function calculates the number of unique values from
glGetUniformLocation for the elements of the type.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch adds new implementation dependent value required by the
GL_ARB_explicit_uniform_location extension. Default value for user
assignable locations is calculated as sum of MaxUniformComponents
for each stage.
v2: fix descriptor in get_hash_params.py (Petri)
v3: simpler formula for calculating initial value (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We've used the LD sampler message for pull constant loads on earlier
hardware for some time, and also were already using it for the FS on
Broadwell. This patch makes us use it for Broadwell VS/GS as well.
I believe that when I wrote this code in 2012, we still used the data
port in some cases, and I somehow neglected to convert it while
rebasing.
Improves performance in GLBenchmark 2.7 Egypt by 416.978% +/- 2.25821%
(n = 17). Many other applications should benefit similarly: this speeds
up uniform array access in the VS, which is commonly used for skinning
shaders, among other things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
When faced with code such as:
mov vgrf31.0:UD, 960D
mov vgrf31.1:UD, vgrf30.xxxx:UD
The dead code eliminator didn't consider reg_offsets, so it decided that
the second instruction was writing was writing to the same register as
the first one, and eliminated the first one. But they're actually
different registers.
This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above
code, vgrf31.0 represents the offset into the shader_time buffer where
the data should be written, and vgrf31.1 represents the actual time
data. With a completely undefined offset, results were...unexpected.
I think this is probably one of the few cases (maybe only case) where we
generate multiple MOVs to a large VGRF. Normally, we just use them as
texturing results; the other SEND-from-GRF uses a size 1 VGRF.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
"shader_time_add" is a lot more informative than "op152".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fix clang mismatched-tags warnings introduced with commit
4f5445a45d.
./glsl_symbol_table.h:37:1: warning: class 'glsl_type' was previously declared as a struct [-Wmismatched-tags]
class glsl_type;
^
./glsl_types.h:86:8: note: previous use is here
struct glsl_type {
^
./glsl_symbol_table.h:37:1: note: did you mean struct here?
class glsl_type;
^~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch fixes several clang constant-logical-operand warnings such as
the following.
../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: warning: use of logical '||' with constant operand [-Wconstant-logical-operand]
if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL)
^ ~~~~~~~~~~~
../../../../../src/mesa/tnl_dd/t_dd_tritmp.h:130:32: note: use '|' for a bitwise operation
if (DO_TWOSIDE || DO_OFFSET || DO_UNFILLED || DO_TWOSTENCIL)
^~
|
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some apps seem to give us a null sampler/view for texture slots which
come before the last used texture slot. In particular 0ad triggers
this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Gallium (but not OpenGL) does allow nesting of queries, but there's no
limit specified (d3d10 has no limit neither). Nevertheless, for practical
purposes we need some limit in llvmpipe, otherwise we'd need more complex
handling of queries as we need to keep track of all binned queries (this
only affects queries which gather data past setup). A limit of 16 is too
small though, while 64 would suffice.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The comment for _mesa_is_type_integer is confusing because it says that it
returns whether the type is an “integer (non-normalized)” format. I don't
think it makes sense to say whether a type is normalized or not because it
depends on what format it is used with. For example, GL_RGBA+GL_UNSIGNED_BYTE
is normalized but GL_RGBA_INTEGER+GL_UNSIGNED_BYTE isn't. If the normalized
comment is just a mistake then it still doesn't make much sense because it is
missing the packed-pixel types such as GL_UNSIGNED_INT_5_6_5. If those were
added then it effectively just returns type != GL_FLOAT.
That function was only used in _mesa_is_enum_format_or_type_integer. This
function effectively checks whether the format is non-normalized or the type
is an integer. I can't think of any situation where that check would make
sense.
As far as I can tell neither of these functions have ever been used anywhere
so we should just remove them to avoid confusion.
These functions were added in 9ad8f431b2.
Reviewed-by: Brian Paul <brianp@vmware.com>
That commit made possible that the items could be one just
after the other when their size was a multiple of ITEM_ALIGNMENT.
But compute_memory_prealloc_chunk still looked to leave a gap
between items. Resulting in that we got an infinite loop when
trying to add an item which would left no space between itself and
the next item.
Fixes piglit test: cl-custom-r600-create-release-buffer-bug
And the test for alignment I have just sent:
http://lists.freedesktop.org/archives/piglit/2014-June/011135.html
Sorry about this.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The code that parses LIBGL_DRIVERS_PATH was printing an
error for every attempted dlopen. It's not an error to
have to check multiple items in the path, only an error if
no suitable library is found. Reduced the load error to
a warning to match behavior of dynamic linker.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Use the has_streamout flag as we do elsewhere to check if we need
to call pipe->set_stream_output_targets(). The driver might implement
the set_stream_output_targets() function, but not for all hardware
configurations.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When a multisampled texture is used for sampling the fast clear color value
needs to be programmed into the surface state. This was being left as all
zeroes so if the surface was cleared to a value other than black then it
wouldn't work properly. This doesn't matter for single-sample textures because
in that case the MCS buffer is resolved before it is used as a texture source.
https://bugs.freedesktop.org/show_bug.cgi?id=79729
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
We only need to alter the default state if we're emitting MOVs for
header related fields. So, we can simply move the push/pop of state in
to the if (header_present) block, bypassing it in the common case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
In commit dc2d3a7f5c, Iago accidentally
moved fire_fb_write() above the brw_pop_insn_state(), which caused the
SEND to lose its predication and change from WE_normal to WE_all.
Haswell uses predicated SENDs for discards, so this broke Piglit's
tests for discards.
We want the Gen4-5 MOV to be uncompressed, unpredicated, and unmasked,
but the actual FB write itself should respect those. So, pop state
first, and force it again around the single MOV.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
Will simplify the automated conversion if we want to allow compiling the
driver for a single generation.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This makes sure to use a no-op swizzle while iteratively rendering each
level of a mipmap otherwise we may loose components and effectively
apply the swizzle twice by the time these levels are sampled.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we would emit the comparison, emit an AND to mask off extra
bits from the comparison result, then convert the result to float. Now,
do the comparison, then use a cleverly constructed SEL to pick either
0.0f or 1.0f.
No piglit regressions on Ivybridge.
total instructions in shared programs: 1642311 -> 1639449 (-0.17%)
instructions in affected programs: 136533 -> 133671 (-2.10%)
GAINED: 0
LOST: 0
Programs that are affected appear to save between 1 and 5 instuctions
(just by skimming the output from shader-db report.py.
v2: s/b2i/b2f/ in commit subject (noticed by Chris Forbes). Remove
extraneous fix_3src_operand (suggested by Matt). The latter change
required swapping the order of the operands and using predicate_inverse.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes the limits for GL 3.2, and subsequently fixes
some segfaults in some varying packing tests and max varying tests
after the limits bumped.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for GL 3.2 layered rendering to softpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the layer info to the tile cache.
This changes clear_flags to be dynamically allocated as
MAX_LAYERS seems like a too big step.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This passes the piglit depth clamp tests.
this is required for GL 3.2.
v2: move min/max up one level, could go further, thanks
to Roland for suggestion.
v1: Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This limits the number of emitted vertices to the shaders max output
vertices, and avoids us writing things into memory that isn't big
enough for it.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add Intel driver hook for glGetTexImage to accelerate the case of reading
texture image into a PBO. This case gets huge performance gains by using
GPU BLIT directly to PBO rather than GPU BLIT to temporary texture followed
by memcpy.
No regressions on Piglit tests with Intel driver.
Performance gain (1280 x 800 FBO, Ivybridge):
glGetTexImage + glMapBufferRange with patch 1.45 msec
glGetTexImage + glMapBufferRange without patch 4.68 msec
v3: (by Kenneth Graunke)
- Fix compile after Eric's change to drop the tiling argument
to intel_miptree_create_for_bo.
- Add GL_TEXTURE_3D to blacklisted texture targets to prevent Piglit
regressions.
- Squash in several whitespace and coding style fixes.
When walking backwards, we want to stop at the head sentinel, which is
where scan_inst->prev->prev == NULL, not scan_inst->prev == NULL.
Fixes random crashes, as well as valgrind errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Giving the meta clear program a meaningful name makes it easier to find
in output such as INTEL_DEBUG=fs or INTEL_DEBUG=shader_time. We already
did so for integer programs, but neglected to label the primary program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These used to call different math emitters (brw_math vs. brw_math2).
Now that they both call gen6_math, they're virtually identical.
When unrolling SIMD16 to multiple SIMD8 operations, we should take care
not to apply sechalf to brw_null_reg for src1. Otherwise, we'd end up
with BRW_ARF_NULL + 1 as the register number, and I'm not sure if that's
valid.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These functions are basically identical, so we should combine them.
However, they're so trivial, we may as well just fold them into their
only call sites.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
These are trivial to combine: we should just avoid checking the second
operand if it's brw_null_reg.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Usually, I try to use "brw" for functions that apply to all generations,
and "gen4" for dead end/legacy code that is only used on Gen4-5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Our existing functions, brw_math and brw_math2, had unclear roles:
Gen4-5 used brw_math for both unary and binary math functions; it never
used brw_math2. Since operands are already in message registers, this
is reasonable.
Gen6+ used brw_math for unary math functions, and brw_math2 for binary
math functions, duplicating a lot of code. The only real difference was
that brw_math used brw_null_reg() for src1.
This patch improves brw_math2's assertions to allow both unary and
binary operations, renames it to gen6_math(), and drops the Gen6+ code
out of brw_math().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Thread switching on control flow instructions is a documented workaround
for Gen4-5 errata. As far as I can tell, it hasn't been needed since
Sandybridge. Thread switching is not free, so in theory this may help
performance slightly.
Flow control instructions with the "switch" flag cannot be compacted, so
removing it will make these instructions compactable. (Of course, we
still have to implement compaction for flow control instructions...)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
total instructions in shared programs: 2081469 -> 2081248 (-0.01%)
instructions in affected programs: 22606 -> 22385 (-0.98%)
No programs were hurt by this patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Only function-defs use glsl_type so forward declare instead.
Compile-tested on my Ivy-bridge system.
IWYU also suggests removing #include <new>, and this compiles fine.
I'm not familiar enough with memory management in C/C++ that I feel
comfortable removing this. Insights would be appreciated.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Found with IWYU. Comment says it's for struct gl_extensions.
Grepping for gl_extensions shows no uses.
Tested by compiling on my Ivy-bridge system.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Found with IWYU, compile-tested on my Ivy-bridge system.
This is not used in the header, and is included in the source.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Found with IWYU, confirmed with grepping for "hash" and "symbol".
No negative effects on compilation.
IWYU also reported core.h and linker.h could be removed,
but I'm unsure if those are false positives.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
This fixes an issue when running cl-program-bitcoin-phatk
piglit test where some of the inputs have negative values
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Now, items whose size is a multiple of 1024 dw won't leave
1024 dw between itself and the following item
The rest of the cases is left as it was
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Removed compute_memory_defrag declaration because it seems
to be unimplemented.
I think that this function would have been the one that solves
the problem with fragmentation that compute_memory_finalize_pending has.
Also removed comments that are already at compute_memory_pool.c
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Explanation of the changes, as requested by Tom Stellard:
Let's take need after is calculated as
item->size_in_dw+2048 - (pool->size_in_dw - allocated)
BEFORE:
If need is positive or 0:
we calculate need += 1024 - (need % 1024), which is like
cealing to the nearest multiple of 1024, for example
0 goes to 1024, 512 goes to 1024 as well, 1025 goes
to 2048 and so on. So now need is always possitive,
we do compute_memory_grow_pool, check its output
and continue.
If need is negative:
we calculate need += 1024 - (need % 1024), in this case
we will have negative numbers, and if need is
[-1024:-1] 0, so now we take the else, recalculate
need as need = pool->size_in_dw / 10 and
need += 1024 - (need % 1024), we do
compute_memory_grow_pool, check its output and continue.
AFTER:
If need is positive or 0:
we jump the if, calculate need += 1024 - (need % 1024)
compute_memory_grow_pool, check its output and continue.
If need is negative:
we enter the if, and need is now pool->size_in_dw / 10.
Now we calculate need += 1024 - (need % 1024)
compute_memory_grow_pool, check its output and continue.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
In this case, NULL checks are added to compute_memory_grow_pool,
so it returns -1 when it fails. This makes necesary
to handle such cases in compute_memory_finalize_pending
when it is needed to grow the pool
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Always default to --enable-driglx-direct, now that will build driswrast, but
won't try to use dri[123] on platforms which don't have that.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some untangling to fix building in the dri_platform=none, --enable-driglx-direct
case, where only driswast can be used.
Turn the test for including the glXGetScreenDriver()/glXGetScreenDriver()
interface used by xdriinfo from !GLX_USE_APPLEGL into a positive form, as it is
only useful when dri_platform=drm
Add additional GLX_USE_DRM tests so DRI[123] renderers are only used when
dri_platform=drm
Note that swrast and indirect must still be disabled in the APPLEGL case at the
moment, which makes things more complex than they need to be. More untangling
is needed to allow that
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Originally all hardware drivers duplicate the driver_name string
from an external source, while for the software rasterizer we set
it to "swrast". Follow the example set by hw drivers this way
we can free the string at dri2_terminate().
v2: Use strdup over strndup. Suggested by Ilia Mirkin.
v3: Handle platform_drm in a similar manner. Cleanup swrast
driver_name in error path.
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This fixes:
In file included from
/home/adrian/workspace/mesa/mesa-master.git/src/mesa/vbo/vbo_exec_api.c:445:0:
/home/adrian/workspace/mesa/mesa-master.git/src/mesa/vbo/vbo_attrib_tmp.h:28:38:
fatal error: util/u_format_r11g11b10f.h: No such file or directory
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adrian Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Fixes linker error:
ld:
.../libmesa_dri_common_intermediates/libmesa_dri_common.a(dri_util.o):
in function globalDriverAPI:dri_util.c(.data.rel+0x0): error:
undefined reference to 'driDriverAPI'
As an example, you can see that mesa_dri_drivers
also uses common/libmegadriver_stub (src/mesa/drivers/dri/Makefile.am)
The _stub part might be confusing, but
it actually provides the dri-driver shared lib constructor,
megadriver_stub_init, which will later on load the real
platform dependent part and call
l __driDriverGetExtensions_<platform>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adrian Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
The negation source modifier on src registers has changed meaning in Broadwell when
used with logical operations. Don't copy propagate when negate src modifier is set
and when the destination instruction is a logical op.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
The negation source modifier on src registers has changed meaning in Broadwell when
used with logical operations. Don't copy propagate when negate src modifier is set
and when the destination instruction is a logical op.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
We've been allowing `centroid` and `sample` in all kinds of weird places
where they're not valid.
Insist that `sample` is combined with `in` or `out`;
and that `centroid` is combined with `in`, `out`, or the deprecated
`varying`.
V2: Validate this in a more sensible place. This does require an extra
case for uniform blocks members and struct members, though, since they
don't go through the normal path.
V3: Improve error message wording; eliminate redundant error generation
for inputs in VS or outputs in FS.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Threads must terminate with a SEND message to a particular shared function,
such as a URB write or FB write, so the instruction stream really shouldn't
ever end in an IF/ELSE/ENDIF or similar block structure.
However, if the instruction stream (incorrectly) ends in a block structure
the last block's end pointer will not be set, leading to a crash later on in
fs_live_variables::setup_def_use(). It is better to detect this earlier, so
assert on that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In Gen < 6 the hardware generates a runtime bit that indicates whether AA data
has to be sent as part of the framebuffer write SEND message. This affects the
specific case where we have setup antialiased line rendering and we render
polygons which have one face setup in GL_LINE mode (line antialiasing
will be used) and the other one in GL_FILL mode (no line antialiasing needed).
Currently we are not doing this runtime test and instead we always send AA
data, which produces incorrect rendering of the GL_FILL face of the polygon in
in the aforementioned scenario (verified in ironlake and gm45).
In Gen4 this is, likely, a regression introduced with commit 098acf6c84. In
Gen5 this has never worked properly. Gen > 5 are not affected by this.
The patch fixes the problem by adding the appropriate runtime check and
adjusting the framebuffer write message accordingly in the conflictive
scenario.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This logic is reusable across CompressedTex*Image* and
GetCompressedTexImage; the strides calculated will also be needed
in the PBO validation functions to ensure that the referenced range of
bytes is valid.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Move _mesa_error call for INVALID_VALUE to one place.
Remove checks for previous value matching -- this was important when we
were flushing vertices before the update, but that hasn't happened for a
long time now.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A recent ApiTrace change, that tries to dump more buffer state
causes Mesa from my distro (10.1.4) to segfaults here.
I haven't actually confirm this fixes it (I can't repro on master),
but it seems a good idea to be defensive here anyway.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit f3cb2e6ed7.
brw_land_fwd_jump() is convenient wherever we produce JMPI instructions
and we will use JMPI to implement framebuffer writes that involve line
antialiasing in gen < 6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm making a lot of changes to this area, and I figured I may as well
not conflate these trivial changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With my earlier cleaning in place (see git log brw_eu_emit.c), nothing
relies on the instruction emitters for IF/WHILE/JMPI disabling
predication. Drop it in favor of making callers do the right thing
explicitly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There is a short-immediate version as well, but it should never end up
getting used since it would have gotten folded earlier.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Match the behavior of the SCons MinGW build.
This patch also fixes these build errors.
CC glapi_entrypoint.lo
glapi_entrypoint.c: In function 'init_glapi_relocs_once':
glapi_entrypoint.c:341:4: error: unknown type name 'pthread_once_t'
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
^
glapi_entrypoint.c:341:41: error: 'PTHREAD_ONCE_INIT' undeclared (first use in this function)
static pthread_once_t once_control = PTHREAD_ONCE_INIT;
^
glapi_entrypoint.c:341:41: note: each undeclared identifier is reported only once for each function it appears in
glapi_entrypoint.c:342:4: error: implicit declaration of function 'pthread_once' [-Werror=implicit-function-declaration]
pthread_once( & once_control, init_glapi_relocs );
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This code was originally introduced to fix https://bugs.freedesktop.org/show_bug.cgi?id=53617. The comment says you need to pass NULL in order to unref old views however cso_set_sampler_views() already takes care of old views with the second for loop. Also as of 2355a64414 cso_set_sampler_views() passes the max of the old and new views to the driver for all state trackers making this code obsolete.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
The nouveau fw currently prints a bunch of errors. No point in seeing
those all the time, esp since compute doesn't really work in the first
place.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The new hardware actually supports this OpenGL 1.x feature natively,
so we can finally drop our shader workarounds.
Not many applications use GL_CLAMP, and most use it unintentionally, but
it's trivial to do right, so we should.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
When dereferencing an element of gl_SampleMaskIn[], the source register
here will be a HW_REG rather than a VGRF because the payload slot is
now exposed directly.
Fixes an assertion failure in the Piglit test:
tests/spec/arb_gpu_shader5/execution/samplemaskin-basic
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At least on MSVC we statically link against the CRT, so we must disable
the CRT message boxes if we want unattended testing.
The messages are convenient when running manually, so let them be if the
system error message boxes are not disabled.
The ARB_gpu_shader5 spec says:
"To determine whether the conversion for a single argument in one match is
better than that for another match, the following rules are applied, in
order:
1. An exact match is better than a match involving any implicit
conversion.
2. A match involving an implicit conversion from float to double is
better than a match involving any other implicit conversion.
3. A match involving an implicit conversion from either int or uint to
float is better than a match involving an implicit conversion from
either int or uint to double.
If none of the rules above apply to a particular pair of conversions,
neither conversion is considered better than the other."
V3: Add spec citation, including oddball difference between gs5 and GLSL
4.0; comment a bit better as per Jordan's suggestions.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This will facilitate GLSL 4.0 / ARB_gpu_shader5's enhanced overload
resolution rules, and also possibly better error reporting for ambiguous
function calls.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
V2: Fix crashes during linking, where the parse state is NULL. In this
case, all required checks have already been done, so we assume the
extension is enabled.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The available implicit conversions depend on the GLSL version we're
compiling.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to add new implicit conversions, first for ARB_gpu_shader5,
and then later for ARB_gpu_shader_fp64. Pull out the opcode
determination into its own function, and get rid of the bool -> float
case that could never be hit anyway [since it fails the is_numeric()
check].
V2: Retain the vector width mangling. It turns out this is necessary for
the conversions done (and then thrown away) when determining the return
type of arithmetic operators.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This works like glsl-1.20+'s invariant redeclarations, but with fewer
restrictions, since `precise` is allowed on pretty much anything.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There are several common ways to check whether an object is a particular
subclass: dynamic_cast<>, the as_subclass() pattern, or explicit enum
tags. We originally used the virtual as_subclass methods, but later
added enum tags as they are much nicer for debugging.
Since we have the enum tags, we don't necessarily need to use virtual
functions to implement the as_subclass() methods. We can just check the
tag and return the pointer or NULL.
This saves 18 entries in the vtable, and instead of two pointer
dereferences per as_subclass() call most are only three inline
instructions.
Compile time of sam3/112.frag (the longest compile in a recent shader-db
run) is reduced by 5% from 348 to 329 ms (n=500).
perf stat of this workload shows:
24.14% reduction in iTLB-loads: 285,543 -> 216,606
42.55% reduction in iTLB-load-misses: 18,785 -> 10,792
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Now that the constructors set a type, ir_type_unset is not very useful.
Move it to the end of the enum (specifically out of position 0) so that
enums checks for dereferences and rvalues can save an instruction.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Makes checking whether an object is an ir_dereference, an ir_rvalue, or
an ir_jump simpler. Since ir_dereference is a subclass or ir_rvalue,
list its subtypes first so that they can both generate nice code.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This has the added perk that if you forget to set ir_type in the
constructor of a new subclass (or a new constructor of an existing
subclass) the compiler will tell you... instead of relying on
ir_validate or similar run-time detection.
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
We have customers using NULL as a way to test the robustness of the API.
Without this check, EGL will segfault trying to dereference
dri2_surf->wl_win->private because wl_win is NULL.
This fix adds a check and sets EGL_BAD_NATIVE_WINDOW
v2: Incorporated feedback from idr - moved the check to a higher level
function.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
For some reason, CP DMA doesn't follow the predicate bit if I enable it,
so this is the only option.
This fixes piglit: spec/NV_conditional_render/clear
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Same as b026b6bbfe, but
COLOR_ARRAY_SIZE/SECONDARY_COLOR_ARRAY_SIZE.
Ideally we wouldn't munge the incoming state, so that we wouldn't need
to unmunge it back on glGet*. But the array size state is copied and
referred in many places, many of which couldn't take an GLenum like
GL_BGRA instead of a plain integer. So just hack around on glGet*,
to ensure there is no risk of introducing regressions elsewhere.
This bug causes problems to Apitrace, resulting in wrong traces. See
https://github.com/apitrace/apitrace/issues/261 for details.
Tested with piglit arb_vertex_array_bgra-get, which was created for this
purpose.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
These were missed in commit e374809819.
Fixes 'make check'.
CC test_eu_compact.o
test_eu_compact.c: In function ‘gen_f0_0_MOV_GRF_GRF’:
test_eu_compact.c:222:4: error: implicit declaration of function ‘brw_set_predicate_control’ [-Werror=implicit-function-declaration]
brw_set_predicate_control(p, true);
^
test_eu_compact.c: In function ‘run_tests’:
test_eu_compact.c:270:6: error: implicit declaration of function ‘brw_set_access_mode’ [-Werror=implicit-function-declaration]
brw_set_access_mode(p, BRW_ALIGN_16);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Eventually we're going to use functions to set bits on an instruction.
Putting 'default' in the name of functions that alter default state will
help distinguins them.
This patch was generated entirely mechanically, by the following:
for file in brw*.{cpp,c,h}; do
sed -i \
-e 's/brw_set_mask_control/brw_set_default_mask_control/g' \
-e 's/brw_set_saturate/brw_set_default_saturate/g' \
-e 's/brw_set_access_mode/brw_set_default_access_mode/g' \
-e 's/brw_set_compression_control/brw_set_default_compression_control/g' \
-e 's/brw_set_predicate_control/brw_set_default_predicate_control/g' \
-e 's/brw_set_predicate_inverse/brw_set_default_predicate_inverse/g' \
-e 's/brw_set_flag_reg/brw_set_default_flag_reg/g' \
-e 's/brw_set_acc_write_control/brw_set_default_acc_write_control/g' \
$file;
done
No manual changes were done after running that command.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This removes the ability to set the default conditional modifier on all
future instructions. Nothing uses it, and it's not really a sensible
thing to do anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
With the predication changes eliminated, all this does is set the
conditional modifier on a single instruction. Doing that directly is
easy, and avoids mucking about with default state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_set_conditionalmod and brw_next_insn work together to set the
conditional modifier for the next instruction, then turn it off.
The Gen8+ generators don't implement this: we just set it for all future
instructions, and whack it for each fs_inst/vec4_instruction.
Both approaches work out because we only set conditional_mod on
IR instructions like CMP, AND, and so on, which correspond to exactly
one assembly instruction. The Gen8 generators would break if we had
an IR instruction that generated multiple instructions, and the Gen4-7
EU emit layer would do...something.
To safeguard against this, assert that we only generated one instruction
if conditional_mod is set, and just set the flag directly on that
instruction rather than altering default state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_set_conditionalmod has traditionally been complex: it causes
conditionalmod to be set for the next instruction, and then predication
to be set on all future instructions after that.
We may want to generate a flag condition and not use it immediately,
due to instruction scheduling or the like. Even if not, it's easy
to set things explicitly, and that's clearer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_CMP already takes a conditional modifier as a parameter, and sets it
accordingly. brw_set_conditionalmod() also makes everything after the
next instruction predicated, but we don't need that: we always emit an
IF instruction after load_clip_distance(), and that's already
predicated.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It wasn't too bad before, but the macro is going to be nicer once I
start modifying a lot more instructions in this pattern.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Often times, we want to emit an instruction, then set one field on it,
such as predication or a conditional modifier. Normally, we'd have to
declare "struct brw_instruction *inst;" and then use "inst =
brw_FOO(...)" to emit the instruction, which can hurt readability.
The new "brw_last_inst" macro refers to the most recently emitted
instruction, so you can just do:
brw_ADD(...)
brw_last_inst->header.predicate_control = BRW_PREDICATE_NORMAL;
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We use both predicated and unconditional JMPI instructions. But in each
case, it's clear which we want. It's simpler to just specify it as a
parameter, rather than relying on default state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In all cases, we set both dst and src0 to brw_ip_reg(). This is no
accident: according to the ISA reference, both are required to be the IP
register. So, we may as well drop the parameters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
EGL 1.4 Specification says that
eglMakeCurrent(display, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT)
can be used to release the current thread's ownership on the surfaces
and context.
MESA's egl implementation was only accepting the parameters when the
KHR_surfaceless_context extension is supported.
[chadv] Add quote from the EGL 1.4 spec.
Cc: "10,1, 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2 get rid of magic value, use DEFINES
v3 update clip_disable together with vs_position_window_space
Big thanks to Marek Olšák!
Signed-off-by: David Heidelberger <david.heidelberger@ixit.cz>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The fs_reg src array is going to turn into a pointer and we'd rather not
consider the implications of shallow copying fs_insts.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Will get more complicated when fs_reg src becomes a pointer.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Running shader-db with INTEL_DEBUG=noann reduces the runtime
from ~90 to ~80 seconds on my machine. It also reduces the disk space
consumed by the .out files from 660 MB (676 on disk) to 343 MB (358 on
disk).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With INTEL_DEBUG=optimizer, write the output of dump_instructions() to a
file each time an optimization pass makes progress. This lets you easily
diff successive files to see what an optimization pass did.
Example filenames written when running glxgears:
fs8-0000-00-start
fs8-0000-01-04-opt_copy_propagate
fs8-0000-01-06-dead_code_eliminate
fs8-0000-01-12-compute_to_mrf
fs8-0000-02-06-dead_code_eliminate
| | | |
| | | `-- optimization pass name
| | |
| | `-- optimization pass number in the loop
| |
| `-- optimization loop interation
|
`-- shader program number
Note that with INTEL_DEBUG=optimizer, we disable compact_virtual_grfs,
so that we can diff instruction lists across loop interations without
the register numbers being changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow debugging code to dump the IR after an optimization pass
makes progress (the next patch). Only let it open and write to a file if
the effective user isn't root.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use function overloading rather than default arguments, since gdb
doesn't know about default arguments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This made sense when swizzled storage layout was used for rendering to tiles.
But nowadays the name just adds confusion (and makes for long lines).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Framebuffers can have NULL attachments since a while. llvmpipe handled
that properly for lp_rast_shade_quads_mask but it seems the change didn't
make it to lp_rast_shade_tile.
This fixes piglit fbo-drawbuffers-none test (though I need to increase
the FB_SIZE from 32 to 256 so the tris cover some tiles fully).
https://bugs.freedesktop.org/show_bug.cgi?id=79421
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch fixes this build error with icc 14.0.2.
In file included from state_tracker/st_glsl_to_tgsi.cpp(63):
../../src/gallium/auxiliary/util/u_math.h(583): error: identifier "__builtin_clrsb" is undefined
return 31 - __builtin_clrsb(i);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
mesaVisual can be NULL with configless context since this commit:
commit 551d459af4
Author: Neil Roberts <neil@linux.intel.com>
Date: Fri Mar 7 18:05:47 2014 +0000
Add the EGL_MESA_configless_context extension
...
Previously the i965 and i915 drivers were explicitly creating a zeroed visual
whenever 0 is passed for the EGLConfig.
We attempt to dereference the visual in i915 and now we don't create a
zeroed-out one one it crashes, breaking at least weston in an i915. There's
no point in doing so as it would be zero anyway.
v2: Fixed a typo in commit message. Added some tags.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1100967
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These prototypes are necessary because GLES1 library builds will create
dispatch functions for them. We can't directly include GLES/gl.h
because it would conflict the previously-included GL/gl.h. Since GLES1
ABI is not expected to every add more functions, the path of least
resistance is to just duplicate the prototypes for the functions that
aren't already in desktop OpenGL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79294
Acked-by: Matt Turner <mattst88@gmail.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The math instruction was Align1-only on Gen6 and we never updated this
to let it use Align16 features like writemasking on newer platforms.
total instructions in shared programs: 1686120 -> 1685507 (-0.04%)
instructions in affected programs: 48593 -> 47980 (-1.26%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should print output both for debug and release builds.
Suggested by Jose.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
wglCreateContextAttribsARB() didn't work previously since it returned
a context ID that wasn't allocated by OPENGL32.DLL. So if that context
ID was later passed to wglMakeCurrent(), etc. it was rejected.
Now when wglCreateContextAttribsARB() is called we actually call
wglCreateContext() in order to get a valid context ID. Then we
replace the context data which was created with new context data
which reflects the arguments passed to wglCreateContextAttribsARB().
If there were a DrvCreateContextAttribs() function in the ICD this
work-around wouldn't be necessary.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Conflicts:
src/gallium/state_trackers/wgl/stw_ext_extensionsstring.c
src/gallium/state_trackers/wgl/stw_getprocaddress.c
If the assertion fails, it means something is really broken. Before,
if this happened we reverted to the GDI renderer without any warning.
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
To reflect our actual SwapBuffers implementation. See
stw_st_swap_framebuffer_locked(). This fixes various rendering issues
with SolidEdge.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just happened to stumble across this registry key while debugging
something else.
This technique is much neater than trying to override opengl32.dll.
Also a few minors cleanups.
If texObj == NULL here it mean there is already GL_INVALID_VALUE
or GL_OUT_OF_MEMORY error set to context.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Introduce a simple PCI identification method of looking up the answer
the /sys filesystem (available on Linux). Attempted after libudev, but
before DRM.
Disabled by default (available only when the --enable-sysfs configure
option is specified).
Signed-off-by: Gary Wong <gtw@gnu.org>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
loader_get_pci_id_for_fd() and loader_get_device_name_for_fd() now attempt
all available strategies to identify the hardware, instead of conditionally
compiling in a single test. The existing libudev and DRM approaches have
been retained, attempting first libudev (if available) and then DRM (if
necessary).
Signed-off-by: Gary Wong <gtw@gnu.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The build fails with implicit delaration of drmGetCap (xf86drm.h)
Were we're including the header only when building the DRM_PLATFORM.
Wayland backend can operate without DRM_PLATFORM so replace the
guard, and fold in drmGetCap() usage to silence compiler warnings.
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
There is no reason anymore to load with RTLD_GLOBAL and for some driver
this even result in dlclose failing to unload leading to catastrophic
failure with swrast fallback.
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
This way, when someone modifies create_test_cases.py and forgets to
commit their changes again, people will notice.
v2: make sure we parse the right directories and check for existance the
right way.
v3 (Ken): Use $PYTHON2 instead of calling python directly.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In 088494aa (as well as other commits in the series) Paul Berry modified
the tests for lower_jumps to account for the fact that the s-expression
for the loop IR instruction changed from
(loop () () () () (statements...)) to (loop (statements...)), but he
forgot to update create_test_cases.py which he used to create the tests.
Fix that, so that now create_test_cases.py is synced with the generated
tests.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Make sure that we print the same number of digits when printing 0.0 as
any other floating-point number. This will make generating expected
output files for tests easier. To avoid breaking "make check," update
the generated tests for lower_jumps before the next commit which will
bring create_test_cases.py in line with them.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The call to get_variable_being_redeclared() may delete 'var' so we
can't reference var->name afterward. We fix that by examining the
var's name before making that call.
Fixes valgrind warnings and possible crash when running the piglit
tests/spec/glsl-1.30/execution/clipping/vs-clip-distance-in-param.shader_test
test (and probably others).
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we set up new entries in the params[] array on every access
of a rectangle texture. Unfortunately, we only reserve space for
(2 * MaxTextureImageUnits) extra entries, so programs which accessed
rectangle textures more times than that would write off the end of the
array and likely crash.
We don't really have a decent mapping between the index returned by
_mesa_add_state_reference and our index into the params array, so we
have to manually search for it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78691
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: mesa-stable@lists.freedesktop.org
Fixes framebuffer_blit_functionality_multisampled_to_singlesampled_blit
es3 cts test on bdw. Also fixes this on ivb when ivb is forced to use
the meta path.
No piglit regressions on IVB.
Further input from Ken:
"Unfortunately, this doesn't fix MRT for integer data.
In the single-sampled case, since we're directly copying data, we were
read/copy/write data as "float" values, which actually contained the
integer bits. Here, we can't do that since we need to process the
actual integer data.
I do wonder if we could use intBitsToFloat/uintBitsToFloat to stuff the
integer bits in the float gl_FragColor output. Just a crazy idea.
In the long term (post 10.2), I think we should draft an extension that
allows you to do "layout(location = all)" on user-defined fragment
shader outputs. (Or some similar syntax.)"
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
GK20A is mostly compatible with GK104, but uses the SM35 ISA. Use
the GK110 path when this chip is detected.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
GK20A is mostly compatible with GK104, but features a new 3D
class. Add it to the relevant header and use it when GK20A is
detected.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Each of the subroutine emitters alter the predication state, but
otherwise don't change anything (or put it back when they do).
Resetting predication at the end makes these functions idempotent with
regard to the default instruction state - which is a nice property.
With that in place, push/pop is no longer necessary.
v2: Improve whitespace (requested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_MOV doesn't alter the default instruction state, so this does
nothing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_JMPI sets predicate_control to BRW_PREDICATE_NONE, but that's
already the value coming in. Otherwise, nothing changes state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This field is only used to track the current value of the flag register
during the SF compile. It has no place in the common compiler code.
While we're changing every call, drop the 'brw' prefix from the function
since it's static.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only the Gen4-5 SF program compiler actually uses this function; move
it there. Soon the fields will be moved out of brw_compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There's no point in pushing and popping the default state; the code
between the two stack operations doesn't alter anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
None of the assembly emitters called between push and pop actually
change the state. So, we can drop these.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, brw_CMP with a null destination implicitly set the default
state to make future instructions predicated. This is messy and
confusing - emitting a CMP that populates the flag register and later
using it to predicate instructions are logically separate. With the
main compiler, we may even schedule instructions between the CMP and the
user of the flag value.
This patch simplifies brw_CMP to just emit a CMP instruction, and not
mess with predication. It also updates all necessary callers. These
mostly fell into two patterns:
1. brw_CMP followed by brw_IF.
We don't need to do anything special here; brw_IF already sets up
predication appropriately.
2. brw_CMP followed by a single predicated instruction.
The old model was to call brw_CMP, emit the next (predicated)
instruction, then disable predication for any instructions beyond
that. Instead, just explicitly set predicate_control on the single
instruction we want to predicate. It's no more code, and requires
less cross-module knowledge.
This drops setting flag_value to 0xff as well, which is a field only
used by the SF compile. There is only one brw_CMP call in the SF code,
which is in do_twoside_caller, and called at the start of
brw_emit_tri_setup, where flag_value is already 0xff.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Presumably, this was to reset the default state of predication_control
from brw_CMP. But brw_CMP only sets that if dst is ARF null, which it
isn't here.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When compiling any of the SF program variants, flag_value starts off as
0xff and will be modified when generating code.
brw_emit_anyprim_setup emits several subroutines, saving and restoring
flag_value across each of them. Since it starts out as 0xff, this is
equivalent to simply setting it to 0xff at the start of each subroutine.
Resetting the value makes more logical sense; each subroutine doesn't
know whether one of the others even executed, much less what it did
to the flag register.
This also lets us to drop the brw_set_predicate_control_flag_value call
from brw_init_compile: predicate is already initialized to
BRW_PREDICATE_NONE by the memset, and the value of flag_value is
irrelevant (as it's only used by the SF compiler).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit af38ef907, I added a "fix" to color outputs not being assigned
correctly when sample mask was being output. This was totally wrong --
the color indices (i.e. "si" values) were the ones that were wrong. Undo
that hunk.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit c5d822dad9 added support for sample mask incorrectly. It became
treated as a color output, and messed up the color output indices.
Revert the hunk that did that, and add explicit support just like for
depth/stencil writes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
We had a handful of cases where we'd used brw_imm_*() to generate an
immediate, rather than fs_reg(). We shouldn't do that but we shouldn't
limit scheduling flexibility on account of immediate arguments either.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Using brw_imm_* creates a source with file=HW_REG, and the scheduler
inserts barrier dependencies when it sees HW_REG. None of these are
hardware-registers in the sense that they're special and scheduling
shouldn't touch them. A few of the modified cases already have HW_REGs
for other sources, so it won't allow extra flexibility in some cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Leave only the gl/glx and mangled gl symbols.
XMesa* was never an official interface and the only
user of it was mesa-demos, while they were still in
the same repo as mesa.
v2: Conditionally use the version-script.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In the presence of LLVM the final library exports every symbol from
the llvm namespace. Resolve this by using a version script (w/o the
version/name tag).
Considering that there are only ~35 symbols, explicitly list them
to minimize the chances of rogue symbols sneaking in.
v2: Conditionally include the version-script.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The symbol was added with commit 45e2b51c853(DRI2/GLX: check for
vblank_mode in DRI2 GLX code) but was never used as such according
to git log.
Possibly it was marked as public due to confusion with
__driConfigOptions which was used for dri1 drivers.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Without this, I get linking failures (static linking).
The static linking is sort of required for me, because otherwise Steam and
applications using the Steam runtime regularily fail because my LLVM was
compiled and linked against a newer libgcc_s, libstdc++, etc. and uses
features from those newer versions. And instead of Steam just not
starting, my X starts crashing, whenever libGL fails to load a (32 bit)
driver.
Since I hate crashes of X and I don't think Valve/Steam will behave like
a proper distribution soon (rebuilds versus current Debian Testing, since
they base their Steam OS off that), I need a radeonsi which carries its
own LLVM within and doesn't care about what the runtime sets. This means
linking Mesa statically.
v1 → v2: Move logic to configure.ac
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
The profiles are present depending on the defines at build time.
Drop the extra functions and feed the defines directly into the
state-tracker at build time.
v2: Drop unused variable i.
Acked-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
... rather than the one defined in our internal interface (dri_interface.h)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If we make ann_count non-zero, annotation_finalize() won't bail.
Not modifying it seems to make the code more clear than would modifying
annotation_finalize().
This reverts commit a6860100b8.
Why this code didn't work in all circumstances is unknown and without a
working Ironlake simulator (which uses a different AUB format) we'll
probably never know, short of a lot of experimentation, and spending a
bunch of time to try to optimize a few instructions on Ironlake is not
time well spent.
Moreover, for mix(vec4, vec4, vec4) using the accumulator introduces a
dependence between the otherwise independent per-component calculations.
Not using the accumulator, even if it means an extra instruction per
component might be preferable. We don't know, we don't have data, and
we don't have the necessary register on Ironlake for shader_time to tell
us.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77707
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit 2dfbbeca50 with the
comment about MAC and implicit accumulator removed.
Why this code didn't work in all circumstances is unknown and without a
working Ironlake simulator (which uses a different AUB format) we'll
probably never know, short of a lot of experimentation, and spending a
bunch of time to try to optimize a few instructions on Ironlake is not
time well spent.
Moreover, for mix(vec4, vec4, vec4) using the accumulator introduces a
dependence between the otherwise independent per-component calculations.
Not using the accumulator, even if it means an extra instruction per
component might be preferable. We don't know, we don't have data, and
we don't have the necessary register on Ironlake for shader_time to tell
us.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77703
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Note the weirdness with src1 subregs. The compacted immediate fields are
uncompacted to bits [127:96] and the high five bits of the subreg
mapping maps to bits [100:96].
Number of compacted instructions: 790085 -> 817752 (3.50%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Also perform arithmetic on char* rather than void* since the latter is a
GNU C extension not available in C++.
Reviewed-by: Eric Anholt <eric@anholt.net>
That we were comparing its return value with offsets should have been a
clue. :)
Make it take a void *store in preparation for making the function useful
elsewhere.
Reviewed-by: Eric Anholt <eric@anholt.net>
... to tell us whether it emitted any code. Will be used to determine
whether we need to skip an annotation for it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Will be used to print disassembly after jump targets are set and
instructions are compacted, while still retaining higher-level IR
annotations and basic block information.
An array of 'struct annotation' will live along side the generated
assembly. The generators will populate the array with their IR
annotations, and basic block pointers if the instructions began or ended
a basic block pointer.
We'll then update the instruction offset when we compact instructions
and then using the annotations print the disassembly.
Reviewed-by: Eric Anholt <eric@anholt.net>
The DO instruction doesn't exist on Gen6+. Since before this commit, DO
always ended a basic block, if it also happened to start one (e.g., a
while loop inside an if statement) the block containing only the DO
would actually contain no hardware instructions.
Pre-Gen6's WHILE instructions jumps to the instruction following the DO,
so strictly speaking we won't be modeling that properly, but I claim
there is actually no functional difference.
This will simplify an upcoming change where we want to mark the first
hardware instruction in the loop as beginning a block, and the last
instruction before the loop as ending one.
Reviewed-by: Eric Anholt <eric@anholt.net>
Note that predicated instructions with defs are still not supported
because transformation to SSA doesn't handle them yet.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The glGetGraphicsResetStatusARB from ARB_robustness extension always
returns GUILTY_CONTEXT_RESET_ARB and never returns NO_ERROR for guilty
context with LOSE_CONTEXT_ON_RESET_ARB strategy. This is because Mesa
returns GUILTY_CONTEXT_RESET_ARB if batch_active !=0 whereas kernel
driver never reset batch_active and this variable always > 0 for guilty
context. The same behaviour also can be observed for batch_pending and
INNOCENT_CONTEXT_RESET_ARB.
But ARB_robustness spec says:
If a reset status other than NO_ERROR is returned and subsequent calls
return NO_ERROR, the context reset was encountered and completed. If a
reset status is repeatedly returned, the context may be in the process
of resetting.
8. How should the application react to a reset context event?
RESOLVED: For this extension, the application is expected to query the
reset status until NO_ERROR is returned. If a reset is encountered, at
least one *RESET* status will be returned. Once NO_ERROR is
encountered, the application can safely destroy the old context and
create a new one.
The main problem is the context may be in the process of resetting and
in this case a reset status should be repeatedly returned. But looks
like the kernel driver returns nonzero active/pending only if the
context reset has already been encountered and completed. For this
reason the *RESET* status cannot be repeatedly returned and should be
returned only once.
The reset_count and brw->reset_count variables can be used to control
that glGetGraphicsResetStatusARB returns *RESET* status only once for
each context. Note the i915 triggers reset_count twice which allows to
return correct reset count immediately after active/pending have been
incremented.
v2 (idr): Trivial reformatting of comments.
Signed-off-by: Pavel Popov <pavel.e.popov@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Define GLX_USE_APPLEGL, as config/darwin used to, to turn on specific code to
use the applegl direct renderer
Convert src/glx/apple/Makefile to automake
Since the applegl libGL is now built by linking libappleglx into libGL, rather
than by linking selected files into a special libGL:
- Remove duplicate code in apple/glxreply.c and apple/apple_glx.c. This makes
apple/glxreply.c empty, so remove it
- Some indirect rendering code is already guarded by !GLX_USE_APPLEGL, but we
need to add those guards to indirect_glx.c, indirect_init.c (via it's
generator), render2.c and vertarr.c so they don't generate anything
Fix and update various includes
glapi_gentable.c (which is only used on darwin), should be included in shared
glapi as well, to provide _glapi_create_table_from_handle()
Note that neither swrast nor indirect is supported in the APPLEGL path at the
moment, which makes things more complex than they need to be. More untangling
is needed to allow that
v2: Correct apple/Makefile.am for srcdir != builddir
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
- Don't require xcb-dri[23] etc. if we aren't building for a target with DRM, as
we won't be using dri[23]
- Enable a more fine-grained control of what DRI code is built, so that a libGL
using direct swrast can be built on targets which don't have DRM.
The HAVE_DRI automake conditional is retired in favour of a number of other
conditionals:
HAVE_DRI2 enables building of code using the DRI2 interface (and possibly DRI3
with HAVE_DRI3)
HAVE_DRISW enables building of DRI swrast
HAVE_DRICOMMON enables building of target-independent DRI code, and also enables
some makefile cases where a more detailled decision is made at a lower level.
HAVE_APPLEDRI enables building of an Apple-specific direct rendering interface,
still which requires additional fixing up to build properly.
v2:
Place xfont.c and drisw_glx.c into correct categories.
Update 'make check' as well
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Fix build for darwin, when ./configured --disable-driglx-direct
- darwin ld doesn't support -Bsymbolic or --version-script, so check if ld
supports those options before using them
- define GLX_ALIAS_UNSUPPORTED as config/darwin used to, as aliasing of non-weak
symbols isn't supported
- default to -with-dri-drivers=swrast
v2:
Use -Wl,-Bsymbolic, as before, not -Bsymbolic
Test that ld --version-script works, rather than just looking for it in ld --help
Don't use -Wl,--no-undefined on darwin, either
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
It works fine, though it requires using ELF objects.
With this change there is nothing preventing us to switch exclusively
to MCJIT, everywhere. It's still off though.
If the source renderbuffer has a depth > 0, then send a Z texcoord
which is set to the source attachment Z offset.
This fixes piglit's gl-3.2-layered-rendering-gl-layer-render with the
GL_TEXTURE_2D_MULTISAMPLE_ARRAY case test on i965/gen8.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
brw_color_buffer_write_enabled depends on brw->fragment_program, which
means we have to listen to BRW_NEW_FRAGMENT_PROGRAM.
On most generations, this was only called from a function that already
subscribed. However, on Broadwell, we failed to listen to the necessary
event in the atom that emits 3DSTATE_PS_BLEND.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
I forgot to disable writemasking on the OR and MOV which set the render
target index and "source 0 alpha present to render target" bit.
Using get_element_ud is equivalent and avoids a line-wrap.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Commit a2fb71e23 introduced 32-bit code for SSE4.1. Fix compilation, and
make sure to check ecx for the SSE4.1 bit.
[imirkin: switch sse4.1 to look at ecx]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes MinGW x64 builds. We don't use assembly on any of the
Windows builds, to avoid divergence between MSVC and MinGW when testing.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch fixes this build error on Mac OS X.
CCLD libglapi.la
clang: warning: argument unused during compilation: '-pthread'
clang: warning: argument unused during compilation: '-pthread'
ld: unknown option: --no-undefined
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
-fstack-protector-strong is not supported by clang.
This patch fixes this build error on Fedora 20 with clang.
CXX gallivm/lp_bld_debug.lo
clang: error: unknown argument: '-fstack-protector-strong'
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75010
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Seems the opcodes are slightly different from a2xx. Resync headers and
move blend_func() helper into hw generation specific code.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For most drivers this if statement is always going to fail so check the constant value first.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
We already multiply by bytes per pixel for this, so f3ba7611 broke
mem2gmem for depth/stencil. Drop the now-redundant mutiply by cpp.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Note that this covers the Begin/End rendering path, but not user vertex
arrays (so we can't drop copy_array_to_vbo_array() code). Improves
performance of isosurf GLVERTEX|TRIANGLES by 16.7506% +/- 4.98934%
(n=20). No difference on openarena (n=10), which was why this was reverted
back in cbde276580.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where there was no color buf bound, there were inconsistancies
in register settings related to position of depth/stencil inside GMEM.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These aren't buffers we ever read back from CPU, so using incorrect
reloc fxn wasn't really harming anything. But might as well be correct.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
_mesa_meta_setup_blit_shader() currently generates a fragment shader
which, irrespective of the number of draw buffers, writes the color
to only one 'out' variable. Current shader rely on an undefined
behavior and possibly works by chance.
From OpenGL 4.0 spec, page 256:
"If a fragment shader writes to gl_FragColor, DrawBuffers specifies a
set of draw buffers into which the single fragment color defined by
gl_FragColor is written. If a fragment shader writes to gl_FragData,
or a user-defined varying out variable, DrawBuffers specifies a set
of draw buffers into which each of the multiple output colors defined
by these variables are separately written. If a fragment shader writes
to none of gl_FragColor, gl_FragData, nor any user defined varying out
variables, the values of the fragment colors following shader execution
are undefined, and may differ for each fragment color."
OpenGL 4.4 spec, page 463, added an additional line in this section:
"If some, but not all user-defined output variables are written, the
values of fragment colors corresponding to unwritten variables are
similarly undefined."
V2: Write color output to gl_FragColor instead of writing to multiple
'out' variables. This'll avoid recompiling the shader every time
draw buffers count is updated.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In commit 4be146b1, I neglected to add the new property to the strings
array. This leads to the string '(null)' to be printed instead when
converting a GS shader to text.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Make sure to normalize the z coordinates as well as the x/y ones when
there are mipmaps present. Fixes 3d mipmap generation, which now uses
the blit path.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
These instructions can come in either through IMUL_HI/UMUL_HI TGSI
opcodes, or from OP_DIV constant folding.
Also make sure that the constant foldings which delete the original
instruction still get counted as having done something.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Retrieving the high 32 bits of a signed multiply is rather annoying. It
appears that the simplest way to do this is to compute the absolute
value of the arguments, and perform a u32 x u32 -> u64 operation. If the
arguments' signs differ, then negate the result. Since there is no u64
support in the cvt instruction, we have the perform the 2's complement
negation "by hand".
This logic can come into use by the IMUL_HI instruction (very unlikely
to be seen), as well as from constant folding of division by a constant.
Fixes dolphin's divisions by 255.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Modern applications frequencly use both UNORM buffers and FLOAT buffers
with color clamping disabled. (FLOAT with clamping explicitly enabled
and SNORM buffers appear to be less common.) We don't need to emit
saturates in the fragment shader in either of the common cases.
Mesa sets ctx->Color._ClampFragmentColor to false if all the color
buffers are UNORM. Also, for GL_FIXED_ONLY mode (the default in
legacy OpenGL), it will be false if any FLOAT buffers are bound.
Since the common case is false, that should be our default.
Thanks to Roland Scheidegger for pointing out some faulty logic
in v1 of this patch (unnecessary code and incorrect explanations).
v2: Drop superfluous code and reword commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I think a3xx and later should support (it is part of GLES3), but this
isn't needed for the time being and still needs to be reversed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Separating the software fallbacks from the rest of the meta path (which
is usually hardware accelerated) gives callers better control over their
blitting options.
For example, i965 might want to try meta blit, hardware blits, then
swrast as a last resort. Splitting it makes that possible.
This updates all callers to maintain the existing behavior (even in the
few cases where it isn't desirable behavior - later patches can change
that).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
SCons is required for Windows. Add links to flex/bison for Windows.
Reorder items and improve formatting.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2ea923cf57 had the side effect of IR counting
now being done after IR optimization instead of before. Some quick analysis
shows that there's roughly 1.5 times more IR instructions before optimization
than after, hence the effective shader cache size got quite a bit smaller.
Could counter this with an increase of the instruction limit but it probably
makes more sense to count them after optimizations, so move that code.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes build error introduced with commit
4b04152db0.
CC test_eu_compact.o
test_eu_compact.c: In function ‘test_compact_instruction’:
test_eu_compact.c:54:3: error: implicit declaration of function ‘brw_disasm’ [-Werror=implicit-function-declaration]
brw_disasm(stderr, &src, brw->gen, false);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78888
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
We already have a perfectly good copy of the program key, and nobody is
going to modify it. The only reason we copied it was because the
brw_wm_compile structure embedded the key rather than pointing to it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Instead, just pass the key and prog_data as separate parameters.
This moves it up a level - one step further toward getting rid of it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away, but we still need a memory context that lives
for the duration of the compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, the memory context situation was a bit of a mess:
fs_visitor allocated its own memory context, and freed it in the
destructor. However, some data produced by fs_visitor (such as the list
of instructions) needs to live beyond when fs_visitor is "done", so the
caller can pass it to fs_generator.
Everything worked out because brw_wm_fs_emit's fs_visitor variables
happen to not go out of scope until the end of the function. But that
meant that moving the declaration of, say, the SIMD16 fs_visitor
instance, could cause everything to explode.
Using a memory context that exists for the duration of the compile is
clearer, and should be equivalent.
Ultimately, we don't want to use 'c', but this matches the behavior of
fs_generator and gen8_fs_generator, so it'll be simple to change later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We throw away the data generated during compilation on the success path,
so we really ought to on the failure path as well. The caller has no
access to it anyway, so it's purely leaked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away. This is also a bit shorter.
Marking the key pointer as const will also deter people from changing
it in these classes, as that's absolutely not OK.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away. This is also shorter.
Marking the key pointer as const will also deter people from changing
it in fs_visitor, as it's absolutely not OK to modify it there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This data is created by fs_visitor and only used when emitting code,
so keeping it in fs_visitor makes sense. I decided it would be
reasonable to group these all together in a struct, since they're
highly related.
v2: s/nr_payload_regs/payload.num_regs/ in some comments (chrisf).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
As far as I can tell, there's no point in allocating an extra register
and generating a MOV---we can just use the copy provided as part of our
thread payload directly. It's already in the right format.
Of course, there are zero Piglit tests for this. We don't actually ship
the extension (GL_ARB_gpu_shader5) that exposes this functionality
either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is actually for gl_SampleMaskIn, which is quite different than
gl_SampleMask. Renaming should help avoid confusion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Nothing outside of fs_visitor uses it, so we may as well keep it
internal.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With this one use gone, c->last_scratch is now only used inside
fs_visitor. The rest of the driver uses prog_data->total_scratch.
We already compute similar prog_data fields in fs_visitor, so this
seems reasonable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The if (!allocated_without_spills) block is an obvious spot for this
performance warning message.
In the Vec4 backend, scratch is also used for indirect access of
temporary arrays. The FS backend doesn't implement that yet, but
if it did, this message would be inaccurate, since scratch access
wouldn't necessarily mean spilling. Moving it preemptively fixes that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
"Disassemble" is an accurate description of what this function does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're going to use "disassemble" for the function that disassembles
the whole program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
dump_prog_cache has interpreted compacted instructions as full size
instructions, decoding garbage and complaining about invalid values.
We can just use brw_dump_compile to handle this correctly in less code.
The output format changes slightly, but it's still perfectly acceptable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Looping over the instructions and calling brw_disasm doesn't handle
compacted instructions. In most cases, this hasn't been a problem since
we don't compact prior to Sandybridge.
However, Sandybridge's transform feedback GS program should already be
compacted, and so this ought to fix decoding of that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Gallium already gives us height==1 for these, so the texture state is
already setup correctly to emulate 1D textures as a Nx1 2D texture. We
just need to supply the .y coord.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
That was easy. Turns out it is just a matter of setting one bit.
Enable sampling from sRGB texture, and therefore enable GL 2.1 :-)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Enabled with GALLIVM_DEBUG=perf (which up to now was only used to print
warnings for unoptimized code).
While some unexpectedly long shader compile times for some shaders were fixed
with 8a9f5ecdb1 this should help recognize such
problems in the future. For now though only available in debug builds (which
are not always suitable for such analysis). And since this uses system time,
it might not be all that accurate (even llvmpipe's own rasterization threads
might be running at the same time, or just other tasks).
(llvmpipe also has LP_DEBUG=counters but this only gives an average per shader
and the the total time for all shaders.)
This prints information like this:
optimizing module fs17_variant0 took 1 msec
optimizing module setup_variant_0 took 0 msec
optimizing module draw_llvm_vs_variant0 took 9 msec
optimizing module draw_llvm_vs_variant0 took 12 msec
optimizing module fs17_variant1 took 2 msec
v2: rebase for recent gallivm compilation changes, and print time for whole
modules instead of functions (otherwise it would be very spammy since it would
include all trivial inline sse2 functions), using the shiny new module names,
prying them off LLVM using new helper (not available through C bindings).
Per function timings, while possibly giving more information (if there'd be
a problem only in for instance the partial not the whole function), don't seem
all that useful for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When we had just one module "gallivm" was an appropriate name. But now we have
modules containing all functions for a particular variant, so give it a
corresponding name (this is really just for helping debugging).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We allocate dispatch tables for BeginEnd and OutsideBeginEnd. But
when we destroy the context we were freeing the BeginEnd and Exec
tables. If Exec==BeginEnd we did a double-free. This would happen
if the context was destroyed while inside a glBegin/End pair. Now
free the BeginEnd and OutsideBeginEnd pointers.
Cc: "10.1", "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes the valgrind report below and random crashes with piglit on radeonsi.
==30005== Conditional jump or move depends on uninitialised value(s)
==30005== at 0xB13584E: st_translate_program (st_glsl_to_tgsi.cpp:5100)
==30005== by 0xB14698B: st_translate_fragment_program (st_program.c:747)
==30005== by 0xB14777D: st_get_fp_variant (st_program.c:824)
==30005== by 0xB11219C: get_color_fp_variant (st_cb_drawpixels.c:1042)
==30005== by 0xB1131AE: st_DrawPixels (st_cb_drawpixels.c:1154)
==30005== by 0xAFF8806: _mesa_DrawPixels (drawpix.c:162)
==30005== by 0x4EB86DB: stub_glDrawPixels (generated_dispatch.c:6640)
==30005== by 0x4F1DF08: piglit_visualize_image (piglit-util-gl.c:1574)
==30005== by 0x40691D: draw_image_to_window_system_fb(int, bool) (draw-buffers-common.cpp:733)
==30005== by 0x406C8B: draw_reference_image(bool, bool) (draw-buffers-common.cpp:854)
==30005== by 0x40722A: piglit_display (alpha-to-coverage-dual-src-blend.cpp:117)
==30005== by 0x4EA7168: run_test (piglit_fbo_framework.c:52)
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This workaround doesn't list any llvm version, but it was introduced
2010-06-10 (e277d5c1f6). It is unlikely
this bug is still present in llvm versions we support (3.1+).
There's no specific test listed, but I ran lp_test_arit (which uses
the mentioned functions) on llvm 3.1 and 3.3 with sse41 disabled and
this pass enabled without issues.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
32bit code generation and llvm >= 2.7 used a different optimization pass
order - this code was initially introduced (2010-07-23) by
815e79e72c, apparently due to buggy code being
generated with then brand new llvm versions (which was llvm 2.7 plus pre 2.8
devel).
It seems very highly likely that whatever this bug was it has been fixed in
newer llvm versions, though there's no easy way to test this - the mentioned
piglit test has been removed years ago, and even if you'd build it I'm
sceptical the glsl compiler would still produce the required code to trigger
it.
I have no idea what a good order of passes is, but just remove the workaround
and use the same order everywhere.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gen8_dump_compile will be called indirectly by code common used by
generations before and after the gen8 instruction format change.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_dump_compile will be called indirectly by code common used by
generations before and after the gen8 instruction format change.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Has been misaligned since we added instruction offset prefixes.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
brw_disasm doesn't disassemble compacted instructions, so we uncompact
before disassembling them which would unset the compaction control bit.
Instead pass it as a separate argument.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Create the intel renderbuffer with level hardcoded to zero instead
of overriding it in the surface state configuration. Also moved the
dimension adjustments for tiling, mip level, msaa into the render
buffer creation. Finally prepares for another blit path needed for
miptree updownsampling.
v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()"
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
v2: Configure stencil directly for final dimensions instead of
adjusting bit by bit for tiling, mip level and msaa.
v3 (Ken): Used non-static constant for horizontal alignment
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Allow hardware to offset accesses to individual layers. Also leave
the mip-level overriding for the creator of the intel renderbuffer
to handle. Merged with "i965/gen8: Allow stencil buffers to be
configured as single sampled"
Ken: I left the "_mesa_problem()" still in place. I think it is clearer
to remove it in a separate patch.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use intel_mipmap_tree::total_width in order to get correct alignment
automatically. Also use "mt->total_height / mt->physical_depth0" as
surface height allowing hardware to offset to correct slice.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
mt->format is of type mesa_format, and therefore can't be
used with _mesa_base_fbo_format which requires a GLenum input.
On gen8, this fixes various piglit fbo-depthstencil tests with
samples > 1.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
They've served their purpose (in transitioning blorp to using
fs_generator) and now they just necessitate large amounts of manual
labor to regenerate if the disassembler changes.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
With this and the previous patch, we no longer have multiple
definitions in the final egl_gallium.so.
v2: Drop duplicate libloader link.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com> (v1)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> (v1)
It makes more sense to link the core and common parts of the driver as the
target is build. Additionally this will help us drop duplicating symbols
for targets that static link mulitple pipe-drivers. Only egl-static needs
that currently with more to come.
To simplify things a bit add HAVE_GALLIUM_RADEON_COMMON variable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The loops for updating the multiple packed fields in SP_VS_OUT[] and
SP_VS_VPC_DST[] will zero out one register beyond the last that on
required. Which is normally not a problem (and is kinda convenient
when looking at cmdstream dumps) unless we have maximum (16) varyings.
Fix loop termination condition so that this does not happen.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to size input/output tables big enough for special inputs/
outputs (gl_Position, gl_FrontFacing, etc) which, while they don't
count towards the hw limit of 16 attributes or 16 varyings, we do
still need to track them all the same.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Hardware only supports 16. Which fd3_shader_variant properly reflected,
but the pipe cap did not, leading to array overflow (and shaders that
could not possibly work).
Also a bunch of asserts to make problems like this easier to see.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We are starting to add integer support to the compiler, which does not
get exercised with glsl feature level 120 and without advertising
integer support. But doing so breaks too many things right now. So
for now use a debug flag to conditionally expose the functionality
while it is in development.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The KILL_IF opcode could potentially be merged in to the regular KILL
opcode function. It was a pain to do so, so I've left is separated
for cleanliness.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Adds a large sum of TGSI opcodes to the a3xx compiler.
For integer opcodes we have 28 opcodes added.
Adds 4 floating point compare opcodes
If GLSL 1.30 is enabled, this allows the GLSL 1.30 piglits to have a
completion amount of 432/641.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
All shaders had the same name.
We could probably use some identifier per shader too, but for now only use
the variant number.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The setup shaders were composed of both a fs shader number and a variant
number. But since they aren't tied to a particular fragment shader, the
former was a fixed zero while the latter was also always zero because
it was never assigned. So, similar to what the fs code does, use a ever
increasing number to give it a more catchy name (unlike fragment shaders
though where this number is for each explicitly created shader, we just use
it for the implicitly created variants).
And while here, fix whitespace a bit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Unused except it was increased for both fs and setup shader variants created.
Probably some leftover from ages ago.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously the code used the total number of ubos being declared in the
linked program (so the ubos of all shaders combined), use the number
from the particular shader instead.
This fixes an assertion failure with piglit arb_uniform_buffer_object-maxblocks
seen in llvmpipe since 8a9f5ecdb1 as it now emits
code for each declared buffer, not just the ones actually used.
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The big missing part here is proper sched data calculations, but
hopefully the chosen placeholder will be sufficient for now.
Passes piglit as well as GK107 does.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Maxwell no longer has the methods to set constant attributes, and we'll
want to be treating stride 0 vtxbufs the same as for stride > 0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use "@GL_LIB@" in src/gallium/targets/libgl-xlib/Makefile.am to produce
the library name specified by the configure --with-gl-lib-name option.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In 1d35f77228 support for multiple constant
buffers was introduced. This meant we had another indirection, and we did
resolve the indirection for each constant buffer access. This looks very
reasonable since llvm can figure out if it's the same pointer, however it
turns out that this can cause llvm compilation time to go through the roof
and beyond (I've seen cases in excess of factor 100, e.g. from 50 ms to more
than 10 seconds (!)), with all the additional time spent in IR optimization
passes (and in the end all of it in DominatorTree::dominate()).
I've been unable to narrow it down a bit more (only some shaders seem affected,
seemingly without much correlation to overall shader complexity or constant
usage) but it is easily avoidable by doing the buffer lookups themeselves just
once (at constant buffer declaration time).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously, ir_unop_any was implemented via a dot-product call, which
uses floating point multiplication and addition. The multiplication was
completely pointless, and the addition can just as well be done with an
or. Since we know that the inputs are booleans, they must already be in
canonical 0/~0 format, and the final SNE can also be avoided.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It wasn't completely clear from the docs, so I had to figure out by
looking at piglit results. Hopefully this saves the next driver writer
implementing queries some time.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Not necessary, now that we will free the whole module (hence all
function bodies) immediately after compiling.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Free up unneeded LLVM stuff immediately after generating vertex shader
code. Saves about 500K per shader.
v2: Don't bother calling gallivm_free_function (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Split free_gallivm_state() into two steps. First step is
gallivm_free_ir() which cleans up the LLVM scaffolding used to generate
code while preserving the code itself. Second step is
gallivm_free_code() to free the memory occupied by the code.
v2: s/gallivm_teardown/gallivm_free_ir/ (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Provide a JITMemoryManager derivative which puts all generated code into
one memory pool instead of creating a new one each time code is generated.
This saves significant memory per shader as the pool size is 512K and
a small shader occupies just several K.
This memory manager also defers freeing generated code until you tell
it to do so, making it possible to destroy the LLVM engine while keeping
the code, thus enabling future memory savings.
v2: Fix compilation errors with LLVM 3.4 (Jose)
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I saw that LLVM internally uses its global context for some things, even
when we use our own. Given ours is also global, might as well use
LLVM's.
However, sepearate contexts can still be enabled with a simple source
code modification, for when the need/benefit arises.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Nowadays LLVMModuleProviderRef is just an alias for LLVMModuleRef, so
its use just causes unnecessary confusion.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Older versions haven't been tested probably don't work anyway. But more
importantly, code supporting it is hindering further work.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Many instructions implicitly update the accumulator on Gen < 6. The instruction
scheduling code just calls add_barrier_deps() for each accumulator access on
these platforms, but a large class of operations don't actually update the
accumulator -- mostly move and logical instructions. Teaching the scheduling
code about this would allow more flexibility to schedule instructions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77740
Reviewed-by: Matt Turner <mattst88@gmail.com>
Split out fd_query into an abstract base class, to allow multiple
implementations. The current sw based queries are moved into
fd_sw_query.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
As far as I can tell, Mesa hasn't had a convenient way to dump ARB_vp/fp
source until now. Using MESA_GLSL=dump is convenient, since it means
you can use a single environment variable to dump a program's shaders,
no matter which language they're written in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The depth extent field is used to limit the allowed slice range that
can be rendered to.
With the previous setting, only slice 0 could be rendered.
This fixes piglit amd_vertex_shader_layer-layered-depth-texture-render.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Fixes piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If blorp is disabled for color clears, then piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'
will fail.
Currently, gen8 fails similarly on this test because gen8
does not use blorp.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With the more advanced dead code elimination pass already being run,
eliminate_dead_code was making no difference in instruction count, and had
an undesirable O(n^2) runtime. So remove it and rename
eliminate_dead_code_advanced to eliminate_dead_code.
Reviewed-by: Marek Olšák <marek.olsak at amd.com>
That information misleads source code auditing tools to think that
ralloc itself is released under LGPL v3.
Instead, simply state talloc is not licensed under a permissive license.
v2: Use wording suggested by Kenneth.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We always call brw_merge_inputs() right before looping over the primitives but
this can be called inside the loop for each primitive too. In the case we do it
for the first primitive the call is redundant and can be skipped.
Reviewed-by: Eric Anholt <eric@anholt.net>
We're moving towards requiring interface additions to be appended to the
end of the interface block. No functional change, opcodes are assigned as
before, but version 2 additions are now grouped together, which prevents
a scanner warning.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Now that we aren't using pixel_[xy] in live variables, nothing is looking
at these regs after the visitor stage.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the only case where a fs_reg in brw_fs_visitor is used during
optimization/code generation, and it meant that optimizations had to be
careful to not move pixel_x/y's register number without updating it.
Additionally, it turns out we had a couple of other UW values that weren't
getting this treatment (like gl_SampleID), so this more general fix is
probably a good idea (though I wasn't able to replicate problems with
either pixel_[xy]'s values or gl_SampleID, even when telling the register
allocator to reuse registers immediately)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The value depends only on the level, so no need to store the bool per slice.
Shrinks intel_mipmap_slice from 24 bytes to 16, while slotting into an
existing hole in intel_mipmap_level.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Need to adjust coordinates since the shader receives the array index as
depth in z, but the TEX instruction expects it to be the second
coordinate for a 1D array texture. This fixes fbo-generatemipmap-array.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Previously the implication was that queries should be disabled during
blits. However glBlitFramebuffer() is supposed to obey the current
query, and this new bit will indicate that to the driver.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Different textures may be bound to each slot for each stage. So we need
to be able to upload ms parameters for each one without stages
overwriting each other.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
It's the same as setting the 3 regs separately, but shorter, and it also
seems to be required on GFX7.2 and later. This doesn't fix Hawaii.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This reverts commit e6967270c7.
Chris Forbes pointed out that this is broken for texture views which
restrict the number of slices. He committed a better fix which makes
this unnecessary.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
The information was lost during linking, causing the layout to be treated as
FRAG_DEPTH_LAYOUT_NONE.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both the ast->IR and linker have functions with this name, but different
behavior.
Rename the linker's version to var_counts_against_varying_limit to be
closer to what it is actually used for.
Suggested by Ian a while back.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The screen takes ownership of the winsys, and is responsible for
destroying it. Users of pipe-loader should make sure they destory
and screens they've created to avoid memory leaks.
This fixes a crash in clover introduced by
ce6c17c083 where the pipe-loader was
destroying the winsys while a screen was still using it.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Same issues as the previous commit fixed for Gen7:
- Bogus physical->logical layer conversion; depth/stencil surfaces
are still IMS layout on Gen8.
- mt_layer ignored in layered rendering case, which breaks handling
of views with MinLayer.
- Render target array extent not set correctly for arrays.
I'm not able to test this one since I can't get a Broadwell yet, but
it's the same set of fixes as for Gen7.
V2: Restore the MAX2() to account for zero depth/layer_count.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Again, a few problems:
- Layered attachments did not honor MinLayer.
- Non-layered MSAA attachments rendered to the wrong layer due to
dividing by the layer count. All depth buffers use the IMS layout, so
the physical layer count == logical layer count.
- Layered attachments were not limited to irb->layer_count, so we could
render off the end of the texture.
V2: Restore the MAX2() to account for zero depth/layer_count.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixing the same issues the previous commit does for Gen7.
Note that I can't test this one, since I don't have a Broadwell.
V2: Restore the MAX2() to account for zero depth/layer_count.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There were a few problems here, which mostly just broke layered
rendering into a view:
- Render target view extent was always set to be == depth. This is
benign for non-layered-rendering, but allows writes off the end of the
render target for layered rendering, which ends badly.
- Layered rendering did not honor the mt_layer setting, so would not
properly handle MinLayer being set on a view.
V2: Restore the MAX2() to account for zero depth/layer_count.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm somewhat impressed that current gccs will let you do this, but
sufficiently old ones (including 4.4.7 in RHEL6) won't.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
When the limit was changed to be defined in terms of LP_MAX_SHADER_VARIANTS
(75f1fea14f) when it was increased, this
inadvertently lowered the limit in some branches (that have a lower
LP_MAX_SHADER_VARIANTS number) when merged. So, make sure the limit is always
at least the number it once was.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
1c73e919a4 made it possible to not allocate
the tgsi machine if llvm was used. However, draw_get_option_use_llvm() is
not reliable after draw context creation, since drivers can explicitly
request a non-llvm draw context even if draw_get_option_use_llvm() would
return true (and softpipe does just that) which leads to crashes.
Thus use draw->llvm to determine if we're using llvm or not instead (and
make draw->llvm available even if HAVE_LLVM is false so we don't have to put
even more ifdefs).
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
1D array targets store the number of slices in the Height field.
Fixes Piglit's spec/!OpenGL 3.2/layered-rendering/clear-color-all-types
1d_array single_level, at least when used with Meta clears.
Cc: "10.2 10.1 10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This only matters for TextureView where the texObj's target has not been
set yet, in all other instances, texObj->target should be the same as
the passed-in target parameter.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
The old condition disallowed load propagation any time flags were
defined, even with e.g. set and a constbuf reference. The new condition
disallows it only with immediate propagation. (There are no opcodes that
set the condition flag and have an immediate argument.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add cases for PIPE_SHADER_CAP_MAX_SAMPLER_VIEWS and
PIPE_SHADER_CAP_PREFERRED_IR. Remove default switch case so we
learn of missing cases at compile time.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Return PIPE_SHADER_IR_TGSI for the PIPE_SHADER_CAP_PREFERRED_IR query.
Remove default switch case so we learn of missing switch cases at
compile time.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In an earlier incarnation of populate_consumer_input_sets and
get_matching_input, the consumer_inputs_with_locations array was indexed
using the user-specified location. In that version, only user-defined
varyings were included in the array.
In the current incarnation, the Mesa location is used to index the
array, and built-in varyings are included.
This change fixes the unit test to exepect gl_ClipDistance in the array,
and it resizes the arrays to actually be big enough. It's just dumb
luck that the existing piglit tests use small enough locations to not
stomp the stack. :(
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78258
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Cc: Vinson Lee <vlee@freedesktop.org>
We added wglCreateContextAttribsARB but not the extension strings.
This allows creation of GL 3.x contexts.
Reviewed-by: Brian Paul <brianp@vmware.com>
This path is used to implement both glClear and glClearBuffer; the
latter is only supposed to clear particular buffers. Core Mesa provides
us that information in the buffers bitmask; we must only clear buffers
mentioned there.
To accomplish this, we save/restore the color draw buffers state, and
use glDrawBuffers to restrict drawing to the relevant buffers.
Fixes Piglit's spec/!OpenGL 3.0/clearbuffer-mixed-formats and
spec/ARB_framebuffer_object/fbo-drawbuffers-none glClearBuffer tests
for drivers using meta clears (such as Broadwell).
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77852
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77856
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be used for saving/restoring the glDrawBuffers state.
For now, make sure that existing users of MESA_META_ALL don't get
the new bit, since they probably won't want it.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The majority of _mesa_meta_Clear and _mesa_meta_glsl_Clear was the same;
adding a boolean for whether to use GLSL allows us to share most of it
without polluting either path too much.
Tested for regressions by hacking i965 to always use the non-GLSL path.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes glean/texture_srgb, which hit recursive-flush prevention
assertions in vbo_exec_FlushVertices.
This probably hurts the performance of front buffer rendering, but
very few people in their right mind do front buffer rendering.
Fixes Glean's texture_srgb test.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
[olv: Use the real name provided by the patch author. Ideally this could be
moved to somewhere higher level so that we would not need to create a pipe
context to flush resources. Plus, it is not clear if flushing resources for
another context is valid.]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
If StencilSampling is enabled on the texture object, pass in an
equivalent stencil-only format.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
S8_UINT will become useful when ARB_texture_stencil8 becomes supported by
mesa. The other stencil formats are needed for ARB_stencil_texturing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes:
Program received signal SIGSEGV, Segmentation fault.
bind_samplers (comp=0x21b054, comp=0x21b054, ctx=0x211430)
at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:445
445 mask_pic->srf->tex->format);
(gdb) bt
#0 bind_samplers (comp=0x21b054, comp=0x21b054, ctx=0x211430)
at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:445
#1 xa_composite_prepare (ctx=0x211430, comp=comp@entry=0x21b054)
at ../../../../../src/gallium/state_trackers/xa/xa_composite.c:488
#2 0xb6f454b4 in XAPrepareComposite (op=<optimized out>, pSrcPicture=<optimized out>,
pMaskPicture=<optimized out>, pDstPicture=<optimized out>, pSrc=0x5b3ad8, pMask=0x0,
pDst=0x5923b8) at msm-exa-xa.c:533
We can't yet handle solid fill mask, so explicitly reject that, rather
than segfaulting. Otherwise DDX would need to check XA version to see
if solid fill mask were supported.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Prior to commit 8435b60a35, the region
equivalent of this function called intel_miptree_create_layout, which
set mt->target to target. With that commit, it no longer copied target.
Piglit's ext_image_dma_buf_import-sample_[xa]rgb8888 tests would then
hit an assertion failure, where image->TexObject->Target was
GL_TEXTURE_EXTERNAL_OES, and mt->target was GL_TEXTURE_2D.
Copying the target fixes this assertion failure.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The two paths are really similar, and the extra conditionals will be
dwarfed by the cost of the actual upload.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 30259856a8 moved the state packets to
table generation time, but forgot to make this change. Apparently the
performance win there was about not reemitting the table pointers on
unrelated state changes.
No performance difference on cairo on glamor (n=118).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have the stage state coming into our setup of sampler states,
it's easy to drop an identifier into it of which stage the stage_state is,
and then look up which packet to emit in a little table.
No performance difference on cairo on glamor (n=492).
v2: Don't forget to do the workaround flush on IVB.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should have happend around the time of commit 4680d23, but Keith's
DRI3 patches and my GLX_MESA_query_renderer patches crossed in the mail.
I don't have a working DRI3 setup, so I haven't been able to actually
verify this. I'm hoping that someone can piglit this for me on DRI3...
It's also unfortunate the DRI2 and DRI3 can't share more code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Use the ones provided by the compiler instead.
NOTE: External trees should be updated to not include '#include/c99'
directory directly, but rather rely on scons/gallium.py to do the right
thing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Silence insignificant warnings so significant warnings have a chance to
stand out.
The only abundant warning that's not silenced here is "C4018:
signed/unsigned mismatch", as it could hide security issues, so it's better
to actually fix the code.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Drop the version/name tag from the script as it was never
meant to be there. Add swrast_create_screen as it is used
when loading swrast. Rename the file to pipe.sym.
v2: Rebase on top of the LD_NO_UNDEFINED changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Both llvm and clang polute the exported symbol table, as soon
as we try to link with either one. Other than those two
everything else looks good (clean).
Cc: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Namely drop the version/name tag of the exported symbol, and
rename the filename to egl.sym.
v2: Rebase on top of the LD_NO_UNDEFINED changes.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Using export-symbols-regex is the least desirable method of restricting
the exported symbols, as is completely messes up with the symbol table.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Using export-symbols-regex is the least desirable method of restricting
the exported symbols, as is completely messes up with the symbol table.
radeon_drm_winsys_create is not needed, avoid exporting it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
In the presence of LLVM the final library exports every symbol from
the llvm namespace. Resolve this by using a version script (w/o the
version/name tag).
Considering that there are only ~25 symbols, explicitly list them
to minimize the chances of rogue symbols sneaking in.
Drop the *winsys_create functions as they were only meant for
gl-vdpau interop.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The symbol is not meant to be exported, and its presence was
only a side effect due to the missing visibility flags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The symbol is used for hardware only drivers. For swrast the
loader uses swrast_create_screen. Add VISIBILITY_CFLAGS while
we're here.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This can be called from locations that don't have a context pointer
handy. This patch also adds enough infrastructure so that the unit
tests for the GLSL compiler and the stand-alone compiler will build and
function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows them to be moved to .rodata, and allow us to be sure that they
will not be modified.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Using the existing driver hooks made for AMD_performance_monitor, implement
INTEL_performance_query functions.
v2: Whitespace changes.
v3: Whitespace changes, add a _mesa_warning()
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Like AMD_performance_monitor, this extension provides an interface for
applications (and OpenGL-based tools) to access GPU performance
counters. Since the exact performance counters available vary between
vendors and hardware generations, the extension provides an API the
application can use to get the names, types, and minimum/maximum
values of all available counters.
Applications create performance queries based on available query
types, and begin/end measurement collection. Multiple queries can be
measuring simultaneously.
v2: Whitespace changes
v3: src/mapi/glapi/gen/gl_API.xml: Also expose the functions to GLES2.
v4: Whitespace changes, static_dispatch="false" for all functions, fix
dispatch_sanity test for GLES2 functions
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was a work-around to allow linking a program with only a fragment
shader in a GLES context. Now that we have GL_EXT_separate_shader_objects
in GLES contexts, we can just use that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
ARB, OES, then everything else. If there's ever a KHR shading language
extension, it should go between ARB and OES.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
I don't know of any applications that actually use it. Now that Mesa
supports GL_ARB_separate_shader_objects in all drivers, this extension
is just cruft.
The entrypoints for the extension remain in the XML. This is done so
that a new libGL will continue to provide dispatch support for old
drivers that try to expose this extension.
Future patches will add OpenGL ES GL_EXT_separate_shader_objects, but
that's a different thing.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
With this patch, the piglit arb_separate_shader_object-dlist test
passes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be used for GL_ARB_separate_shader_objects. That extension
not only allows separable shaders to rendezvous by location, but it also
allows traditionally linked shaders to rendezvous by location. The spec
says:
36. How does the behavior of input/output interface matching differ
between separable programs and non-separable programs?
RESOLVED: The rules for matching individual variables or block
members between stages are identical for separable and
non-separable programs, with one exception -- matching variables
of different type with the same location, as discussed in issue
34, applies only to separable programs.
However, the ability to enforce matching requirements differs
between program types. In non-separable programs, both sides of
an interface are contained in the same linked program. In this
case, if the linker detects a mismatch, it will generate a link
error.
v2: Make sure consumer_inputs_with_locations is initialized when
consumer is NULL. Noticed by Chia-I.
v3: Rebase on removal of ir_variable::user_location.
v4: Replace a (stale) FINISHME with some good explanation comments from
Eric.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When linking a separable program that contains only a fragment shader,
the producer will be NULL. Similar cases will exist with geometry
shaders and, eventually, tessellation shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
While writing the link_varyings::single_interface_input test, I
discovered that populate_consumer_input_sets assumes that all shader
interface blocks have been lowered to discrete variables. Since there
is a pass that does this, it is a reasonable assumption. It was,
however, non-obvious. Make the code fail when it encounters such a
thing, and add a test to verify that behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Four initial tests:
* Create an IR list with a single input variable and verify that
variable is the only thing in the hash tables.
* Same as the previous test, but use a built-in variable
(gl_ClipDistance) with an explicit location set.
* Create an IR list with a single input variable from an interface block
and verify that variable is the only thing in the hash tables.
* Create an IR list with a single input variable and a single input
variable from an interface block. Verify that each is the only thing
in the proper hash tables.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
I want to make some changes to this code, but first I want to make some
unit tests for it... so that I can capture the pre- and
post-invariants. Pulling the code out into its own function in a
non-anonymous namespace enables that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This code was broken in some odd ways before. Too much state was being
saved, it was being restored in the wrong order, and in the wrong way.
The biggest problem was that the pipeline object was restored before
restoring the programs attached to the default pipeline.
Fixes a regression in the glean texgen test.
v3: Fairly significant re-write. I think it's much cleaner now, and it
avoids a bug with some meta ops that use shaders (reported by Chia-I).
v4: Check Pipeline.Current against NULL instead of Pipeline.Default.
Suggested by Chia-I.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Pull most of the guts out of _mesa_BindPipeline into a new utility
function that can be use elsewhere (e.g., meta).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Don't do anything with variables that have explicitly assigned
locations. This is also how built-in varyings are handled.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In February 2013 Paul unified the values used for shader stage outputs
and shader stage inputs. See commits 8a076c5f0^..eed6baf76. Since that
time, the location_base parameters are always VARYING_SLOT_VAR0.
Instead of passing that around, just hard code it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we are uisng the OpenCL 1.2 headers, applications expect all
the OpenCL 1.2 functions to be implemented.
This fixes linking errors with the piglit CL tests.
v2:
- Use c++ features
- Fix error code handling
v3:
- Move <iostream> into api/util.hpp
- Fix indentation
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Make set_ubo_binding() just update the binding, and move the code
that does validation, flushes the vertices etc. into a new
bind_uniform_buffer() function.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Document the difference between _mesa_lookup_bufferobj() and
_mesa_multi_bind_lookup_bufferobj().
v3: Don't create the buffer objects when they don't exist.
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Make set_atomic_buffer_binding() just update the binding, and move
the code that does validation, flushes the vertices etc. into a new
bind_atomic_buffer() function.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is for glBindTextures(), since it doesn't change the active
texture unit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch adds functions for locking/unlocking the mutex, along with
_mesa_HashLookupLocked() and _mesa_HashInsertLocked()
that do lookups and insertions without locking the mutex.
These functions will be used by the ARB_multi_bind entry points to
avoid locking/unlocking the mutex for each binding point.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The texture can only be bound to the index that corresponds to its
target, so there is no need to loop over all possible indices
for every unit and checking if the texture is bound to it.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used by glBindTextures() when unbinding textures,
to avoid having to loop over all the targets.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used by glBindTextures() so we don't have to look it up
for each texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We had the EGLimage structure laying around in intel_regions.h, but now
it's the only thing left in the file.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Fix bad pointer on unreference (caught by Chad)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
I slipped this in in the region->pitch change from pixels to bytes, but I
don't see any reason for it any more -- the libdrm code doesn't appear to
divide pitch by a cpp.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
If the stride wasn't width*cpp, we wouldn't track how much of the src is
busy, and allow a subdata into the end to proceed unsynchronized.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Note: region->width/height used to reflect the total_width/height padding
of separate stencil, though mt->total_width didn't. region->width/height
was being used in EGL images, where the padded value would have been the
wrong one, so I converted them to use rb->Width/Height.
v2: Drop debug printf that slipped in (caught by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Regions aren't refcounted safely for multithreaded applications, and
they're not terribly useful wrappers of a BO, so I'm trying to remove
them.
Even the stride I added here could probably be reduced to use of an
existing field in the __DRIimageRec, but I want this to be as mechanical
of a change as possible.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I want to do this to get the region removed from DRI images. However, it
does mean that we won't share the intel_region between the rb and the
texture for texture_from_pixmap. I think that's fine.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I noticed that we were doing this while changing the DRI3 path to not use
regions, which involved changing the signature of
intel_update_winsys_renderbuffer_miptree() this way.
v2: Replace my comment with Chad's version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The drm function to get the tiling is just a getter storing the two
pointers, so we don't need to go out of our way to avoid it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
All the consumers are doing it on a miptree.
v2: fix a silly duplicated dereference (review by Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> (v1)
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v1)
intel_alloc_renderbuffer_storage() will clobber rb->Format which was
already set up by intel_create_renderbuffer(). This causes the driver
to potentially create the depth buffer in the wrong format.
In practice this makes the depth buffer Z24 even if the visual has
depthBits==16.
The incorrect depth buffer format doesn't seem to cause any actual
problems in i965, but it seems like we should fix it anyway. I see
Z16 has been more or less deprecated in the driver except the for
the depthBits==16 case. But if we want to use Z24 even in that
case (not sure it's really legal?) it would look better if the
code made that decision explicitly rather than relying on the
format to get magically overwritten by the renderbuffer code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
intel_alloc_renderbuffer_storage() will clobber rb->Format which was
already set up by intel_create_renderbuffer(). This causes the driver
to potentially create the depth buffer in the wrong format.
Long time ago things worked by accident because
_mesa_choose_tex_format() checked for ARB_depth_texture
and thus returned MESA_FORMAT_NONE on gen2 hardware. Somehow
that ended up working when depthBits==16 because the driver
would then pick DEPTH_FRMT_16_FIXED. Not sure how, but things
also seemed to work with depthBits==24.
Things started to go more sideways at:
commit 6ae473221a
Author: Eric Anholt <eric@anholt.net>
Date: Mon Apr 22 16:04:25 2013 -0700
intel: Fold the one last function intel_tex_format.c into the caller.
since that caused intel_miptree_create_layout() to divide by zero
when encoutering MESA_FORMAT_NONE (bw==0). So after this
commit things were broken enough that many applications wouldn't even
run.
Things got a bit better at:
commit c245efe7e8
Author: Eric Anholt <eric@anholt.net>
Date: Thu Mar 21 09:50:45 2013 -0700
mesa: Remove extension checking from ChooseTexFormat.
since now _mesa_choose_tex_format() would return MESA_FORMAT_X8_Z24
for GL_DEPTH_COMPONENT due to i915 erroneosly claiming that
MESA_FORMAT_X8_S24 (and others) are supported texture formats even
on gen2 hardware. So now the the div-by-zero was gone, but now the
driver would pick DEPTH_FRMT_24_FIXED_8_OTHER even when
depthBits==16 which caused rendering problems.
If we prevent rb->Format from getting clobbered for the depth buffer
things work much better. This makes the spinning title text visible
again in chromium-bsu at 16bpp, for example.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
V2: Follow the new naming convention for unpack functions.
Use double precision for converting Z24 to a float.
V3: Unpack stencil value to most significant byte.
Use 'struct z32f_x24s8' type.
V4: Unpack stencil value to least significant byte.
Add a comment to clarify stencil packing.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch contains non-functional changes. Assertion checks made
earlier in the functions make the if checks redundant. So, remove
the if checks and unindent the code in if block.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Depth-stencil teture targets are allowed to use source data of type
GL_UNSIGNED_INT_24_8_EXT and GL_FLOAT_32_UNSIGNED_INT_24_8_REV.
Fixes few crashes in Khronos OpenGL CTS packed_pixels tests.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Link error conditions added in previous patch are equally applicable
to GL_ARB_fragment_coord_conventions implementation. Extension's spec
says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that program
that have a static use of gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must have
the same set of qualifiers."
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL 1.50 spec says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that
program that have a static use gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must
have the same set of qualifiers."
This patch causes the shader link to fail if we have multiple fragment
shaders with conflicting layout qualifiers for gl_FragCoord.
V2: Restructure the code and add conditions to correctly handle the
following case:
fragment shader 1:
layout(origin_upper_left) in vec4 gl_FragCoord;
void main()
{
foo();
gl_FragColor = gl_FragData;
}
fragment shader 2:
layout(pixel_center_integer) in vec4 gl_FragCoord;
void foo()
{
}
V3:
Allow linking in the following case:
fragment shader 1:
void main()
{
foo();
gl_FragColor = gl_FragCoord;
}
fragment shader 2:
in vec4 gl_FragCoord;
void foo()
{
...
}
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 4.3.8.1, page 39 of GLSL 1.50 spec says:
"Within any shader, the first redeclarations of gl_FragCoord
must appear before any use of gl_FragCoord."
GLSL compiler should generate an error in following case:
vec4 p = gl_FragCoord;
layout(origin_upper_left) in vec4 gl_FragCoord;
void main()
{
}
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL 1.50 spec says:
"If gl_FragCoord is redeclared in any fragment shader in a program,
it must be redeclared in all the fragment shaders in that
program that have a static use gl_FragCoord. All redeclarations of
gl_FragCoord in all fragment shaders in a single program must
have the same set of qualifiers."
This patch makes the glsl compiler to generate an error if we have a
fragment shader defined with conflicting layout qualifier declarations
for gl_FragCoord. For example:
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
layout(pixel_center_integer) in vec4 gl_FragCoord;
void main()
{
}
V2: Some code refactoring for better readability.
Add compiler error conditions for redeclarations like:
layout(origin_upper_left) in vec4 gl_FragCoord;
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
and
in vec4 gl_FragCoord;
layout(origin_upper_left, pixel_center_integer) in vec4 gl_FragCoord;
V3: Simplify function is_conflicting_fragcoord_redeclaration()
V4: Check for null pointer before doing strcmp(var->name, "gl_FragCoord").
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In OpenGL 3.1 attribute 0 becomes non-magic, just like in
OpenGL ES 2.0. Earlier versions of OpenGL used attribute 0
exclusively for vertex position.
V2: Add a utility function _mesa_attr_zero_aliases_vertex() in
varray.h
Fixes 4 Khronos OpenGL CTS failures:
glGetVertexAttrib
depth24_basic
depth24_precision
rgb8_rgba8_rgb
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch makes changes to the behavior of glGetAttribLocation(),
glGetFragDataLocation() and glGetFragDataIndex() functions.
Code changes handle a case described in following example:
shader program:
layout(location = 1)in vec4[4] a;
void main()
{
}
Currently, glGetAttribLocation("a") returns 1.
glGetAttribLocation("a[i]"), where i = {0, 1, 2, 3}, returns -1.
But the expected locations for array elements are: 1, 2, 3 and 4
respectively.
This clarification came up with the addition of
ARB_program_interface_query to OpenGL 4.3.
From Page 326 (page 347 of the PDF) of OpenGL 4.3 spec:
"Otherwise, the command is equivalent to
GetProgramResourceLocation(program, PROGRAM_INPUT, name);"
And, From Page 101 (page 122 of the PDF) of OpenGL 4.3 spec:
"A string provided to GetProgramResourceLocation or
GetProgramResourceLocationIndex is considered to match an active
variable if
• the string exactly matches the name of the active variable;
• if the string identifies the base name of an active array, where
the string would exactly match the name of the variable if the
suffix "[0]" were appended to the string; or
• if the string identifies an active element of the array, where
the string ends with the concatenation of the "[" character, an
integer (with no "+" sign, extra leading zeroes, or whitespace)
identifying an array element, and the "]" character, the integer
is less than the number of active elements of the array variable,
and where the string would exactly match the enumerated name of
the array if the decimal integer were replaced with zero."
V2: Simplify get_matching_index() function.
Add relevant text from OpenGL spec in commit message.
Fixes failures in Khronos OpenGL CTS tests:
explicit_attrib_location_room
draw_instanced_max_vertex_attribs
Proprietary linux drivers of NVIDIA (331.49) matches the behavior
expected by OpenGL 4.3 spec.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently overlapping locations of input variables are not allowed for all
the shader types in OpenGL and OpenGL ES.
From OpenGL ES 3.0 spec, page 56:
"Binding more than one attribute name to the same location is referred
to as aliasing, and is not permitted in OpenGL ES Shading Language
3.00 vertex shaders. LinkProgram will fail when this condition exists.
However, aliasing is possible in OpenGL ES Shading Language 1.00 vertex
shaders."
Taking in to account what different versions of OpenGL and OpenGL ES specs
say about aliasing:
- It is allowed only on vertex shader input attributes in OpenGL (2.0 and
above) and OpenGL ES 2.0.
- It is explictly disallowed in OpenGL ES 3.0.
Fixes Khronos CTS failing test:
explicit_attrib_location_vertex_input_aliased.test
See more details about this at below mentioned khronos bug.
V2: Fix the case where location exceeds the maximum allowed attribute
location.
V3: Simplify the condition added in V2.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: Khronos #9609
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
don't leak the MCSubtargetInfo (not really big, was already fixed with
llvm master) and TargetMachine (big). While this is only used for debugging
the leak is large enough to get you into trouble in some cases.
Tested with llvm 3.1 and master.
Before (llvm 3.1), GALLIVM_DEBUG=asm glxgears:
==14152== LEAK SUMMARY:
==14152== definitely lost: 105,228 bytes in 20 blocks
==14152== indirectly lost: 347,252 bytes in 261 blocks
==14152== possibly lost: 866,625 bytes in 1,453 blocks
==14152== still reachable: 7,344,677 bytes in 6,494 blocks
==14152== suppressed: 0 bytes in 0 blocks
After:
==13799== LEAK SUMMARY:
==13799== definitely lost: 3,108 bytes in 6 blocks
==13799== indirectly lost: 0 bytes in 0 blocks
==13799== possibly lost: 804,143 bytes in 1,429 blocks
==13799== still reachable: 7,314,267 bytes in 6,473 blocks
==13799== suppressed: 0 bytes in 0 blocks
Reviewed-by: Brian Paul <brianp@vmware.com>
The brw_eu_emit.c code manually forces the header present bit when
used in align1 (scalar) mode. So, this has no effect currently.
However, it is nice to have fs_inst::header_present reflect reality.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77221
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is similar to what Eric did for Gen7 a little while ago; it also
has support for untyped surface reads.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Francisco made brw_mark_surface_used a freestanding function in
commit a32817f3c2. We should use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This must not have existed when I wrote the original code. The atomic
operation header setup code uses this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For platforms using hardware contexts (currently Gen6+), we failed to
emit PIPELINE_SELECT and 3DSTATE_VF_STATISTICS, instead emitting MI_NOOP
for both.
During one of the context initialization reordering patches, we
accidentally moved brw_init_state before we set brw->CMD_PIPELINE_SELECT
and brw->CMD_VF_STATISTICS. So, when brw_init_state uploaded initial
GPU state (brw_init_state -> brw_upload_initial_gpu_state ->
brw_upload_invariant_state), these would be 0 (MI_NOOP).
Storing the commands in the context is not worthwhile. We have many
generation checks in our state upload code, and for platforms with
hardware contexts, this only gets called once per GL context anyway.
The cost is negligable, and it's easy to botch context creation
ordering.
This may fix hangs on Gen6+ when using the media pipeline.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
arekm reported that using Chrome with GPU acceleration enabled on GM45
triggered the hw_ctx != NULL assertion in brw_get_graphics_reset_status.
We definitely do not want to advertise reset notification support on
Gen4-5 systems, since it needs hardware contexts, and we never even
request a hardware context on those systems.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75723
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This was introduced with the comment and code below it, though the code
only touches prog_data (CACHE_NEW_WM_PROG).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This keeps us from having to emit the nonpipelined state packet on every
FBO binding.
-4.42003% +/- 1.09961% effect on cairo-perf-trace runtime on glamor (n=110).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No more walking 96*6 pointers looking to see if they're the current
texture, when we only use the first 2 out of 96 units. -6.26002% +/-
1.87817% effect on cairo runtime on no-fbo-cache glamor (n=36).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of walking 6 shader stages for each of the 96 combined texture
image units, now we just walk the samplers used in each shader stage.
With cairo-perf-trace on Xephyr with glamor, I'm seeing a -6.50518% +/-
2.55601% effect on runtime (n=22) since the "drop _EnabledUnits" change.
No significant performance difference on an apitrace of minecraft (n=442).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I want to avoid walking the entire long array texture image units, but the
obvious way to do so means walking program samplers, and thus hitting the
units in a random order.
This change replaces the previous behavior of only setting up the fallback
texture for a fragment shader with setting up the fallback texture for any
shader that's missing a complete texture of the right target in its unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is kind of ugly, but I think it's worth it to finish off the last
consumers of _ReallyEnabled.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The _MaxEnabledTexImageUnit check assures us that Unit[0].Current != NULL.
This is the last consumer of _ReallyEnabled outside of the radeons.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm probably not the only person that has tried to kill _ReallyEnabled.
This does the mechanical part of the work, and cleans _ReallyEnabled from
i965.
I think that using _Current makes texture management clearer: You can't
have multiple targets in use in the same texture image unit at the same
time, because there's just that one pointer.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm going to try to delete _ReallyEnabled, which is this weird bitfield
with either 0 or 1 bits set with just the reference to _Current.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The field wasn't really valid, since we've got more than 32 units now. It
turns out it was mostly just used for checking != 0, or checking for fixed
function coordinates, though.
v2: Fix mis-conversion in xm_line.c (caught by Ken).
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_EnabledUnits is all of the first 32 image units that are used by fixed
function or programs, while _EnabledCoordUnits is just which fixed function
fragment shader texcoords need to be generated. This is a theoretical bugfix
in the case of a vertex shader texturing from large texture image unit number
(we'd end up flagging something other than a VARYING_SLOT_TEXn as needing to
be generated), but it's actually just motivated by trying to kill
_EnabledUnits.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The original GL_EXT_texture_swizzle extensions said GL_INVALID_OPERATION
was to be generated when the an invalid swizzle was passed to
glTexParameter(). But in OpenGL 3.3 and later, the error should be
GL_INVALID_ENUM.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It is possible that there are multiple input buffers but only one is
relevant for translation. Then there will be only a single translation
group, which might need to source data from a buffer index != 0.
Fixes wrong vertex shader inputs as observed while debugging with an
application and driver combination that requires translation of a
vertex attribute in a non-trivial set of attributes and input buffers.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Igor Gnatenko:
v2: PIPE_COMPUTE_CAP_MAX_CLOCK_FREQUENCY instead of
PIPE_COMPUTE_MAX_CLOCK_FREQUENCY
Bruno Jiménez:
v3: Drivers report clock in Mhz
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Igor Gnatenko:
v2: in define RADEON_INFO_MAX_SCLK use 0x1a instead of 0x19 (upstream changes)
Bruno Jiménez:
v3: Convert the frequency to MHz from kHz after getting it in
'do_winsys_init'
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
We intended to set these 64-bit addresses to 0, and set the enable bit.
But, I accidentally placed the DWord with the high bits first, when it
should have been second.
This generally worked out, by luck - presumably General State Base
Address is initially zero, and ends up remaining that way in our
contexts since we bungled the "modify enable" bit.
v2: Fix MOCS shift on GSBA. It should be 4, and I had 2.
(Caught by Ben Widawsky.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The implementation is basically a NOP but it conforms with OpenCL 1.2.
[ Francisco Jerez: Initialize property return buffer for
CL_DEVICE_PARTITION_PROPERTIES, CL_DEVICE_PARTITION_TYPE,
CL_DEVICE_PARTITION_AFFINITY_DOMAIN, and make the latter a scalar
rather than a vector. Some clean-up and code style fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2: use a new variable for aligned size
add comment
make both vars const
only use the aligned value in argument constructors
fix comment typo
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The C++ headers are *not* updated because they rely on CL 1.2 APIs
that we do not implement yet when the core CL 1.2 headers are present.
Acked-by: Tom Stellard <thomas.stellard@amd.com>
Fixes textureGatherOffset when used with a shadow sampler. Also verified
against blob compiler with textureLodOffset manually (no piglit tests
for texture[Lod]Offset + shadow samplers).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This adds support for:
IBFE, UBFE, BFI, LSB, IMSB, UMSB, BREV, POPC
Which are all required for ARB_gs5 support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add bitwise reversing and signed MSB helpers for software implementation
of the new TGSI opcodes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Also pipe through [IU]MUL_HI, MAD, and lower ldexp. This provides
coverage of all new ARB_gpu_shader5 functions except uaddCarry,
usubBorrow and interpolateAt*.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Use designated initialisers, and store the extensions pointers as const.
The loader extensions __DRIdri2LoaderExtension and __DRIswrastLoaderExtension
are setup by the platform backends so they should not be constified.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Use designated initialisers, store all extension pointers as const and use
a const __DRIextensions array over assigning each element individually.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Explicitly set the version that is implemented, as that may differ from
the one defined in dri_interface.h. The remaining __DRI*Extensions are
treated as constants, so got ahead and declare them as such.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Use a const array with the extensions, rather than assigning each
one to a fixed size array at runtime.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Make sure that the DRI*Extensions report the version of the interface
implemented over the listed in the headers. While both are currently
the same, this may change in the future.
v2: Keep loader extensions handling as is.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Explicitly set the version that is implemented, as that may differ
from the one defined in dri_interface.h. Use designated initialisers
and constify whereever possible.
Note: __DRIimageExtension should not be made const as it's modified
at runtime. This patch should have no side effects on compilers that
do not support designated initialisers, as the existing code in
dri/common already uses them.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Rather than keeping a separate and unused copy of the screen extensions
within the radeon screen, use a constant array that can be used directly
with __DRIscreen.
[Kristian Høgsberg]
The copy in the radeon screen isn't unused, that's where the array is
built and stored, the dri screen just points to that. The pattern
here was used for cases where the extensions exported by a dri driver
could vary at runtime, for example depending on chipset. In this
case, it's known at compile time, so it makes sense to use a static
const array instead.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Uniformly use the typecasted extension name, constify extension instances
and use designated initialisers. Set the implemented version of the
extension, over the one defined in dri_infertace.h. Patch covers the
following extensions:
__DRItexBufferExtension
__DRIimageExtension
__DRIrobustnessExtension
__DRI2rendererQueryExtension
__DRIdri2LoaderExtension
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
With commit e59fa4c46c8("dri2: release texture image.") we updated the
extension without bumping the version number. The patch itself added an
interface required to enable texture_from_pixmap on certain platforms.
The new code was effectively never build, as it depended on
__DRI_TEX_BUFFER_VERSION >= 3, which never came to be in upstream mesa.
This commit bumps the version number, drops the __DRI_TEX_BUFFER_VERSION
checks and resolves all the build conflicts. Additionally it add a version
check as egl and dri3, as require version 2 of the extension which does
not have the releaseTexBuffer hook.
Cc: Juan Zhao <juan.j.zhao@intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Unfortunately, Cygwin defines RTLD_DEFAULT (for glibc compatibility), but can't
provide dladdr(), so add a check for dladdr()
Since I don't think scons is ever used to build for Cygwin, just set HAVE_DLADDR
in SConscript, assuming that if we have RTLD_DEFAULT, we have dladdr().
Cc: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
The old python code used sys.is_big_endian to select between little-endian
and big-endian formats, which meant that the build and host endiannesses
needed to be the same. This patch instead generates both big- and little-
endian layouts, using PIPE_ARCH_BIG_ENDIAN to select between them.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Splits out the code that parses the channel list, so that we
can have different lists for little and big endian.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Rather than iterate over format.channels and format.swizzles directly,
use Python subfunctions that take the channel and swizzle lists as
arguments. This allow the channel and swizzle lists to depend on
endianness.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
With the big-endian changes, there can be two swizzle orders for each format.
This patch turns Format.inv_swizzle() into a global function that takes the
swizzle list as a parameter.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
The main aim is to reduce the number of places that access channels[0],
swizzles[0] and swizzles[1] directly.
There is no change to the generated u_format_table.c.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This can happen with glamor, which uses EGL_KHR_surfaceless_context and
only explicitly binds GL_READ_FRAMEBUFFER for glReadPixels.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
relnotes weren't updated this whole time, so I went through all the
GL3.txt changes and picked out the nouveau ones since 10.1.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
_mesa_HashTable is not well-suited for us: it locks a mutex unnecessarily and
it does not accept 0 as the key (and have branches to handle 1 specially).
What we really need is a sparse array. Whether it should be implemented as a
hash table, a list, or a bsearch()-able array requires investigations of the
use models.
We choose to implement it as a list for now, assuming it is common to have a
short list of IDs in each (source, type) namespace. The code is simpler, and
the memory footprint is lower. This also fixes several corner cases such as
making messages to have different states at different severities.
v2: use GLbitfield for State/DefaultState, and add a comment
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Do not copy the debug group until it is about to be written. One likely
scenario of using glPushDebugGroup/glPopDebugGroup is to enclose a sequence of
GL commands and give them a human-readable description. There is no message
control change in this scenario, and thus no need to copy.
This also reduces the initial size of gl_debug_state from 306KB to 7KB.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add functions to provide these operations on a struct gl_debug_namespace:
init(): initialize the namespace
copy(): copy all elements from one namespace to another
clear(): clear all elements (to free the memories)
set(): set the value of an element
set_all(): set the value of all elements
get(): get the value of an element
A debug namespace is like a sparse array. The length of the array is huge,
2^sizeof(GLuint), but most of the elements assume the same value sepcified by
set_all().
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add struct gl_debug_group to hold all namespaces of a debug group. Replace
the 3-dimensional array, Namespaces, in struct gl_debug_state by a
1-dimensional array of type struct gl_debug_groups.
Turn the 4-dimensional array, Defaults, in struct gl_debug_state to a
1-dimensional array in struct gl_debug_namespace.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove NextMsgLength, and move members of struct gl_debug_state that belong to
the message log to a new struct, gl_debug_log. Rename gl_debug_msg to
gl_debug_message.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When GL_DEBUG_OUTPUT_SYNCHRONOUS is GL_TRUE, drivers are allowed to log debug
messages from other threads. That requires gl_debug_state to be protected by
a mutex, even when it is a context state. While we do not spawn threads in
Mesa yet, this commit makes it easier to do when we want to.
Since the definition of struct gl_debug_state is no longer needed by the rest
of the driver, move it to main/errors.c. This should make it even harder to
use the struct incorrectly.
v2: add comments for the accessors
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add validate_length, and call it together with log_msg directly instead of
message_insert. No functional change.
v2: make sure length is non-negative (i.e., known) before calling
validate_length, noted by Timothy Arceri
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In both call sites, it could be easily replaced by direct
debug_is_message_enabled calls. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Merge control_app_messages with the only caller. Eliminate set_message_state
and control_messages too as they are unused. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Replace free_errors_data by debug_clear_group. Add debug_pop_group and
debug_destroy for use in _mesa_PopDebugGroup and _mesa_free_errors_data
respectively. No funcitonal change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move group copying to debug_push_group. Save the group message before pushing
instead of after, since we will need it after popping. No functional change
otherwise.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move most of the code to debug_set_message_enable_all. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message fetching to debug_fetch_message and message deletion to
debug_delete_messages. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message logging to debug_log_message. Replace store_message_details by
debug_message_store. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move message state update to debug_set_message_enable. No functional change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move the message filtering logic to debug_is_message_enabled. No functional
change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Move gl_debug_state allocation to a new function, debug_create. No functional
change.
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
bitfieldInsert takes scalar integers for its last two arguments. Since
bitfieldInsert is lowered on i965 to two instructions that have more
flexible arguments, I didn't notice when I wrote this.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
It looks like this bit of code is trying to disable the prime capability if
the driver doesn't support createImageFromFds. However the logic looks a bit
broken and what it would actually do is disable all other capabilities apart
from prime. This patch fixes it to actually disable prime.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This should give the caller some information of what called the error.
For the gbm_bo_import() case, for instance, it is possible to know if
the import is not supported or the error was caused by an invalid
parameter.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In all fairness we allow the gallium tests to be build with --disable-dri
which will result in the approapriate winsys to not be build, thus the
build will fail.
./configure --disable-dri --with-gallium-drivers=svga --enable-gallium-tests
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Rather than defining our own set of variables, use NEED_WINSYS_XLIB
and based on it include the sw/xlib winsys.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The function relies on the sw/dri winsys which is build only when --enable-dri
is set. Fixes build issues with the following config
./configure --disable-dri --with-gallium-drivers=svga --enable-xa
Issue can be reproduced with any hw gallium driver + st that uses the pipe-loader.
Cc: Brian Paul <brianp@vmware.com>
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
GL (3.0) allows you to clear individual color buffers in a fb. In fact
for fbs containing both int and float/normalized color buffers this is
required (because the clearing values are otherwise undefined if applied
to all buffers). The gallium interface was changed a while ago, but llvmpipe
ignored it (hence doing such individual clears always resulted in clearing
all buffers, plus some assorted asserts due to the mixed fbs).
So change the clear command to indicate the buffer to be cleared. Also, because
indicating the buffer to be cleared would have made lp_rast_arg_cmd larger
which is unacceptable (we're trying to shrink it some day) allocate the clear
value in the scene and just pass a pointer.
There's several advantages and disadvantages here:
+ clearing individual buffers works (we could also actually bin such clears now
if they'd come through clear_render_target() if the surface is in the current
fb, though we didn't do this before for the single rb case and still don't try).
+ since there's one clear per rb, we do the format conversion in setup rather
than per bin. Aside from the (drop in the ocean...) performance advantage this
means that clearing to very small values (that is, denormal when converted to
the format) should work for small float (fp16 etc.) formats, as the util code
couldn't handle it correctly before (because cpu denorms are disabled when
executing the bin commands, screwing up the magic conversion and flushing
the values to 0, though this was not verified).
- there's some overhead for traditional old-style clear-all MRT cases, since
there's one rast clear command per rb instead of one for all rbs.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=76976.
v2: get rid of the ugly manual memcpy stuff and just use union util_color.
This is 32 bytes instead of 16 but as the allocation is per scene we can live
with those additional 16 bytes (and the additional 128 bytes in the setup
context), which makes the code much more obvious. Suggested by Brian.
Reviewed-by: Brian Paul <brianp@vmware.com>
util_color often merely represents a collection of bytes, however it is
inconvenient if those bytes can only be accessed as floats/doubles for int
formats exceeding 32bits.
(Note that since rgba8 formats use one uint, not 4 bytes, hence the byte and
short member were left as is.)
The current version checking is wrongly refusing to create 3.3 contexts;
unsupported version are checked elsewhere; and the DRI path doesn't do
this sort of checking neither.
This enables piglit glsl 3.30 tests to run without skipping.
Reviewed-by: Brian Paul <brianp@vmware.com>
According to Roland all TGSI support is there in theory.
In practice there are a few piglit failures and crashes, as this hadn't
been tested before.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The GLX_ARB_create_context_profile spec says:
"If version 3.1 is requested, the context returned may implement
any of the following versions:
* Version 3.1. The GL_ARB_compatibility extension may or may not
be implemented, as determined by the implementation.
* The core profile of version 3.2 or greater."
Mesa does not support GL_ARB_compatibility, and there are no plans to
ever support it, therefore the only chance to honour a 3.1 context is
through core profile, i.e, the 2nd alternative from the spec.
This change does that. And with it piglit tests that require 3.1
contexts no longer skip.
Assuming there is no objection with this change, src/glx/dri_common.c
and src/gallium/state_trackers/wgl/stw_context.c should also be updated
accordingly, given they have the same logic.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Lets make draw_get_option_use_llvm function available unconditionally
and use it to avoid useless allocations when LLVM paths are active.
TGSI machine is never used when we're using LLVM.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes a segmentation fault in conform divzero.c test.
This happens when glTexImage(level, width=0, height=0) is called. We
don't allocate texture memory in that case so the ImageSlices array
was never allocated.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This prevents buffer overflow w/ llvmpipe when running piglit
bin/gl-3.2-layered-rendering-clear-color-all-types 1d_array single_level -fbo -auto
v2: Compute the framebuffer size as the minimum size, as pointed out by
Brian; compacted code; ran piglit quick test list (with no
regressions.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In cases where varying fetches are optimized away (just pass-through in
vertex shader, but unused in fragment shader) we need to calculate the
correct TOTALATTROVS based on the actual number of varyings fetched,
otherwise lockup.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
HiZ operations make the depth/render caches out of sync with the sampler
caches. We need to arrange for a TC flush to happen before the target
buffer is used by the sampler. Calling brw_render_cache_set_add_bo
makes that happen.
On previous generations, brw_blorp_exec took care of flushing the
texture cache by calling intel_batchbuffer_emit_mi_flush after doing
any rendering. If we were to use the normal drawing path, then
brw_postdraw_set_buffers_need_resolve would handle this.
On Broadwell, we don't use BLORP, and we don't emit a rectangle
primitive via the normal drawing path. The 3DSTATE_WM_HZ_OP and
PIPE_CONTROL implicitly make drawing happen. So, none of our existing
code makes this flush happen - we need to do it directly.
Fixes 11 Piglit copyteximage subtests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77223
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77226
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The way cik_num_banks() was calculating the index only makes sense for
the CIK specific macrotile mode array. For SI, we need to use the tile
mode index directly.
This happened to work most of the time because most of the SI tiling
modes use the same number of banks.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Previously this was special-cased for VS and FS; it never got updated
when geometry shaders came along. Generalize using is_varying_var() so
this won't be broken again with tessellation.
Note that there are two copies of the logic for `invariant`: It can be
present as part of a new declaration, and also as a redeclaration of an
existing variable or block member.
Fixes the four new piglits:
spec/glsl-1.50/compiler/invariant-qualifier-*.geom
Note for stable: This won't quite pick cleanly due to whitespace and
state->target -> state->stage renames. Should be straightforward
adjustments though.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 4.3.1, page 220, of OpenGL 3.3 specification explains
the error conditions for glreadPixels():
"If the format is DEPTH_STENCIL, then values are taken from
both the depth buffer and the stencil buffer. If there is
no depth buffer or if there is no stencil buffer, then the
error INVALID_OPERATION occurs. If the type parameter is
not UNSIGNED_INT_24_8 or FLOAT_32_UNSIGNED_INT_24_8_REV,
then the error INVALID_ENUM occurs."
Fixes failing Khronos CTS test packed_depth_stencil_error.test
V2: Avoid code duplication
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the OpenGL 4.4 spec page 275:
"If pname is FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE, param will
contain the format of components of the specified attachment,
one of FLOAT, INT, UNSIGNED_INT, SIGNED_NORMALIZED, or
UNSIGNED_NORMALIZED for floating-point, signed integer,
unsigned integer, signed normalized fixedpoint, or unsigned
normalized fixed-point components respectively. If no data
storage or texture image has been specified for the attachment,
param will contain NONE. This query cannot be performed for a
combined depth+stencil attachment, since it does not have a
single format."
Fixes Khronos CTS test: packed_depth_stencil_parameters.test
Khronos Bug# 9170
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The '**used' pointer was pointing into the stObj->sampler_views array.
If 'free' was null, we'd realloc that array, thus making the 'used'
pointer invalid. This soon led to memory errors.
Just change the pointer to be '*used' so it points directly at the
pipe_sampler_view.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Avoid looping over 32/48/96 (!!) tex image units every draw, most of
which we don't care about.
Improves performance on everyone's favorite not-a-benchmark by 2.9% on
Haswell.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This gives us a better bound for some hot loops in the drivers than
MAX_COMBINED_TEXTURE_IMAGE_UNITS, which is ridiculously large on modern
hardware, and only getting worse as more shader stages are added.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Also rework things so that if someone were to add an opcode without
adjusting the values in these arrays, there will be a compilation error.
This fixes a few quadop-related piglit regressions since commit
d5faf8e786.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We want to center the sample. The old code may have been correct given
the limited values of ms_x/y, but the new logic should be more
intuitive. Note that ms_x can only be 1/2 and ms_y can only be 0/1.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The 2D engine should be usable in more cases, but this fixes MS blits
between textures with the same MS settings. Otherwise a single sample is
selected to be the target texel value.
This allows other tests to work that render to a RB and then blit that
to a texture for input into a shader that uses sampler2DMS to verify it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Back when I originally wrote this code, force_sechalf was only used for
Gen4 code, so I didn't bother hooking it up. However, it's used more
generally these days. In particular, we use it for computing
gl_SamplePosition.
Fixes Piglit's spec/ARB_sample_shading/builtin-gl-sample-position tests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77222
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We previously only allowed coalescing registers that interfere (i.e.,
whose live ranges overlap) if the destination register's live range was
entirely inside the source's live range. This is unnecessary -- we only
need to check for interfering writes in the intersection of their live
ranges.
total instructions in shared programs: 1639470 -> 1638453 (-0.06%)
instructions in affected programs: 84751 -> 83734 (-1.20%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than any old control flow. Muchnick's algorithm just checks for
interfering writes between the MOV and the end of the program. Handling
this when you have backward branches is hard, so don't, but there's no
reason to bail if you see forward branches.
instructions in affected programs: 4270 -> 4248 (-0.52%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were starting at the beginning of the instruction list, rather than
with the MOV instruction itself. This allows us to coalesce after
control flow.
Excluding the shaders from an unreleased title, the shader-db results:
total instructions in shared programs: 1603791 -> 1594215 (-0.60%)
instructions in affected programs: 678772 -> 669196 (-1.41%)
GAINED: 5
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And avoid rewriting other instructions unnecessarily. Removes a few
self-moves we weren't able to handle because they were components of a
large VGRF.
instructions in affected programs: 830 -> 826 (-0.48%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's a few 3-component vertex attribute formats that have no
equivalent SVGA3D_DECLTYPE_x format. Previously, we had to use
the swtnl code to handle them. This patch lets us use hwtnl for
more vertex attribute types by fetching 3-component attributes as
4-component attributes and explicitly setting the W component to 1.
This lets us handle PIPE_FORMAT_R16G16B16_SNORM/UNORM and
PIPE_FORMAT_R8G8B8_UNORM vertex attribs without using the swtnl path.
Fixes piglit normal3b3s GL_SHORT test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
There's no SVGA3D_DECLTYPE that directly corresponds to
PIPE_FORMAT_R8G8B8_SNORM. Previously, we used the swtnl fallback
path to handle this but that's slow and causes invariance issues.
Now we fetch the attribute as SVGA3D_DECLTYPE_UBYTE4N and insert
some extra VS instructions to remap the attributes from the range
[0,1] to the range[-1,1].
Fixes Sauerbraten sw fallback.
Fixes piglit normal3b3s-invariance test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Now only translate the formats once in svga_create_vertex_elements_state().
And rename the array and use the proper SVGA3dDeclType type.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This reverts commit c875d6e57a.
Conflicts:
src/gallium/drivers/svga/svga_context.c
This work-around will no longer be needed after the next patch
which properly supports signed-byte vertex attributes.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This sets up the proper execution mask for sends in SIMD16 mode.
Fixes Piglit's glsl-fs-normalmatrix, glsl-fs-uniform-array-2,
glsl-fs-uniform-array-6, and glsl-fs-uniform-array-7 on Ironlake,
which regressed when I enabled SIMD16 pull parameter support in
commit b207e88b25.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_ViewportIndex doesn't get its own varying slot. It is stored
in VARYING_SLOT_PSIZ.z. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.
Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_ViewportIndex'.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gl_Layer doesn't get its own varying slot. It is stored in
VARYING_SLOT_PSIZ.y. This patch fixes the issue for both gen7
and gen8 because gen7_upload_3dstate_so_decl_list() is shared
between them.
Fixes failures in OpenGL Khronos CTS test transform_feedback_builtins.
Makes new piglit test glsl-1.50-transform-feedback-builtins pass for
'gl_Layer'.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that _debug_assert_fail() has the noreturn attribute, it is better
that execution truly never returns. Not just for sake of silencing the
warning, but because the code at the return IP address may be invalid or
lead to inconsistent results.
This removes support for the GALLIUM_ABORT_ON_ASSERT debugging
environment variable, but between the usefulness of
GALLIUM_ABORT_ON_ASSERT and better static code analysis I think better
static code analysis wins.
Reviewed-by: Brian Paul <brianp@vmware.com>
Same intent as commit a45a50a482,
but this the C compiler is detected via C-preprocessor macros,
similar to how autotools do it, as that seems to be the most
reliable method.
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows the following shader code to work without a weird crash:
struct Foo {
int value[1];
};
int actual_value = Foo[2](Foo(int[1](100)), Foo(int[1](200)))[i].value[0];
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
nouveau_vp3_inter_sizes requires sliec_count as argument just
as the other places that call it from h264 code do. Hopefully
fixes something.
Fix the status_vp code to allow status == 0 too, when processing
hasn't started yet.
set h264->second_field correctly.
Otherwise it will trick the gallium driver into thinking that the render
target has actually changed (due to different pipe_surface pointing to
same underlying pipe_resource). This is really badness for tiling GPUs
like adreno.
This also appears to fix a rendering error with Motif on vmwgfx.
Why that is is still under investigation.
Based on an idea by Rob Clark.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Keep track of the maximal bounds of all the operations and set scissor
accordingly. For tiling GPU's this can be a big win by reducing the
memory bandwidth spent moving pixels from system memory to tile buffer
and back.
You could imagine being more sophisticated and splitting up disjoint
operations. But this simplistic approach is good enough for the common
cases.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Once the relevant branch has been identified do not iterate over the
instructions in the branch, do a linked list insertion instead to avoid the
loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes piglit's fbo-blit-stretch test on drivers which use the meta path.
(i965: should fix Broadwell, but also fixes Sandybridge/Ivybridge/Haswell
since this test falls off the blorp path now due to format conversion)
V2: Use scissor instead of just mangling the rects, to avoid texcoord
rounding problems. (Thanks Marek)
V3: Rebase on Eric's CTSI meta changes; re-add _mesa_update_state in the
CTSI path so that _mesa_clip_blit sees the correct bounds.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77414
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
According to the spec:
<renderbuffertarget> must be RENDERBUFFER and <renderbuffer>
should be set to the name of the renderbuffer object to be
attached to the framebuffer. <renderbuffer> must be either
zero or the name of an existing renderbuffer object of type
<renderbuffertarget>, otherwise an INVALID_OPERATION error is
generated.
This patch changes the previous returned GL_INVALID_VALUE to
GL_INVALID_OPERATION.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76894
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
the integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD,
saving one instruction and two temporary registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to emit ADD/MUL/MAC instead of MUL/ADD/MUL/ADD,
saving one instruction and two temporary registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This allows us to generate the MAC (multiply-accumulate) instruction,
which can be used to implement some expressions in fewer instructions
than doing a series of MUL and ADDs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Our hardware has an "accumulator" register, which can be used to store
intermediate results across multiple instructions. Many instructions
can implicitly write a value to the accumulator in addition to their
normal destination register. This is enabled by the "AccWrEn" flag.
This patch introduces a new flag, inst->writes_accumulator, which
allows us to express the AccWrEn notion in the IR. It also creates a
n ALU2_ACC macro to easily define emitters for instructions that
implicitly write the accumulator.
Previously, we only supported implicit accumulator writes from the
ADDC, SUBB, and MACH instructions. We always enabled them on those
instructions, and left them disabled for other instructions.
To take advantage of the MAC (multiply-accumulate) instruction, we
need to be able to set AccWrEn on other types of instructions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
OpenGL 4.0 spec, page 306 suggests an INVALID_OPERATION in glGetTexImage
if :
"format is one of the integer formats in table 3.3 and the internal
format of the texture image is not integer, or format is not one of
the integer formats in table 3.3 and the internal format is integer."
V2: Use helper function _mesa_is_format_integer()
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
mesa currently returns 4 when GL_VERTEX_ATTRIB_ARRAY_SIZE is queried
for a vertex array initially set up with size=GL_BGRA. This patch
makes changes to return size=GL_BGRA as required by the spec.
Fixes Khronos OpenGL CTS test: vertex_array_bgra_basic.test
V2: Use array->Format instead of adding a new variable
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
For graphics, the LLVM compiler backend currently has many shortcomings
compared to the non-LLVM one. E.g. it can't handle geometry shaders yet,
but that's just the tip of the iceberg.
So building Mesa with --enable-r600-llvm-compiler is currently not
recommended for anyone who doesn't want to work on fixing those issues.
However, for protection of users who end up enabling it anyway for some
reason, let's disable the LLVM backend at runtime by default. It can be
enabled with the environment variable R600_DEBUG=llvm.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This reverts commit a45a50a482.
Unfortunately gcc dumps argv[0] as the first word of --version, so it is
unreliable for detecting gcc.
In particular `cc --version` and `i686-w64-mingw32-gcc --version` give
wrong results.
A better solution needs to be found -- most likely using C-preprocessing
like autotools does. Revert for now.
For Clang static code analyzer, the scan-build script will produce more
comprehensive output. Nevertheless you can invoke it as
CC=clang CXX=clang++ scons analyze=1
For MSVC this is the best way to use its static code analysis. Simply
invoke as
scons analyze=1
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently we can have name space collisions between blocks that define the same
fields. For example:
in block
{
vec4 Color;
} In[];
out block
{
vec4 Color;
} Out;
These two blocks will assign the same interface name (block.Color) to the Color
field in flatten_named_interface_blocks_declarations.cpp, leading to havoc.
This was breaking badly the gl-320-primitive-shading test from ogl-samples.
The patch uses the block instance name to avoid collisions, producing names
like block.In.Color and block.Out.Color to avoid the name clash.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76394
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The transfer staging texture is always freshly allocated, so for write-only
transfers we don't need to explicitly wait for the BO to become idle.
Squeezes a few hundered MB/s more out of x11perf -shmput500 with glamor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This manifested as rendering failures or sometimes GPU hangs in
compositors when they accidentally got MSAA visuals due to a bug in the X
Server. Today we decided that the problem in compositors was equivalent
to a corruption bug we'd noticed recently in resizing MSAA-visual
glxgears, and debugging got a lot easier.
When we allocate our MCS MT, libdrm takes the size we request, aligns it
to Y tile size (blowing it up from 300x300=900000 bytes to 384*320=122880
bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
32 pages). Because it's Y tiled, we attach a 384-byte-stride fence to it.
When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
data gets stored to the first 20 or so scanlines of each of the 3 tiled
pages in that row, even though only 2 of those pages were allocated by
libdrm. In the glxgears case, the missing 3rd page happened to
consistently be the static VBO that got mapped right after the first MCS
allocation, so corruption only appeared once window resize made us throw
out the old MCS and then allocate the same BO to back the new MCS.
Instead, just memset the amount of data we actually asked libdrm to
allocate for, which will be smaller (more efficient) and not overrun.
Thanks go to Kenneth for doing most of the hard debugging to eliminate a
lot of the search space for the bug.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't have any piglit tests for this currently.
v2: Use vec3s for the texcoords so it has some hope of working.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
You'll note from the previous commits that there's something of a loop
here: You call CTSI, which calls BlitFB, then if things go wrong that
falls back to CTSI. As a result, meta CTSI reaches over into blitfb to
tell it "no, don't try that fallback".
v2: Drop the _mesa_update_state(), which was only necessary due to use of
_mesa_clip_blit() in _mesa_meta_BlitFramebuffer() in another patch
series.
v3: Drop an _EXT suffix I copy-and-pasted.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I added support to bind_fbo_image in the process of building meta
CopyTexSubImage, and found that it broke generatemipmap because previously
we would just throw a GL error there and then end up with an incomplete
FBO and fallback.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I need to do the same code again for CopyTexSubImage().
v2: Drop incorrect, not-terribly-useful comment (review by Ken)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids a ReadPixels() if there's accelerated CopyTexImage present.
It now requires GLSL as opposed to just fragment programs, but we don't
have any drivers that do ARB_fp but not GLSL.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There shouldn't be anything special about copying out a subset of the src
rb to a temp before texturing from it, so just do it when we're figuring
out our src texture binding.
This drops Anuj's change to copy an extra border of 1 pixel around the src
area. I can't see how that change could be valid, and presumably if
there's some filtering problem at edges we just need to set the right
wrap mode.
v2: Don't fall back to swrast on non-2D/RECT/2D_MS textures when we can
still CopyTexSubImage. Fixes a segfault regression on i965 with
gl-3.2-layered-rendering-blit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
I think we can assert that renderbuffer size is <= maximum 2D texture
size. Our source coordinates should have already been clipped to the src
renderbuffer size, but haven't actually (so we could potentially have
trouble if there's scaling, and we're in the CopyTexImage path that tries
to use src size). However, this texture size dependency was blocking the
next refactors, so I'm not sure if we want to go ahead with this series
before we get the clipping sorted out or not.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Putting NoDDClr and NoDDChk dependency control on instruction
sequences that include math opcodes can cause corruption of channels.
Treat math opcodes like send opcodes and suppress dependency hinting.
Signed-off-by: Mike Stroyan <mike@LunarG.com>
Tested-by: Tony Bertapelli <anthony.p.bertapelli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
total instructions in shared programs: 1653399 -> 1651790 (-0.10%)
instructions in affected programs: 92157 -> 90548 (-1.75%)
GAINED: 2
LOST: 2
Also significantly reduces the number of optimization loop iterations:
total loop iterations in shared programs: 39724 -> 31651 (-20.32%)
loop iterations in affected programs: 21617 -> 13544 (-37.35%)
Including some great pathological cases, like 29 -> 3 in Strike Suit
Zero and 24 -> 3 in Dota2.
Reviewed-by: Eric Anholt <eric@anholt.net>
We previously stopped searching for unread writes after encountering
control flow, but we can instead just search backwards until we hit
control flow.
instructions in affected programs: 22854 -> 22194 (-2.89%)
The generator uses its destination as a source implicitly, which breaks
some assumptions in dead code elimination. Giving the instruction a
source allows us to reason about it better.
We originally thought that GL 3.0 required GL_DEPTH_COMPONENT16 to map
exactly to Z16. However, we misread the specification, thanks in part
to LaTeX reordering the tables in the PDF.
Page 180 of the GL 3.0 specification (glspec30.20080923.pdf) says:
"[...] memory allocation per texture component is assigned by the GL to
match the allocations listed in tables 3.16-3.18 as closely as possible.
[...]
Required Texture Formats
[...]
In addition, implementations are required to support the following sized
internal formats. Requesting one of these internal formats for any
texture type will allocate exactly the internal component sizes and
types shown for that format in tables 3.16-3.17:"
Notably, however, GL_DEPTH_COMPONENT16 does /not/ appear in table 3.16
or table 3.17. It appears in table 3.18, where the "exact" rule doesn't
apply, and it falls back to the "closely as possible" rule.
The confusing part is that the ordering of the tables in the PDF is:
Table 3.16 (pages 182-184)
Table 3.18 (bottom of page 184 to top of 185)
Table 3.17 (page 185)
Presumably, people saw table 3.16, then saw the table immediately
following with DEPTH_COMPONENT* formats, and assumed it was 3.17.
Based on a patch by Chia-I Wu, but without the driconf option to force
Z16 to be used. It's not required, and there's apparently no benefit
to actually using it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
We've learned a few things since we originally disabled Z16; this attempts
to summarize the issue. I am no expert on this subject, though, so the
comment may not be totally accurate.
I did some benchmarking on GM45 and Ironlake, and discovered that for
GLBenchmark 2.7 EgyptHD, using Z16 was 3% slower on GM45 (n=15), and
4.5% slower on Ironlake (n=95). So, we can drop the "on Ivybridge"
aspect of the comment - it's always slower.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Significantly reduces BO allocation / destruction overhead for transfers,
e.g. measurable via x11perf -shm{ge,pu}t* with glamor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The rules are about to get a bit more complex to account for
gl_InstanceID and gl_VertexID, which are system values.
Extracting this first avoids introducing duplication.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
glClearBuffer() is currently clearing all active draw color buffers (all
buffers that have not been set to GL_NONE when calling glDrawBuffers) instead
of only clearing the one it receives as parameter. Altough brw_clear()
receives a bit mask indicating the color buffers that should be cleared,
this mask is ignored when calling brw_blorp_clear_color().
This was breaking the 'fbo-drawbuffers-none glClearBuffer' piglit test.
The patch provides the bit mask to brw_blorp_clear_color() so it can limit
clearing to the color buffers present in the mask.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76832
Reviewed-by: Eric Anholt <eric@anholt.net>
This always looks crazy when I stumble across it, until I remember
what the hardware is doing. Describing it ought to short-circuit
that process next time :)
V2: Fix indents to 6 spaces, not 7.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Many shaders use a pattern such as:
for (int i = 0; i < NUM_LIGHTS; i++) {
...access a uniform array, or shader input/output array...
}
where NUM_LIGHTS is a small constant (such as 2, 4, or 8).
The expectation is that the compiler will unroll those loops, turning
the array access into constant indexing, which is more efficient, and
which may enable array splitting and other optimizations.
In many cases, our heuristic fails - either there's another tiny nested
loop inside, or the estimated number of instructions is just barely
beyond the threshold. So, we fail to unroll the loop, leaving the
variable indexing in place.
Drivers which don't support the particular flavor of variable indexing
will call lower_variable_index_to_cond_assign(), which generates piles
and piles of immensely inefficient code. We'd like to avoid generating
that.
This patch detects unsupported forms of variable-indexing in loops, where
the array index is a loop induction variable. In that case, it bypasses
the loop-too-large heuristic and forces unrolling.
Improves performance in various microbenchmarks: Gl32PSBump8 by 47%,
Gl32ShMapVsm by 80%, and Gl32ShMapPcf by 27%. No changes in shader-db.
v2: Check ir->array for being an array or matrix, rather than the
ir_dereference_array itself.
v3: Fix and expand statistics in commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The "fail" flag is set if loop_unroll_count encounters a nested loop;
calling the flag "nested_loop" is a bit clearer.
The original reasoning was that count is inaccurate (too small) if there
are nested loops, as we don't do any sort of analysis on the inner loop.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Loop unrolling will need to know a few more options in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we pass in gl_shader_compiler_options, it makes sense to just
use options->MaxUnrollIterations, rather than passing a separate
parameter.
Half of the invocations already passed options->MaxUnrollIterations,
while the other half passed in a hardcoded value of 32.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will prevent the two from getting out of sync again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These were out of sync with the flags used to control
lower_variable_index_to_cond_assign in brw_shader.cpp.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Not setting this would prevented coalescing after a failed attempt if
the sources for both MOVs were the same.
total instructions in shared programs: 1654531 -> 1650224 (-0.26%)
instructions in affected programs: 423167 -> 418860 (-1.02%)
GAINED: 2
LOST: 0
Reviewed-by: Eric Anholt <eric@anholt.net>
It's just the array index, so we can just go look at the array and see
which element we are.
No significant performance difference (n=140)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When considering assignment expressions like:
v.x += u.x;
v.x += u.x;
the vectorizer would incorrectly keep going, attempting to find more
instructions to vectorize. It would overwrite the saved assignment
to point at the second one, and increment channels a second time,
resulting in try_vectorize thinking the expression was a vec2 instead of
a float.
Instead, if we see a repeated assignment to a channel, just try to
vectorize everything we've found so far. This clears the saved state
so it will start over.
Fixes Piglit's repeated-channel-assignments.vert.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For blocks, gl_shader_program::UniformStorage isn't very useful. The
names stored there are the names of the elements of the block, so
finding blocks with an instance name is hard. There is also only one
entry in ::UniformStorage for each element of a block array, and that is
a deal breaker.
Using ::UniformBlocks is what _mesa_GetUniformBlockIndex does. I
contemplated sharing code between set_block_binding and
_mesa_GetUniformBlockIndex, but building the stand-alone compiler and
the unit tests make this hard. I plan to return to this effort shortly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76323
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Cc: github@socker.lepus.uberspace.de
In the next patch, we'll see that using
gl_shader_program::UniformStorage is not correct for uniform blocks.
That means we can't use ::UniformStorage to select between the sampler
path and the block path. Instead we want to just use the type of the
variable. That's never passed to set_uniform_binding, and it's easier
to just remove the function (especially for later patches in the series)
than to add another parameter.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76323
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Cc: github@socker.lepus.uberspace.de
And remove nonsensical approximation of linear interpolation behavior
for shadow samplers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
These were missed/typo'd in the previous patch series:
s/R8G8B8A/R8G8B8A8/
s/rgba_16/RGBA_UNORM16/
s/rgba_uint/RGBA_UINT/
s/rgba_int/RGBA_SINT/
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The commit 3b0b44f7de introduced a build
error:
error: dereferencing pointer to incomplete type
This patch fixes this issue in all the affected files.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Keep tasks as linked list, this way we can associate
more than one encoding task with each buffer.
Signed-off-by: Christian König <christian.koenig@amd.com>
It was always getting set to -8/7 unconditionally. Use the
driver-reported value instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Defaults to providing the same offsets as MIN/MAX_TEXEL_OFFSET. For
nvc0, the offset can be -32/31.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Create the screen in the winsys while the mutex is locked.
This also results in a nice code cleanup!
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This also hides the reference count from drivers.
v2: update the reference count while the mutex is locked in winsys_create
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This replaces u_gen_mipmap with an extremely simple implementation based
on pipe->blit. st/mesa is also cleaned up.
Pros:
- less code
- correct mipmap generation for NPOT 3D textures (u_blitter uses a better
formula)
- queries are not affected by mipmap generation if drivers disable them
v2: add "first_layer", "last_layer" parameters, drop "face"
v2.1: add format
v2.2: document the format parameter
This is needed by _mesa_generate_mipmap.
This adds an array of pipe_transfers to st_texture_image. Each transfer is
for mapping a single layer.
v2: allocate the array of transfers on demand
No longer used anywhere. These also caused trouble in the Gallium
state tracker code where we include both core Mesa and Gallium util
headers (and the macros were defined differently in each world.)
Removing these macros should help avoid macro mix-ups in the future.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We moved away from MALLOC/FREE in the rest of core Mesa a while ago.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were using REALLOC() from u_memory.h but FREE() from imports.h.
This mismatch caused us to trash the heap on Windows after we
deleted a texture object.
This fixes a regression from commit 6c59be7776.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
This extension is a huge grab-bag of "stuff that's in DX11". Break it
apart to make it clear what still needs to be done.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the effective layer count, for clears etc. This differs from the
depth of the miptree level when views are involved.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
We'll still avoid MinLayer here since the fast path doesn't understand
arrays at all, but it's straightforward to do levels.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This allows core mesa's TexSubImage paths etc to work correctly
with views which have nonzero MinLevel or MinLayer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is the actual mesa_format to use. In non-view cases this is always
the same as the mt's format.
V4: Comment style
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
We need to wire the original texture's mt into the view. All the hard
work of setting up an appropriate tree of gl_texture_image structures
has already been done by core mesa.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
If we were to relayout the miptree, we'd break any views that are
sharing it.
(Simplified based on suggestions from Eric)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
These formats can be cast to others (with different component types or
sizes) via ARB_texture_view or ARB_shader_image_load_store. We want
them to be laid out consistently so that we can just reinterpret the
memory with a different format.
In V1, this was done conditionally on a 'prefer_no_swizzle' flag which
was set in TexStorage/TextureView paths, but we need the same behavior
for ARB_shader_image_load_store (which also works with images created
via TexImage, so we don't want it to be conditional.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
The sampler can handle R8G8B8X8 (and substitute 1.0 for the fourth
component) but we can't use it as a render target.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
None of the other 3-component 16bpc formats are directly supported, so
they get promoted to XRGB equivalents. *Not* promoting RGB16F the same
way makes texture views much more fiddly -- we don't want to have to do
crazy copying behind the scenes.
(with my other master + my experimental ARB_texture_view support) fixes
the piglit test: `spec/ARB_texture_view/view compare 48bit formats`
No regressions in gpu.tests on Haswell.
V4: Don't alter the formats table -- just don't match it to a mesa_format. [Kenneth]
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is supported by all generations, and is required for memory layout
consistency for texture_view.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Previously, we would unpack the texels to floats using *_TO_FLOAT_TEX,
and then pack them into the desired format using FLOAT_TO_*. Unfortunately,
this isn't quite the inverse operation, and so some texel values would
end up off-by-one.
This fixes the GL_RGB8_SNORM and GL_RGB16_SNORM subcases in piglit's
arb_texture_view-format-consistency-get test on i965. The similar 1-, 2-
and 4-component cases already worked because they took the memcpy path
rather than repacking.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
While linux uses .so as a default extension for shared libraries that is
not the case for other platforms. The loader in libGL (and others) assumes
that the dri module will always have a .so extension, thus it will fail
to load on the affected platforms.
Spotted-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Filenames passed to dlopen() don't need to use the platform's default extension
for shared libraries.
Using the '.so' extension when dlopen()ing DRI drivers is hardcoded into mesa
and the X server, so it should be hardcoded here in the Makefile as well.
A similar fix is probably also needed for gallium DRI drivers.
(Consider that if we were starting from scratch, perhaps we would use a custom
extension like .dri instead)
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 61bedc3d6b.
As the header is the one defining the API/ABI and is distributed
during installation, we should be using it rather than re-defining
the XA version in configure.ac.
Bump the version in the header to 2.2.0, to reflect what was the
original intent of commit 42158926c6.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
With commit 1f1928db001(glx: Drop _Xglobal_lock while we create and
initialize glx display) we've split the big _Xglobal_lock handling in
a more fine grained manner.
Unfortunatelly we forgot to drop the unlock_mutex on the error paths,
leading to undefined behaviour as the mutex is already unlocked.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We hit this assert with some piglit tests. Which appears to be a bug
outside of freedreno. Previously we were relying on assert() being
redefined to debug_assert() so that we didn't crash in release builds.
Somehow that stopped working. So just use debug_assert() directly.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes a crash in svga_context_flush_buffers() if we use the 'draw' module
for AA lines (when the device doesn't support that feature). We need to
initialize this list before we setup the swtnl pieces.
Found/fixed by Charmaine Lee.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The "new" fragment shader backend has never supported the necessary
color conversion code for this to work. We began using the new backend
in Mesa 7.10 for GLSL (commit a81d423d93, October 2010),
and for ARB_fragment_program in Mesa 9.1 (commit 97615b2d8c,
August 2012).
I haven't heard any complaints, so I don't think anyone will miss this
feature. I believe mplayer used it at one point, but these days
defaults to other paths anyway.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
create_mov() was fixed up to handle neg/abs properly for interal mov's,
using absneg.f, but forgot to fix it for TGSI MOV's. The problem with
using add.f to handle negated mov's is that we can only take a single
const reg src. So:
MOV TEMP[n], -CONST[m]
would turn into:
add.f Rdst, (neg)CONST[m], 0.0
which would not work. Anyways, just remove the extra code and always
use create_mov() which DTRT.
This fixes piglit vs-op-neg-int test.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Applications frequently clear to colors other than 0.0 or 1.0, which
prevents us from doing fast color clears. In that case, we issue this
performance warning on basically every glClear call, resulting in so
much spam that it's nearly impossible to see any other messages.
Plus, I don't think it's useful. We aren't suggesting a better way to
do what the application developers want---we're just telling them it
would be faster to do something they don't want.
Driver developers have no control over the clear color, so this message
is totally useless to them.
A better alternative to get this sort of information is to use
INTEL_DEBUG=blorp, which tells you whether color clears were fast,
simd16 repdata, or slow.
v2: Rebase on has_color_component changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Keep track of whether we actually have any sam instructions in the
resulting shader, rather than using TGSI SAMP declarations. If the sam
instruction is optimized out, because the result is not used, we don't
want to emit texture state, etc. In fact emitting sampler state and/or
setting PIXLODENABLE bit when there are no texture fetches seems to
cause lockup.
In theory this should never happen for a "normal" shader, unless the
state tracker is wonky. But it is a very real possibility for binning
pass shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Decompressing ETC2 textures was causing intermitent segfault
by copying resulting 4x4 texel block to the destination texture
regardless of the size of the destination texture. Issue found
via application crash in GLBenchmark 3.0's Manhattan test.
v2: add more detail comment. Compute limit outside inner loops.
v3: add bugzilla reference
v4: Correct cc syntax in commit log
v5: really grab the right patch
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74988
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1, suggested v2-3]
v2: move allocation to a function as first step
to clean vid_enc_EncodeFrame
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
For TEX instructions, the set of samplers and sampler views should
be consistent. The XA state tracker sometimes passes an inconsistent
set of samplers and sampler views. Rather than assert and die, issue
a warning.
v2: add debugging code to detect inconsistent state.
v3: also check for null sampler in svga_state_tss.c
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Given
mov vgrf7, vgrf9.xyxz
add vgrf9.xyz, vgrf4.xyzw, vgrf5.xyzw
add vgrf10.x, vgrf6.xyzw, vgrf7.wwww
the last instruction would be wrongly changed to
add vgrf10.x, vgrf6.xyzw, vgrf9.zzzz
during copy propagation.
The issue is that when deciding if a record should be cleared, the old code
checked for
inst->dst.writemask & (1 << ch)
instead of
inst->dst.writemask & (1 << BRW_GET_SWZ(src->swizzle, ch))
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
Signed-off-by: Chia-I Wu <olv@lunarg.com>
Cc: Jordan Justen <jljusten@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romainck <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.1" <mesa-stable@freedesktop.org>
Nothing bad came of this because they weren't used after visitor running,
but leaving them in a bad state seems like a recipe for pain later.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
When you've got a simple solid-color shader that doesn't generate
pixel_x/y interpolation, we were deciding that the first vgrf was both the
undefined pixel_x and pixel_y, and extending its live interval to avoid
the stride problem. That tricked other optimization that tries to see if
a particular instruction is the last use of a variable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We stopped doing variable index lowering for uniforms in
a64c1eb9b1, 5 months after the comment was
added.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While we wish our optimization passes could identify all the cases where
we can coalesce our variables, we miss out on a lot of opportunities.
total instructions in shared programs: 1673849 -> 1673166 (-0.04%)
instructions in affected programs: 299521 -> 298838 (-0.23%)
GAINED: 7
LOST: 0
Note that many programs are "hurt". The notable ones are where we produce
unrolling in cases we didn't before (presumably just because of the lower
instruction count). But there are also some cases where pushing things
right into the variables prevents copy propagation and tree grafting,
since we don't split our variable usage webs apart.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
is_power_of_two() is now provided by mesa so its definition must be removed
from the i915 driver code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch will introduce an optimization that only works when
integers are not represented as floating point values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The next few patches will introduce an optimization that only works when
integers are not represented as floating point values.
v2: Re-word-wrap a line, as requested by Ian Romanick.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
ir_binop_ubo_load takes unsigned integer operands. However, the array
index used to compute these offsets may be a signed integer. (For
example, see Piglit's spec/glsl-1.40/uniform_buffer/fs-bvec-array).
For some reason, we were missing an ir_binop_i2u cast, and ir_validator
was failing to catch that.
Without this change, ir_builder's type inference code broke for me when
writing a new optimization pass.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The vector backend already implemented this optimization, but
surprisingly, we never bothered to implement it in the scalar backend.
In addition to saving two instructions, this eliminates a use of the
accumulator as an explicit source, which is unsupported in SIMD16 mode
on Gen7+, which could help us gain SIMD16 programs.
Cuts 19.23% of the instructions in dolphin/efb2ram.shader_test.
v2: Rebase on is_16bit_integer_constant -> is_uint16_constant rename.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The i965 MUL instruction doesn't natively support 32-bit by 32-bit
integer multiplication; additional instructions (MACH/MOV) are required.
However, we can avoid those if we know one of the operands can be
represented in 16 bits or less. The vector backend's is_16bit_constant
static helper function checks for this.
We want to be able to use it in the scalar backend as well, which means
moving the function to a more generally-usable location. Since it isn't
i965 specific, I decided to make it an ir_constant method, in case it
ends up being useful to other people as well.
v2: Rename from is_16bit_integer_constant to is_uint16_constant, as
suggested by Ilia Mirkin. Update comments to clarify that it does
apply to both int and uint types, as long as the value is
non-negative and fits in 16-bits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Performance warnings are logged via KHR_debug in addition to when the
INTEL_DEBUG=perf environment variable is set. Without this, messages in
debug contexts would have "(null)" for the reason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These are clearly needed---the comments in the function are even present
for each one of them. I originally had two separate state atoms for
3DSTATE_SBE and 3DSTATE_SBE_SWIZ. When I combined the functions, I must
have forgotten to add the atoms for 3DSTATE_SBE_SWIZ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing actually uses this---we handle rasterizer discard in the
clipper in order for statistics counters to work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
renderer_copy_prepare was setting the first sampler but never telling
the cso code how many samplers were actually used. Fix this.
Cc: "10.1" <mesa-stable@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Binding a new destination may cause the svga driver to emit draw calls
while propagating the surface. Make sure this doesn't happen in the middle
of sampler state setup where state may be incosistent.
In practice, surface propagation should never happen here and even if it did,
it wouldn't be a valid reason for the svga driver to emit partially set up
state, but to avoid future uncertainties, make sure this doesn't happen
anyway.
Found while auditing the state tracker for inconsistent sampler state /
sampler view setup.
Cc: "10.1" <mesa-stable@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
I think this was used for coalescing out partly dead large virtual
registers, but the patch that enabled that caused regressions and didn't
make it upstream.
Reviewed-by: Eric Anholt <eric@anholt.net>
It's more likely that we won't find writes to all channels than one will
interfere, and calculating interference is more expensive. This change
will also help prepare for coalescing load_payload instructions'
operands.
Also update the live intervals for all channels, and not just the last
that we saw.
Reviewed-by: Eric Anholt <eric@anholt.net>
The intent in 9b6b084eb7 was
for urb .size and .min_vs_entries fields to use the values
from the GEN8_FEATURES macro.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rename functions to match format names.
sed commands:
s/signed_rgba8888_rev/R8G8B8A8_SNORM/g
s/signed_rgba8888/A8B8G8R8_SNORM/g
s/f_rgba8888_rev/R8G8B8A_UNORM/g
s/f_rgba8888/A8B8G8R8_UNORM/g
s/f_rgbx8888_rev/R8G8B8X8_UNORM/g
s/f_rgbx8888/X8B8G8R8_UNORM/g
s/f_argb8888_rev/A8R8G8B8_UNORM/g
s/f_argb8888/B8G8R8A8_UNORM/g
s/f_xrgb8888_rev/X8R8G8B8_UNORM/g
s/f_xrgb8888/B8G8R8X8_UNORM/g
s/signed_rgbx8888/X8B8G8R8_SNORM/g
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Implement guest-backed surface sharing using prime fds. Previously only
legacy surfaces could use this functionality. Also use the vmwgfx 2.6
single-ioctl prime fd reference if available.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This opcode provide support for GL_ARB_texture_query_lod,
Signed-off-by: Dave Airlie <airlied@redhat.com>
[imirkin: rebase, docs update]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cuts a small handful of instructions in Serious Sam 3:
instructions in affected programs: 4692 -> 4666 (-0.55%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Which would lead to translating
mad vgrf9:F, vgrf3:F, u0:F, vgrf6:F
mov.sat vgrf7:F, -vgrf9:F
into
mad.sat vgrf9:F, vgrf3:F, u0:F, vgrf6:F
mov vgrf7:F, -vgrf9:F
Fixes some lighting effects in Dota2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76749
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
ip needs to be initialized to start_ip - 1, since the first thing in the
main loop is ip++. Otherwise we would incorrectly propagate the saturate
from the mov to the mad:
mad a, b, c, d
mov.sat x, a
add y, z, a
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we'd ignore the sources of instructions with non-GRF
destinations when calculating calculating the dead channels. This would
lead to us incorrectly removing the first instruction in this sequence:
mov vgrf11, ...
cmp.ne.f0 null, vgrf11, 1.0
mov vgrf11, ...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76616
Or else poor programmers might mistakenly use the temporary mem_ctx,
instead of the fs_visitor's mem_ctx and wonder why their code is
crashing.
Also remove the parenting. These contexts are local to the optimization
passes they're in and are freed at the end.
Otherwise calling dump_instructions() after declaring a new fs_reg would
segfault when calculate_register_pressure()'s loop over reg walked off
the end of the virtual_grf_start[] array that calculate_live_intervals()
would have reallocated for you, if it had known there was a new
register.
Don't hardcode /dev/dri/card0 but instead use the drm
macros which allows the correct /dev/drm0 device to be
opened on OpenBSD.
v2: use snprintf and fallback to /dev/dri/card0
v3: check for snprintf truncation
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
After the loader changes libudev is no longer required for
gbm or the egl drm/wayland platforms. Lets these build/run
on OpenBSD.
v2: preserve the libudev requirement for Linux as suggested
by Emil Velikov.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
After the loader changes libudev is no longer required to
build gbm or the egl drm/wayland platforms.
Remove a libudev ifdef which allows the the drm egl driver
to be loaded on OpenBSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
OpenBSD does not have DT_NEEDED entries for libc by design,
over concerns how the symbols would be referenced after
changing the major version of the library.
So avoid -no-undefined checks on OpenBSD as they will fail.
v2: don't include the -no-undefined libtool option in the variable
and change -Wl,--no-undefined references in Automake.inc as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76856
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The targets do not require expat or selinux. Use GALLIUM_COMMON_LIB_DEPS
which provides the core requirements for each gallium target.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The targets do not require expat or selinux. Use GALLIUM_COMMON_LIB_DEPS
which provides the core requirements for each gallium target.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The build system will use g++ to link the static library due to the
dummy.cpp source(s). Thus one does not need the explicit link against
stdc++.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The build system does not know that the static library is C++.
Mention the cpp file to trigger generation of the proper variable
and drop the hacky stdc++ linking.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
With recent commit we started de-duplicating all of the compiler/
linker flags moving their handling inside Automake.inc.
This did not take into consideration that the above variable was set
at configure time, leading to issues on certain build combinations.
Move the variable to where it's used/handled thus cleaning up
configure.ac.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76848
Tested-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The pkg-config module was called "EXPAT" instead of "expat" in
PKG_CHECK_EXISTS. This seems to have been wrong because the wrong
argument was copied from PKG_CHECK_MODULES.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
_GNU_SOURCE is only set/required for linux*|*-gnu*|gnu*) and as the
functionality is available on other systems check for RTLD_DEFAULT instead.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Platforms that lack libudev (OpenBSD and possibly others) need
this change in order to load the correct dri driver.
Under linux we unconditionally require libudev, thus this code
will never get build.
v2: Add commit message (Emil Velikov)
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 96e8b916a7.
In the case of VCE encoding with raw YUV file, CPU load directly
to VRAM is faster than combination of CPU writing to GTT and
then blit to VRAM with GPU.
Reviewed-by: Christian König <christian.koenig@amd.com>
Keep a dynamically increasing array of all the views
created for a texture instead of just the last one.
v2: add comments, fix array size calculation,
release only the first sampler view found
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The xa version number had to be set in two places. In configure.ac and in
xa_tracker.h. Furthermore, xa_tracker.h is an installed header so we can't
use mesa internal defines. So therefore, at configure time, modify the
xa_tracker.h header to use the version given by configure.ac
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
/me puts a paper bag on his head and sits in the corner.
This was supposed to be included in 5a68f731, which added
glPointSizePointerOES back to the list of functions exposed by
libGLESv1_CM. It looks like it was an uncommitted change in my tree
when I sent the patch out.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We are checking for no-ops in the CSO module for both of these items
so there's no reason to do it in the driver.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
As we do for sampler states in single_sampler_done() and many other
CSO functions.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously GLX_EXT_buffer_age has always been advertised as supported because
both client_glx_support and client_glx_only where set. So it did not matter
that direct_support is only set when running dri3 and we ended up always
advertising it.
Fix that by not setting client_glx_only for buffer_age in known_glx_extensions.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to call pipe->set_sampler_views() with count being the
maximum of the old number of sampler views and the new number.
This makes sure we null-out any old sampler views.
We already do the same thing for sampler states in single_sampler_done().
Fixes some assertions seen in the VMware driver with XA tracker.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ages ago Chia-I added an ES compatibility flag to several of the various
generator scripts. The intention was to bridge differences between ES
and desktop in Mesa builds without ES. It doesn't appear that it has
ever been used. Recent changes to static_dispatch status of several ES1
functions caused problems in desktop-only, non-shared-glapi builds.
Enabling the ES compatibility mode appears to fix these build problems.
This is kind of a duct tape solution to this problem. As I mentioned in
the cover letter for the series that triggered the build problem, I
would like to make some major changes to the generator architecture and
the XML. The whole point of the proposed architecture changes is to
better handle the differences between desktop GL and ES. I think duct
tape is okay for now.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76869
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Cc: Chia-I Wu <olv@lunarg.com>
Commit fb78fa58 made the GL_ARB_debug_output functions aliases of the
GL_KHR_debug output functions. As a result, the function names in
struct _glapi_table also changed. The table in check_table.cpp used the
ARB names.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
C89 has a fairly short minimum-maximum string length. To support
compilers limited by the C89 limits, this script had a mode where it
would generate a character array instead of a giant string. These were
functionally the same, but the code generated for the character array is
HUGE and difficult to read.
As far as I can tell, nothing in Mesa uses '-m short' any more. The
generated files used to be tracked in revision control, but I think we
stopped using '-m short' when we stopped tracking the generated files.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Brian Paul <brianp@vmware.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Cc: Vinson Lee <vlee@freedesktop.org>
Nested for loops running through tables against which they
finally do an assert were ran also with optimized builds.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
% operator could return negative value which would cause
indexing before perm table. Change %256 to &0xff
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is to avoid running out of query buffer space due to winsys
limitations. Instead of a fixed size per screen pool of query buffers,
use a slab allocator that allocates a new slab if we run out of space
in the first one.
v2: Correct email addresses.
v3: s/8192/VMW_QUERY_POOL_SIZE/. Improve documentation and log message.
Reported-and-tested-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This adds a gallium cap that allows us to fake GL3.0 by
not exposing MSAA on sw rendering.
It also forces the extra extensions needed for GL3.2.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts commit f6e290f80c.
To fix the broken build. The DRI-enabled build seems OK after reverting.
Th non-DRI/gallium build is still suffering from an unrelated issue in
the pipe-loader code.
Currently, we raise an error when doing this which breaks a conformance
test from the OpenGL samples pack. Even if this is a bit silly it is not
an error.
From http://www.opengl.org/wiki/Rectangle_Texture:
"Rectangle textures contain exactly one image; they cannot have mipmaps.
Therefore, any texture parameters that depend on LODs are irrelevant
when used with rectangle textures; attempting to set these parameters to
any value other than 0 will result in an error."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76496
Reviewed-by: Brian Paul <brianp@vmware.com>
The previous commit stopped exporting 21 libGLESv2 and 88 libGLESv1_CM
functions. This removes the work-arounds for those functions from
ABI-check.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
This has been a long standing issue with the ES libraries. Functions
marked in the XML with 'static_dispatch=false' were still incorrectly
exported. ABI-check is supposed to detect this case, but we have to
paper over failures every time a new extension is added.
This change will cause a big pile of functions to disappear from
libGLESv2 and libGLESv1_CM.
libGLESv2 loses (20 functions):
glBindVertexArrayOES
glCompressedTexImage3DOES
glCompressedTexSubImage3DOES
glCopyTexSubImage3DOES
glDeleteVertexArraysOES
glDiscardFramebufferEXT
glDrawBuffersNV
glFlushMappedBufferRangeEXT
glFramebufferTexture3DOES
glGenVertexArraysOES
glGetBufferPointervOES
glGetProgramBinaryOES
glIsVertexArrayOES
glMapBufferOES
glMapBufferRangeEXT
glProgramBinaryOES
glReadBufferNV
glTexImage3DOES
glTexSubImage3DOES
glUnmapBufferOES
libGLESv1_CM loses (88 functions):
glAlphaFuncxOES
glBindFramebufferOES
glBindRenderbufferOES
glBlendEquationOES
glBlendEquationSeparateOES
glBlendFuncSeparateOES
glCheckFramebufferStatusOES
glClearColorxOES
glClearDepthfOES
glClearDepthxOES
glClipPlanefOES
glClipPlanexOES
glColor4xOES
glDeleteFramebuffersOES
glDeleteRenderbuffersOES
glDepthRangefOES
glDepthRangexOES
glDiscardFramebufferEXT
glDrawTexfOES
glDrawTexfvOES
glDrawTexiOES
glDrawTexivOES
glDrawTexsOES
glDrawTexsvOES
glDrawTexxOES
glDrawTexxvOES
glFlushMappedBufferRangeEXT
glFogxOES
glFogxvOES
glFramebufferRenderbufferOES
glFramebufferTexture2DOES
glFrustumfOES
glFrustumxOES
glGenerateMipmapOES
glGenFramebuffersOES
glGenRenderbuffersOES
glGetBufferPointervOES
glGetClipPlanefOES
glGetClipPlanexOES
glGetFixedvOES
glGetFramebufferAttachmentParameterivOES
glGetLightxvOES
glGetMaterialxvOES
glGetRenderbufferParameterivOES
glGetTexEnvxvOES
glGetTexGenfvOES
glGetTexGenivOES
glGetTexGenxvOES
glGetTexParameterxvOES
glIsFramebufferOES
glIsRenderbufferOES
glLightModelxOES
glLightModelxvOES
glLightxOES
glLightxvOES
glLineWidthxOES
glLoadMatrixxOES
glMapBufferOES
glMapBufferRangeEXT
glMaterialxOES
glMaterialxvOES
glMultiTexCoord4xOES
glMultMatrixxOES
glNormal3xOES
glOrthofOES
glOrthoxOES
glPointParameterxOES
glPointParameterxvOES
glPointSizePointerOES
glPointSizexOES
glPolygonOffsetxOES
glQueryMatrixxOES
glRenderbufferStorageOES
glRotatexOES
glSampleCoveragexOES
glScalexOES
glTexEnvxOES
glTexEnvxvOES
glTexGenfOES
glTexGenfvOES
glTexGeniOES
glTexGenivOES
glTexGenxOES
glTexGenxvOES
glTexParameterxOES
glTexParameterxvOES
glTranslatexOES
glUnmapBufferOES
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Cc: Paul Berry <stereotype441@gmail.com>
This is hella ugly. The same-named function in desktop OpenGL is
hidden, but it needs to be exposed by libGLESv2 for OpenGL ES 3.0.
There's no way to express in the XML that a function should be be hidden
in one API but exposed in another.
This won't affect any change now, but it will prevent a regression in a
later patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Functions that are part of OpenGL ES 1.0 or 1.1 should have static
dispatch functions in libGLESv1_CM. This doesn't affect any change yet,
but it will prevent later regressions.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
It looks like these were added accidentally by Paul in commit 1a1db174.
From the commit message and the look of the patch, I think this was just
some sed-job left overs.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
It was my understanding that the writemask works in SIMD4x2 mode for
texturing instructions and doesn't require a message header. Some bit of
this logic must be wrong, so disable it until it's understood.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76617
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
By doing GC the linker removes all the symbols that are not referenced
and/or used by the final library. This results in a saving of ~100K
up-to ~600K per (stripped) binary (classic vs gallium drivers).
If interested one can ask the compiler to print the sections that are
removed using -Wl,--print-gc-sections.
v2: Check if ld supports the flag before using it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com> (v1)
... apart from the dri drivers.
With this final change we can build mesa without fear that
the resulting libraries will have unresolved symbols.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Set the flag for all but the dri targets. They have missing
glapi symbols which are required for the normal operation with
the X server.
Jon, I fear that you'll need to carry the "no-undefined" hunk
locally when building the dri drivers under cygwin.
Cc: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Explicitly setting the linker variable was required for old and broken
build toolchains. At this point this should no longer be needed, and
setting the sources lists will trigger generation of the correct LINK
variables.
Explicitly include dummy.cpp to use g++ to link the static library which
in most cases is based upon C++ code.
v2: Reword commit message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This lets us have only one if HAVE_MESA_LLVM block, rather than
one for each driver.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every library uses the same libtool/linker flags. Compact those
into AM_LDFLAGS and append the version script to it.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Or it compiles them in, but pretends they don't exist
v2: Rebase (Emil)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa does _not_ link against libudev. Additionally the only place
that deals with it is the loader, thus we can drop the CFLAGS.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It makes little sense to enable the vdpau, xvmc and omx state-trackers
as they do not make use of (don't work with) the software driver.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
grep -q is easier to read and consistent with the rest of configure.ac.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gallium osmesa links against the softpipe driver, thus the build
will fail if it's missing.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Evidently at least static OSMesa is still used as shared one
causes substantial increase in the load time for some programs
that use it (from seconds up-to ~30min).
Rather than forcing everyone to use shared mesa, revert commit
a6efbac9fb and default to shared
build when both shared and static are disabled.
v2: Whitespace cleanup, drop silly comment.
Reported-by: Burlen Loring <burlen.loring@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
GL_INTENSITY has never been valid as a pixel format -- to get the memcpy
pack/unpack paths, the app needs to specify GL_RED as the pixel format
(or GL_RED_INTEGER for the integer formats).
Note: This was briefly merged before, but exposed some breakage in gallium, so
was reverted. Hopefully it will stick this time.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_format_matches_format_and_type() returns true for
GL_RED/GL_RED_INTEGER (with an appropriate type) into an intensity
mesa_format.
We want the `red`-based format instead, regardless of the order we find
them in our walk of the mesa formats list.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The case for this was in the wrong function, and this format's store
func was not set in the table at all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Whether or not the coords are normalized is handled in the texture
state. But we otherwise need to treat RECT sample instructions as 2D.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In some cases, we need a register to be assigned up to three components
before the base. Since we can't have negative register #'s, just shift
everything up. May increase register usage for trivial shaders, but I
don't think we are shader limited in those cases. A proper solution is
going to require a better register assignment algorithm (which is on the
TODO list), this is just a hack to get us by until then.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
RB_FRAME_BUFFER_DIMENSION is not a banked context register, so we need
to wait for the GPU to idle before updating it. But we'd rather not
have unnecessary WFI's, so actually keep track if we need to emit it or
not.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This is something that XA triggers. In some cases it will only use
SAMP[1] (composite mask) but not SAMP[0] (composite src).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Based on a patch by Ville Syrjälä.
As usual, these are placeholder values; actual values will come later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is the closing for the "\defgroup IR Intermediate representation
nodes" all the way at the top of the file.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When doing software rendering (i.e. rendering to the selection buffer) we need
to make sure that we have valid index bounds before calling _tnl_draw_prims(),
otherwise we can crash.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=59455
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The underlying glDrawArrays() calls weren't getting compiled into
the display list. We simply need to use the current dispatch table
so the CALL_DrawArrays() is routed to the display list save function.
This patch also fixes glMultiModeDrawArraysIBM and
glMultiModeDrawElementsIBM.
Fixes the new piglit gl-1.4-dlist-multidrawarrays test.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we only examined the GL_DEPTH_MODE state to determine the
sampler view swizzle for depth textures. Now we also consider the
texture base format for color textures too.
The basic idea is if we're sampling from a RGB texture we always
want to get A=1, even if the actual hardware format might be RGBA.
We had assumed that the texture's A values were always one since that's
what Mesa's texstore code does. But if we render to the RGBA texture,
the A values might not be 1. Subsequent sampling didn't return the
right values.
Now we examine the user-specified texture base format vs. the actual
gallium format to determine the right swizzle.
Fixes several fbo-blending-formats, fbo-clear-formats and fbo-tex-rgbx
failures with VMware/svga driver (and possibly other drivers).
No other piglit regressions with softpipe or VMware/svga.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
In preparation for following changes.
I used a temporary test harness to compare the old code to the new
for all possible swizzle inputs. No change in results.
This also happens to fix a leak of the current GS pull constant BO on
context destroy, by just not holding on to the pull const bos after the
surface state is generated.
No statistically significant performance difference on GLB2.7 on HSW at
1024x768 (n=40) or 320x240 (n=44), or on BYT at 320x240 (n=47).
v2: Rebase on intel_upload simplification.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The implementation kept a page-sized area for uploading data, and
uploaded chunks from that to a 64kb-sized streamed buffer. This wasted
cache footprint (and extra state tracking to do so) when we want to just
write our data into the buffer immediately.
Instead, build it around an interface like brw_state_batch() that just
gets you a pointer to BO memory to upload your stuff immediately.
Improves OpenArena on HSW by 1.62209% +/- 0.355299% (n=61) and on BYT by
1.7916% +/- 0.415743% (n=31).
v2: Rebase on Mesa master, drop old prototypes. Re-do performance
comparison on a kernel that doesn't punish CPU efficiency
improvements.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
it's useful to know what the llvmbuildstore arguments are going to
be before executing it because it can crash and make sure to
print out the inputs only if we're not generating a gs because
it fetches inputs differently.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We used to overallocate the output buffer sometimes running out
of memory with applications rendering large geometries. The actual
maximum number of vertices out is simply the maximum number of
primitives in (number of gs invocations) multiplied by the maximum
number of output vertices per gs input primitive (i.e. gs invocation).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Don't pass null query object pointers into gallium functions.
This avoids segfaulting in the VMware driver (and others?) if the
pipe_context::create_query() call fails and returns NULL.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
sed commands:
s/ARGB2101010_UINT\b/B10G10R10A2_UINT/g
s/ABGR2101010_UINT\b/R10G10B10A2_UINT/g
Reviewed-by: Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It is quite hard to meet the dependency of the libxml2 python bindings
outside Linux, and in particularly on MacOSX; whereas ElementTree is
part of Python's standard library. ElementTree is more limited than
libxml2: no DTD verification, defaults from DTD, or XInclude support,
but none of these limitations is serious enough to justify using
libxml2.
In fact, it was easier to refactor the code to use ElementTree than to
try to get libxml2 python bindings.
In the process, gl_item_factory class was refactored so that there is
one method for each kind of object to be created, as it simplifies
things substantially.
I confirmed that precisely the same output is generated for GL/GLX/GLES.
v2: Remove m4/ax_python_module.m4 as suggested by Matt Turner.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Release the references to the sampler views before
destroying the pipe context.
v2: remove TODO and unrelated change
v3: move to st_texture.[ch], rename callback, add comment
v4: fix rebase mess up and add further cleanups
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
This can get called in some circumstances if both src type and dst type
have same width (seen with float32->unorm32). While this particular case
was bogus anyway let's just fix that as it can work trivially (due to the
way it was called it actually worked anyway apart from the assert).
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When deciding if a clear color is suitable for fast clear,
take into account if a color channel is active in the
buffer format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch introduces two pre-canned MOCS values: BDW_MOCS_WB
(write-back, all caches) and BDW_MOCS_WT (write-through, all caches).
We use write-through caching for render targets, and write-back for
all other data. (At least on Haswell, I believe write-back LLC/eLLC
didn't work for scan-out buffers, while write-through did.)
No performance analysis has been done on the impact of this patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
All of the functionality is implemented in a private function in the one
file where it is used.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fix eglCreateImage() from a packed dma_buf surface with a non-zero offset
to pixels data. In particular, this fixes support for planar YUV surfaces
when they are individually mapped on a per-plane basis, i.e. when the
OES_EGL_image_external is not used and user application wants to use its
own shader code for composition, or processing on individual plane (OCL).
Signed-off-by: Gwenole Beauchesne <gwenole.beauchesne@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Implementation note:
I don't use context for ralloc (don't know how).
The check on PROGRAM_SEPARABLE flags is also done when the pipeline
isn't bound. It doesn't make any sense in a DSA style API.
Maybe we could replace _mesa_validate_program by
_mesa_validate_program_pipeline. For example we could recreate a dummy
pipeline object. However the new function checks also the
TEXTURE_IMAGE_UNIT number not sure of the impact.
V2:
Fix memory leak with ralloc_strdup
Formatting improvement
V3 (idr):
* Actually fix the leak of the InfoLog. :)
* Directly generate logs in to gl_pipeline_object::InfoLog via
ralloc_asprintf isntead of using a temporary buffer.
* Split out from previous uber patch.
* Change spec references to include section numbers, etc.
* Fix a bug in checking that a different program isn't active in a stage
between two stages that have the same program. Specifically,
if (pipe->CurrentVertexProgram->Name == pipe->CurrentGeometryProgram->Name &&
pipe->CurrentGeometryProgram->Name != pipe->CurrentVertexProgram->Name)
should have been
if (pipe->CurrentVertexProgram->Name == pipe->CurrentFragmentProgram->Name &&
pipe->CurrentGeometryProgram->Name != pipe->CurrentVertexProgram->Name)
v4 (idr): Rework to use CurrentProgram array in loops.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is much like _mesa_sampler_uniforms_are_valid, but it operates
across an entire pipeline object.
This function differs from _mesa_sampler_uniforms_are_valid in that it
directly creates the gl_pipeline_object::InfoLog instead of writing to
some temporary buffer.
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr): Fix the loop bounds. shProg isn't an array, so
ARRAY_SIZE(shProg) was 1, so only the vertex program was validated.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
V2 (idr):
* Keep the behavior of other info logs in Mesa: and empty info log
reports a GL_INFO_LOG_LENGTH of zero.
* Use a NULL pointer to denote an empty info log.
* Split out from previous uber patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Test become green in piglit:
The updated ext_transform_feedback-api-errors:useprogstage_noactive useprogstage_active bind_pipeline
arb_separate_shader_object-GetProgramPipelineiv
arb_separate_shader_object-IsProgramPipeline
For the moment I reuse Driver.UseProgram but I guess it will be better
to create a UseProgramStages functions. Opinion is welcome
V2: formatting & rename
V3 (idr):
* Change spec references to core OpenGL versions instead of issues in the
extension spec.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now arb_separate_shader_object-GetProgramPipelineiv should pass.
V3 (idr):
* Change spec references to core OpenGL versions instead of issues in
the extension spec.
* Split out from previous uber patch.
v4 (idr): Use _mesa_has_geometry_shaders in _mesa_UseProgramStages to
detect availability of geometry shaders.
v5 (idr): Whitespace cleanup, use _mesa_lookup_shader_program_err
instead of open-coding it again, and update some comments at the end of
_mesa_UseProgramStages. All suggested by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Extend use_shader_program to support a different target. Allow to reuse the
function to update the pipeline state. Note I bypass the flush when target
isn't current. Maybe it would be better to create a new UseProgramStages
driver function
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
save and restore _Shader/Pipeline binding point. Rational we don't want any
conflict when the program will be unattached.
V2: formatting improvement
V3 (idr):
* Build fix. The original patch added calls to _mesa_use_shader_program
with 4 parameters, but the fourth parameter isn't added to that
function until a much later patch. Just drop that parameter for now.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Basically a sed but shaderapi.c and get.c.
get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior
shaderapi.c => the old api stil update the Shader object directly
V2: formatting improvement
V3 (idr):
* Rebase fixes after a block of code was moved from ir_to_mesa.cpp to
shaderapi.c.
* Trivial reformatting.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To avoid NULL pointer check a default pipeline object is installed in
_Shader when no program is current
The spec say that UseProgram/UseShaderProgramEXT/ActiveProgramEXT got an
higher priority over the pipeline object. When default program is
uninstall, the pipeline is used if any was bound.
Note: A careful rename need to be done now...
V2: formating improvement
V3 (idr):
* Build fix. The original patch added calls to _mesa_use_shader_program
with 4 parameters, but the fourth parameter isn't added to that
function until a much later patch. Just drop that parameter for now.
* Trivial reformatting.
* Updated comment of gl_context::_Shader
v4 (idr): Reformat spec quotations to look like spec quotations. Update
comments describing what gl_context::_Shader can point to. Bot
suggested by Eric.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eliminate lp_vertex_shader, as it added nothing over draw_vertex_shader.
Simplify lp_geometry_shader, as most of the incoming state is unneeded.
(We could also just use draw_geometry_shader if we were willing to peek
inside the structure.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
* Add HAVE_PTHREAD, we do have pthread support wrappers now for
non-native Haiku threaded applications.
* Viewport changed behavior recently breaking the build.
We fix this by looking at the gl_context ViewportArray
(Thanks Brian for the idea)
Acked-by: Brian Paul <brianp@vmware.com>
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes. Previously, we bailed
on it if color_mask wasn't <true, true, true, true>. However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.
WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>. This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.
Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
abound 50%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
v2: Use _mesa_format_has_color_component() to properly handle ALPHA
formats (and generally be less fragile).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
WebGL Aquarium in Chrome 24 actually hits this.
v2: Move to core Mesa (wisely suggested by Ian); only consider
components which actually exist.
v3: Use _mesa_format_has_color_component to determine whether components
actually exist, fixing alpha format handling.
v4: Add a comment, as requested by Brian. No actual code changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
When considering color write masks, we often want to know whether an
RGBA component actually contains any meaningful data. This function
provides an easy way to answer that question, and handles luminance,
intensity, and alpha formats correctly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
This gets us equivalent code paths on BDW and pre-BDW, except for stencil
(where we don't have MSAA stencil resolve code yet)
Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces
DRI2 MSAA glxgears performance by -12.3559% +/- 1.52845% (n=9).
v2: Move the new meta code to brw_meta_updownsample.c, name it
brw_meta_updownsample(), add a comment about
intel_rb_storage_first_mt_slice(), and rename that function and move
the RB generation into it (review ideas by Ken).
v3: Fix 2 src vs dst pasteos in previous change.
v4: Skip this path pre-gen8 for now, until we can analyze the glxgears
performance delta some more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that BindRenderbufferTexImage() is a thing that drivers can do, winsys
FBOs *can* have NeedsFinishRenderTexture set.
v2: Keep the short-circuit for non-BindRenderbufferTexImage() drivers
(review by Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even if the singlesample_mt got reopened from DRI due to
pageflipping/buffer swapping, our private miptree shouldn't need any
changes.
Improves performance of a little swapbuffers-loving microbenchmark with
MSAA forced on, by 1.2371% +/- 0.624802% (n=102)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The formatting was weird, and the tests were duplicated, and it is
guaranteed that mt->region exists.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes a memory leak with MSAA winsys buffers since my move of
singlesample_mt to the rb in 4e0924c5de
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sometimes it would be nice to benchmark some app with MSAA versus not, but
it doesn't offer the controls you want. Just provide a handy knob to
force the issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For each write, search previous instructions for unread writes to the
flag register and remove them. Note that this will not eliminate the
last unread write.
total instructions in shared programs: 788074 -> 788004 (-0.01%)
instructions in affected programs: 4930 -> 4860 (-1.42%)
Reviewed-by: Eric Anholt <eric@anholt.net>
With an awful O(n^2) algorithm that searches previous instructions for
dead writes.
total instructions in shared programs: 805582 -> 788074 (-2.17%)
instructions in affected programs: 144561 -> 127053 (-12.11%)
Reviewed-by: Eric Anholt <eric@anholt.net>
That is, modify
mad dst, a, b, c
to be
mad dst.xyz, a, b, c
if dst.w is never read.
total instructions in shared programs: 811869 -> 805582 (-0.77%)
instructions in affected programs: 168287 -> 162000 (-3.74%)
Reviewed-by: Eric Anholt <eric@anholt.net>
A future patch adds support for removing dead writes to the flag
register. This patch simplifies the logic until then.
total instructions in shared programs: 811813 -> 811869 (0.01%)
instructions in affected programs: 3378 -> 3434 (1.66%)
Reviewed-by: Eric Anholt <eric@anholt.net>
To be consistent with the fs backend. Also the instruction scheduler
incorrectly considered SEL with a conditional modifier to read the flag
register.
Reviewed-by: Eric Anholt <eric@anholt.net>
The ARB_framebuffer_object spec lists this case before the
FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER and
FRAMEBUFFER_INCOMPLETE_READ_BUFFER cases.
Fixes two broken cases in piglit's fbo-incomplete test, if
ARB_ES2_compatibility is not advertised. (If it is, this is masked
because the FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER /
FRAMEBUFFER_INCOMPLETE_READ_BUFFER cases are removed by that extension)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
GL_INTENSITY has never been valid as a pixel format -- to get the memcpy
pack/unpack paths, the app needs to specify GL_RED as the pixel format
(or GL_RED_INTEGER for the integer formats).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
With shared glx contexts it is possible that a texture is create and used
in one context and then used in another one resulting in incorrect
sampler view usage.
v2: avoid template copy
v3: add XXX comment
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 2919c3fdb4.
For formats like BGRX, looping through 0..num_components works fine.
But for formats like XRGB, we'd check the color mask for X and fail to
check it for B.
The SIMD16 replicated FB write message only works if we don't need the
color calculator to mask our framebuffer writes. Previously, we bailed
on it if color_mask wasn't <true, true, true, true>. However, this was
needlessly strict for formats with fewer than four components - only the
components that actually exist matter.
WebGL Aquarium attempts to clear a BGRX texture with the ColorMask set
to <true, true, true, false>. This will work perfectly fine with the
replicated data message; we just bailed unnecessarily.
Improves performance of WebGL Aquarium on Iris Pro (at 1920x1080) by
abound 40%, and Bay Trail (at 1366x768) by over 70% (using Chrome 24).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>
Currently, we don't use this path on Sandybridge because we suspect
other paths will be faster. But we potentially could. If we do, we
should allow it to support Y-tiled BLTs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested on ILK and CTG (with the GL3isms taken out of the piglits).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The conversion code for srgb was tuned for n x 4x8bit AoS -> 4 x nxfloat SoA
(and vice versa), fix this to handle also 16bit 565-style srgb formats.
Still not really all that generic, things like r10g10b10a2_srgb or
r4g4b4a4_srgb wouldn't work (the latter trivial to fix, the former would not
require more work to not crash but near certainly need some higher precision
calculation) but not needed right now.
The code is not fully optimized for this (could use more direct calculation
instead of expanding to 8-bit range first) but should be good enough.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
GL generally doesn't seem to allow srgb formats with less (or more) than 8 bit
for the rgb channels, though some hw could easily do it (typically for formats
with up to 10 bits for the rgb channels, at least for formats with less than 8
bits support is likely widespread even). While it may be true there aren't
really any benefits for such formats, we need for it for d3d, though luckily
only for b5g6r5_srgb it seems.
So add this format along with the util code for conversion - since that util
code is heavily tuned for 8bit srgb this isn't really all that well optimized
and rounding doesn't seem right but at least it should give some halfway
meaningful results.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The nvc0 texfetch instruction expects the sample id to be in the second
source (usually used for the offset) rather than as part of the texture
coordinate.
This fixes all the sampler2DMS/Array tests on nvc0.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This fallback to triangle strips is silly and should be done in drivers
if they need it.
This should fix the case when quad strips are used with flatshading that is
enabled by the "flat" GLSL varying modifier. It also fixes primitive restart
for quad strips.
This fixes piglit:
NV_primitive_restart/primitive-restart-draw-mode-quad_strip
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
The last changes to it are from 2008 and 2009.
It doesn't support most texture formats and some texture targets.
Nobody can possibly be using this.
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is a slightly modified version of evergreen_dma_blit (and
evergreen_dma_copy as well as evergreen_dma_copy_tile).
It would be nice to share some of the code in the long term.
I have reused some "cik"-prefixed functions that also return the right
value for SI. I am not sure if they should be renamed.
v2: Marek> removed gfx.flush in si_dma_copy_tile
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The AoS version of ld_build_blend_factor was assuming that if the first
channel was alpha, there were no rgb components.
Fixes glean/blendFunc on System z. No piglit regressions on x86_64.
The shortcut is still used in tests like spec/ARB_framebuffer_object/
fbo-alpha.
Signed-off-by: Richard Sandiford <rsandifo@linux.vnet.ibm.com>
libdl does not exist on many platforms which have dlopen in libc.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
In the gallium code, the assert() macro could come from either the
system's assert.h file (via c11/threads.h) or from gallium's u_debug.h.
It looks like all known assert.h files unconditionally #undef assert
before defining their own version. So the assert you get depends on
whether threads.h or u_debug.h was included last.
In the gallium code we really want to use the assert() from u_debug.h
(it behaves better on Windows). In gallium, c11/threads.h is only
included after u_debug.h in the os_thread.h wrapper. So Adding
an #ifndef assert test in the threads*.h files avoids using the system's
assert().
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
EXT_packed_depth_stencil is supported by all drivers, but
ARB_depth_texture isn't (notably nouveau_vieux). This should avoid
passing unexpected values down to ChooseTextureFormat.
The EXT_packed_depth_stencil spec does not make any explicit references
to requiring ARB_depth_texture in order to allow textures with that
format, however if there is no dependency, ARB_depth_texture would be
practically implied by the extension.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Note for 10.0 backport: This will produce a conflict, the solution is to
move the surrounding if as well.
In all uses of dotlike() we're writing generic code that operates on 1-4
component vectors. That our IR requires ir_binop_dot expressions'
operands to be 2+ component vectors is an implementation detail that's
not important when implementing built-in functions with dot(), which is
defined for scalar floats in GLSL.
Reviewed-by: Eric Anholt <eric@anholt.net>
ARB_gpu_shader5 and ES 3.0 expose different subsets of
ARB_shading_language_packing.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Volume 4, part 1 of the Ivybridge PRM says, "Generally, the EWA
approximation algorithm results in higher image quality than the legacy
algorithm." Using a classic anisotropic filtering "tunnel" demo, it
appears that there is *no* anisotropic filtering on IVB without this bit
set.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I meant to include this fixes in v3 of commit
de7ad2c88f, but accidentally pushed a
previous version.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Sadly, we can't use actual ALIGN(), since that only supports
power-of-two values for the alignment parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Register sets depend on the particular hardware generation, but don't
depend on anything in the actual OpenGL context. Computing them is
fairly expensive, and they take up a large amount of memory. Putting
them in the screen allows us to compute/allocate them once for all
contexts, saving both time and space.
Improves the performance of a context creation/destruction
microbenchmark by about 3x on my Haswell i7-4750HQ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will allow us to use the screen as a memory context.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This one saves about 2MB peak allocation in glsl-fs-algebraic-add-add-1,
with no performance difference on timing short shader-db runs (n=9/10,
warmup outlier removed).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This isn't the GL API, so there's no reason to use GLboolean.
Using bool is safer: any non-zero value is treated as "true". When
converting a value to a GLboolean, all but the low byte is discarded,
which means that values like 256 will be incorrectly rendered as false.
Done via the following vim commands:
:%s/GLboolean/bool/g
:%s/GL_TRUE/true/g
:%s/GL_FALSE/false/g
and one line of manual whitespace tidying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Ideally, we'd like to never even attempt the SIMD16 compile if we could
know ahead of time that it won't succeed---it's purely a waste of time.
This is especially important for state-based recompiles, which happen at
draw time.
The fragment shader compiler has a number of checks like:
if (dispatch_width == 16)
fail("...some reason...");
This patch introduces a new no16() function which replaces the above
pattern. In the SIMD8 compile, it sets a "SIMD16 will never work" flag.
Then, brw_wm_fs_emit can check that flag, skip the SIMD16 compile, and
issue a helpful performance warning if INTEL_DEBUG=perf is set. (In
SIMD16 mode, no16() calls fail(), for safety's sake.)
The great part is that this is not a heuristic---if the flag is set, we
know with 100% certainty that the SIMD16 compile would fail. (It might
fail anyway if we run out of registers, but it's always worth trying.)
v2: Fix missing va_end in early-return case (caught by Ilia Mirkin).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
This is just a matter of reusing the pull/push constant information set
up by the SIMD8 compile.
This gains us 78 SIMD16 programs in shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we don't renumber uniform registers, assign_constant_locations
and move_uniform_array_access_to_pull_constants use the same names.
So, they can share a single copy of the pull_constant_loc[] array.
This simplifies the code considerably. assign_constant_locations()
doesn't need to walk through pull_params[] to rediscover reladdr
demotions; it just has that information in pull_constant_loc[]. We also
only need to rewrite the instruction stream once, instead of twice.
Even better, we now have a single array describing the layout of
all pull parameters, which we can pass to the SIMD16 program.
This actually hurts a few shaders in Serious Sam 3, and one in KWin:
total instructions in shared programs: 1841957 -> 1842035 (0.00%)
instructions in affected programs: 1165 -> 1243 (6.70%)
Comparing dump_instructions() before and after the pull constant
transformations with and without this patch, it appears that there is
a uniform array with variable indexing (reladdr) and constant indexing
(of array element 0). Previously, we uploaded array element 0 as both
a pull constant (for reladdr) /and/ a push constant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, remove_dead_constants() would renumber the UNIFORM registers
to be sequential starting from zero, and the resulting register number
would be used directly as an index into the params[] array.
This renumbering made it difficult to collect and save information about
pull constant locations, since setup_pull_constants() and
move_uniform_array_access_to_pull_constants() used different names.
This patch generalizes setup_pull_constants() to decide whether each
uniform register should be a pull constant, push constant, or neither
(because it's unused). Then, it stores mappings from UNIFORM register
numbers to params[] or pull_params[] indices in the push_constant_loc
and pull_constant_loc arrays. (We already did this for pull constants.)
Then, assign_curb_setup() just needs to consult the push_constant_loc
array to get the real index into the params[] array.
This effectively folds all the remove_dead_constants() functionality
into assign_constant_locations(), while being less irritable to work
with.
v2: Add assert(remapped <= i), requested by Topi.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
move_uniform_array_access_to_pull_constants() and setup_pull_constants()
both have two parts:
1. Decide which UNIFORM registers to demote to pull constants, and
assign locations.
2. Mechanically rewrite the instruction stream to pull the uniform
value into a temporary VGRF and use that, eliminating the UNIFORM
file access.
In order to support pull constants in SIMD16 mode, we will need to make
decisions exactly once, but rewrite both instruction streams.
Separating these two tasks will make this easier.
This patch introduces a new helper, demote_pull_constants(), which
takes care of rewriting the instruction stream, in both cases.
For the moment, a single invocation of demote_pull_constants can't
safely handle both reladdr and non-reladdr tasks, since the two callers
still use different names for uniforms due to remove_dead_constants()
remapping of things. So, we get an ugly boolean parameter saying
which to do. This will go away.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When demoting a variably indexed uniform array to pull constants, we
only recorded the location for the base of the array (element 0).
Recording locations for all array elements is a trivial amount of code
and will make subsequent refactoring easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, both move_uniform_array_access_to_pull_constants() and
setup_pull_constants() maintained stack-local arrays with this
information. Storing this information will allow it to be used from
multiple functions, allowing us to split and move code around.
We'll also eventually want to pass pull constant location information
to the SIMD16 compile. Saving this information will help us do that.
Unfortunately, the two functions *cannot* share the contents of the
array just yet. remove_dead_constants() renumbers all the UNIFORM
registers to be contiguous starting at zero, so the two functions
talk about uniforms using different names. We can't even remap them,
since move_uniform_array_access_to_pull_constants() deletes UNIFORM
registers that are only accessed with reladdr, so remove_dead_constants
can't even see them.
This situation will improve in the next few patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The SIMD8 compile will determine whether pull parameters are necessary.
If so, it will set prog_data->nr_pull_params to a value greater than 0.
brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile
altogether. So, this code should never occur.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To emphasize that the result is floating point 1.0 or 0.0, to match
other opcodes like SLE and SEQ.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Handle mapping edgeflag data similar to the code around it.
This fixes a crash in piglit test gl-2.0-edgeflag.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Enable EGL_EXT_platform_base and the Linux platform extensions layered
atop it: EGL_EXT_platform_x11, EGL_EXT_platform_wayland,
and EGL_MESA_platform_gbm.
Tested with Piglit's EGL_EXT_platform_base tests under an X11 session.
To enable running the Wayland and GBM tests, windowed Weston was running
and the kernel had render nodes enabled.
I regression tested my EGL_EXT_platform_base patch set with Piglit on
Ivybridge under X11/EGL, standalone Weston, and GBM with rendernodes. No
regressions found.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
From the EGL_EXT_wayland_spec, version 3:
It is not valid to call eglCreatePlatformPixmapSurfaceEXT with a <dpy>
that belongs to Wayland. Any such call fails and generates
EGL_BAD_PARAMETER.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
From the EGL_MESA_platform_gbm spec, version 5:
It is not valid to call eglCreatePlatformPixmapSurfaceEXT with a <dpy>
that belongs to the GBM platform. Any such call fails and generates
EGL_BAD_PARAMETER.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Internally, much of the EGL code uses EGLNativeDisplayType,
EGLNativeWindowType, and EGLPixmapType. However, the EGLNative type
often does not match the variable's actual type.
The concept of EGLNative types are a bad match for Linux, as explained
below. And the EGL platform extensions don't use EGLNative types at all.
Those extensions attempt to solve cross-platform issues by moving the
EGL API away from the EGLNative types.
The core of the problem is that eglplatform.h can define each EGLNative
type once only, but Linux supports multiple EGL platforms.
To work around the problem, Mesa's eglplatform.h contains multiple
definitions of each EGLNative type, selected by feature macros. Mesa
expects EGL clients to set the feature macro approrpiately. But the
feature macros don't work when a single codebase must be built with
support for multiple EGL platforms, *such as Mesa itself*.
When building libEGL, autotools chooses the EGLNative typedefs based on
the first element of '--with-egl-platforms'. For example,
'--with-egl-platforms=x11,drm,wayland' defines the following:
typedef Display* EGLNativeDisplayType;
typedef Window EGLNativeWindowType;
typedef Pixmap EGLNativePixmapType;
Clearly, this doesn't work well for Wayland and GBM. Mesa works around
the problem by casting the EGLNative types to different things in
different files.
For sanity's sake, and to prepare for the EGL platform extensions, this
patch removes from egl/main and egl/dri2 all internal use of the
EGLNative types. It replaces them with 'void*' and checks each explicit
cast with a static assertion. Also, the patch touches egl_gallium the
minimal amount to keep it compatible with eglapi.h.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_image, set it for each platform, and
let egl_dri2 dispatch eglCreateImageKHR to that.
To remove ambiguity, rename egl_dri2.c:dri2_create_image() to
dri2_create_image_from_dri().
This prepares for the EGL platform extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
dri2_initialize_x11_swrast() does a strange thing. For some extensions
it doesn't support, it sets the corresponding functions in
_EGLDriver::API to NULL. The intention here is clear, but misplaced.
NULL or not, the function pointers never get called because their
extensions aren't supported.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_wayland_buffer_from_image, set it for
each platform, and let egl_dri2 dispatch
eglCreateWaylandBufferFromImageWL to that.
This prepares for the EGL platform extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
egl_dri2.c:dri2_terminate() handled terminating X11 and DRM displays.
The Wayland platform implemented its own dri2_wl_terminate(), which was
nearly a copy of the common one.
To implement the EGL platform extensions, we either need to dispatch
eglTerminate per display or define a common implementation for all
platforms. This patch chooses consolidation. It removes
dri2_wl_terminate() by folding it into the common dri2_terminate().
It was necessary to invert the `if (disp->PlatformDisplay == NULL)` and
the switch statement because, unlike DRM and X11, Wayland's terminator
performed action even when EGL didn't own the native display. In the
inversion, I replaced `disp->PlatformDisplay == NULL` with
`dri2_dpy->own_device` because the two expressions are synonymous, but
the latter's meaning is clearer.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When the user calls eglGetDisplay(EGL_DEFAULT_DISPLAY), the Wayland and
DRM platforms set dri2_dpy->own_device=true. This patch makes the X11
platform do the same for consistency.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::post_sub_buffer, set it for each
platform, and let egl_dri2 dispatch eglPostSubBufferNV to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers_region, set it for each
platform, and let egl_dri2 dispatch eglSwapBuffersRegionNOK to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::copy_buffers, set it for each
platform, and let egl_dri2 dispatch eglCopyBuffers to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::query_buffer_age, set it for each
platform, and let egl_dri2 dispatch API.QueryBufferAge to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::destroy_surface, set it for each
platform, and let egl_dri2 dispatch eglDestroySurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_pbuffer_surface, set it for each
platform, and let egl_dri2 dispatch eglCreatePbufferSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_pbuffer_surface, set it for each
platform, and let egl_dri2 dispatch eglCreatePixmapSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::create_window_surface, set it for each
platform, and let egl_dri2 dispatch eglCreateWindowSurface to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers_with_damage, set it for each
platform, and let egl_dri2 dispatch eglSwapBuffersWithDamageEXT to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_buffers, set it for each platform, and
let egl_dri2 dispatch eglSwapBuffers to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add dri2_egl_display_vtbl::swap_interval, set it for each platform, and
let egl_dri2 dispatch eglSwapInterval to that.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't call it through the driver dispatch table. Just call it
statically.
This prepares for the EGL platform extensions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Each of the egl_dri2 platforms (except Android) prefix their function
names with "dri2", not "dri2_${platform}". This means many function
names have three separate definitions in the egl_dri2 directory: one in
each of platform_drm.c, platform_wayland.c, and platform_x11.c. For
example, each of the three files defines dri2_create_window_surface().
The name collisions make it difficult to review patches for correctness
("Is this patch hunk calling a platform_x11 function or a global
egl_dri2 function?"), complicate debugging, and confuse code navigation
tools.
For each function in platform_x11.c prefixed with 'dri2', this patch
changes its prefix to 'dri2_x11'. Likewise for platform_drm.c and
'dri2_drm'; and platform_wayland.c and 'dri2_wl'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
dri2_egl_display has only one virtual function, 'authenticate'. Define
dri2_egl_display::vtbl and move 'authenticate' there.
This prepares for the EGL platform extensions, which will add many
more virtual functions to dri2_egl_display.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This pulls in EGL_EXT_platform_base, EGL_EXT_platform_wayland,
EGL_EXT_platform_x11, and EGL_MESA_platform_gbm.
This patch has a lot of churn because Khronos recently changed its
method of generating headers. Khronos now generates it headers from XML.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This allows retrieving the existing BO and incrementing its reference count,
instead of creating a separate winsys representation for it, when the kernel
reports that the BO was already assigned a virtual memory address.
This fixes problems with XWayland using radeonsi and the
xf86-video-wlglamor driver, which calls GEM flink outside of the radeon
winsys code and creates BOs from the flinked names using the same DRM file
descriptor.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
install-gallium-links.mk fails to create the compat link for ilo_dri.so
because it looks for dri_LTLIBRARIES instead of noinst_LTLIBRARIES. Fix this
by switching to dri_LTLIBRARIES (and make the driver installable).
Since pci_id_driver_map.h and the DDX both tell libGL.so to look for "i965",
ilo_dri.so will never be loaded even enabled and installed. The change should
not create any more confusion.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The GL 4.4 spec says it's not color-renderable and not accepted
by RenderBufferStorage. The EXT_texture_shared_exponent spec says
it's not color-renderable but it's accepted by RenderBufferStorageEXT.
This seems to be a bug in the extension spec.
Let's do what GL 4.4 says.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can just check the polygon mode when updating the edge flag state.
Also, we can just flag ST_NEW_VERTEX_PROGRAM directly, which makes
ST_NEW_EDGEFLAGS_DATA useless.
Normally, nothing uses live intervals at this point, so this isn't
necessary. However, dump_instructions() calculates them and uses them
to show register pressure. So, calling dump_instructions() in this area
of the code would segfault due to the arrays being the wrong size.
This is not a candidate for stable branches because it only serves to
fix internal debugging code that you manually have to invoke by altering
the source code or using gdb.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, dump_instruction() would print output such as:
{ 2} 3: mov vgrf1:F, u0:F
{ 3} 4: mov vgrf7:F, u0:F
{ 4} 5: mov vgrf8:F, u0:F
which looked like either a scalar access or perhaps a constant-indexed
access of element 0, when it was really a variable index.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In commit e57d77280e, I fixed this for
destinations in the Vec4 backend, and sources in the scalar backend.
But not both types in both backends.
To prevent this mess from continuing, make the reg_encoding table
static, so only the disassembler can use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
opt_saturate_propagation_local compares scan_inst->dst.reg/reg_offset
with inst->src[0].reg/reg_offset, and ensures that scan_inst->dst.file
is GRF. But nothing ensured that inst->src[0].file was GRF.
In the following program, this resulted in u1:F matching vgrf1:UW,
and a saturate being incorrectly propagated from instruction 8 to
instruction 1.
{ 1} 0: add vgrf0:UW, hw_reg1+8:UW, hw_reg0:V
{ 1} 1: add vgrf1:UW, hw_reg1+10:UW, hw_reg0:V
{ 1} 2: linterp vgrf6:F, hw_reg2:F, hw_reg3:F, hw_reg0:F
{ 2} 3: linterp vgrf27:F, hw_reg2:F, hw_reg3:F, hw_reg0+16:F
{ 4} 4: mov vgrf10+0.0:F, vgrf6:F
{ 3} 5: mov vgrf10+1.0:F, vgrf27:F
{ 6} 6: tex vgrf8+0.0:F, vgrf10+0.0:F
{ 5} 7: mov vgrf32:F, u1:F
{ 5} 8: mov.sat vgrf12:F, u1:F
From shader-db:
total instructions in shared programs: 1841932 -> 1841957 (0.00%)
instructions in affected programs: 5823 -> 5848 (0.43%)
I inspected two of the 25 hurt shaders, and concluded that they were
both hitting this bug, and not legitimately optimized.
This fixes bugs in Left 4 Dead 2 and Team Fortress 2, possibly among
others. The optimization pass didn't exist in 10.0, so this is only
a candidate for 10.1.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out we can allow COHERENT storage/mappings all the time,
regardless of LLC vs non-LLC. It just means never using temporary
mappings to avoid GPU stalls, and on non-LLC we have to use the GTT intead
of CPU mappings. If we were to use CPU maps on non-LLC (which might be
useful if apps end up using buffer_storage on PBO reads, to avoid WC read
slowness), those would be PERSISTENT but not COHERENT, but doing that
would require us driving the clflushes from userspace somehow.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It looks like there's no big difference for write-only workloads, but
using a CPU map means that if they happen to read without having set the
MAP_READ_BIT, they get 100x the performance for those reads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While in expected usage patterns nobody will ever hit this path, doubling
our bandwidth used seems like a waste, and it cost us extra code too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On LLC, it should always be better to use a cached mapping than the GTT.
On non-LLC, it seems pretty silly to try to optimize read performance for
the INVALIDATE_RANGE_BIT case. This will make the buffer_storage logic
easier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Similar to the other cases, shift some weight/coord calculations to int
space. This should be slightly faster (on x86 sse it should actually safe one
instruction, and generally int instructions are cheaper).
The previous code used coords which were calculated as
(int) (f_coord * tex_size * 256) >> 8.
This is not only unnecessarily complex but can give the wrong texel due to
rounding for negative coords (as an example, after denormalization coords
from -1.0 to 0.0 should give -1, but this will give -1 for numbers from
-1.0-1/256 - 0.0-1/256.
Instead, juse use ifloor, dropping the shift stuff.
Unfortunately, this will most likely be slower - with arch rounding available
it shouldn't be too bad (trades a int shift for a round but also saves an int
mul (which is shared by all coords) but otherwise it's a mess.
The previous method for converting coords to ints was sligthly inaccurate
(effectively losing 1bit from the 8bit lerp weight). This is probably
especially noticeable when trying to draw a pixel-aligned texture.
As an example, for a 100x100 texture after dernormalization the texture
coords in this case would turn up as
0.5, 1.5, 2.5, 3.5, 4.5, ...
After the mul by 256, conversion to int and 128 subtraction, they end up as
0, 256, 512, 768, 1024, ...
which gets us the correct coords/weights of
0/0, 1/0, 2/0, 3/0, 4/0, ...
But even LSB errors (which are unavoidable) in the input coords may cause
these coords/weights to be wrong, e.g. for a coord of 3.49999 we'd get a
coord/weight of 2/255 instead.
Fix this by using round-to-nearest int instead of FPToSi (trunc). Should be
equally fast on x86 sse though other archs probably suffer a little.
Constify the offsets parameter to silence gcc warning 'assignment
from incompatible pointer type' due to function prototype miss-match.
Use a boolean changed as a shorthand for target != current_target.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
There is little point of continuing if fread returns zero, as it
indicates that either the file is empty or cannot be read from.
Bail out if fread returns zero after closing the file.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Core drm defines that the handle is of type int, while all drivers
treat it as uint internally. Typecast the value to silence gcc
warning messages and be consistent amongst all drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit 3805a864b1d(nv50: assert before trying to out-of-bounds access
samplers) introduced a series of asserts as a precausion of a previous
illegal memory access.
Although it failed to encapsulate loop within nv50_sampler_state_delete
effectively failing to clear the sampler state, apart from exadurating
the illegal memory access issue.
Fixes gcc warning "array subscript is above array bounds" and
"Nesting level does not match indentation" and "Out-of-bounds read"
defects reported by Coverity.
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: Indent according to Mesa style, reuse sbc instead of making a new
swap_count field, and actually get a usable back before returning the
age of the back (fixing updated piglit tests). Changes by anholt.
Signed-off-by: Adel Gadllah <adel.gadllah@gmail.com>
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Adel Gadllah <adel.gadllah@gmail.com> (v2)
Reviewed-by: Eric Anholt <eric@anholt.net>
With the buffer_age code, I need to be able to potentially call this more
than once per frame, and it would be bad if a new special event showing up
meant I chose a different back mid-frame. Now, once we've chosen a back
for the frame, another find_back will choose it again since we know that
it won't have ->busy set until swap.
Note that this makes find_back return a buffer id instead of a backbuffer
index. That's kind of a silly distinction anyway, since it's an identity
mapping between the two (it's the front buffer that is at an offset).
Reviewed-By: Adel Gadllah <adel.gadllah@gmail.com>
This extension provides a way for an application to render to multiple
surfaces with different buffer formats without having to use multiple
contexts. An EGLContext can be created without an EGLConfig by passing
EGL_NO_CONFIG_MESA. In that case there are no restrictions on the surfaces
that can be used with the context apart from that they must be using the same
EGLDisplay.
_mesa_initialze_context can now take a NULL gl_config which will mark the
context as ‘configless’. It will memset the visual to zero in that case.
Previously the i965 and i915 drivers were explicitly creating a zeroed visual
whenever 0 is passed for the EGLConfig. Mesa needs to be aware that the
context is configless because it affects the initial value to use for
glDrawBuffer. The first time the context is bound it will set the initial
value for configless contexts depending on whether the framebuffer used is
double-buffered.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In eglCreateContext there is a check for whether the config parameter is zero
and in this case it will avoid reporting an error if the
EGL_KHR_surfacless_context extension is supported. However there is nothing in
that extension which says you can create a context without a config and Mesa
breaks if you try this so it is probably better to leave it reporting an
error.
The original check was added in b90a3e7d8b based on the API-specific
extensions EGL_KHR_surfaceless_opengl/gles1/gles2. This was later changed to
refer to EGL_KHR_surfacless_context in b50703aea5. Perhaps the original
extensions specified a configless context but the new one does not.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Under GLES 3 it is not valid to pass GL_FRONT to glDrawBuffers. Instead,
GL_BACK has a magic interpretation which means it will render to the front
buffer on single-buffered contexts and the back buffer on double-buffered. We
were incorrectly setting the initial value to GL_FRONT for single-buffered
contexts. This probably doesn't really matter at the moment except that
presumably it would be exposed in the API via glGetIntegerv.
When we switch to configless contexts this is more important because in that
case we always want to rely on the magic interpretation of GL_BACK in order to
automatically switch between the front and back buffer when a new surface with
a different number of buffers is bound. We also do this for GLES 1 and 2
because the internal value doesn't matter in that case and it is convenient to
use the same code to have the magic interpretation of GL_BACK.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In GLES 3 it is not possible to select rendering to the front buffer and
instead selecting GL_BACK has the magic interpretation that it is either the
front buffer on single-buffered configs or the back buffer on double-buffered.
GLES 1 and 2 have no way of selecting the draw buffer at all. In that case we
were initialising the draw buffer to either GL_FRONT or GL_BACK depending on
the context's config and then leaving it at that.
When we switch to having configless contexts we ideally want Mesa to
automatically switch between the front and back buffer whenever a double-
or single-buffered surface is bound. To make this happen we can just allow
the magic behaviour from GLES 3 in GLES 1 and 2 as well. It shouldn't matter
what the internal value of the draw buffer is in GLES 1 and 2 because there
is no way to query it from the external API.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Commit 6e8d04a caused a leak by allocating ctx->Debug but never freeing it.
Release the memory in _mesa_free_errors_data when destroying a context.
Use FREE to match CALLOC_STRUCT from _mesa_get_debug_state.
Reviewed-by: Brian Paul <brianp@vmware.com>
The few paths that were playing with framebuffers and renderbuffer were
saving and restoring them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was being applied in a subset of the places that
intel_prepare_render() was called, to set the same flag that
intel_prepare_render() was setting.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The flag wasn't getting updated correctly when the ctx->DrawBuffer or
ctx->ReadBuffer changed. It usually ended up working out because most
apps only have one window system framebuffer, or if they have more than
one and they have any front read/drawing, they will have called
glReadBuffer()/glDrawBuffer() on it when they get started on the new
buffer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's the ctx->ReadBuffer that gets read from, not the ctx->DrawBuffer.
So, if you happened to have a ctx->ReadBuffer that was the winsys buffer,
and it had previously been intel_prepare_render()ed but not invalidated
since then, and you called glReadBuffer() to switch to front buffer
instead of back buffer reading on the winsys fbo while your drawbuffer was
a user FBO, you'd never get the front buffer's miptree fetched, and
segfault.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I think these are all equivalent to vertex buffer fetches which should be
dword-aligned. Scalar loads are also dword-aligned.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Buffers are disabled by VGT_STRMOUT_BUFFER_CONFIG, but the query only works
if VGT_STRMOUT_CONFIG.STREAMOUT_0_EN is enabled.
This moves VGT_STRMOUT_CONFIG to its own state. The register is set to 1
if either streamout or the primitives-generated query is enabled.
However, the primitives-emitted query is also incremented, so it's disabled
by setting VGT_STRMOUT_BUFFER_SIZE to 0 when there is no buffer bound.
This fixes piglit:
ARB_transform_feedback2/counting with pause
EXT_transform_feedback/primgen-query transform-feedback-disabled
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
When doing fast clear for single-sample color buffers for the first time,
a CMASK buffer has to be allocated and the CMASK state in all pipe_surfaces
referencing the color buffer must be updated. Updating all surfaces is kinda
silly, so let's move the values to r600_texture instead.
This is only for Evergreen and later. R600-R700 don't have fast clear.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This looks like r600g. The shared Cayman MSAA code is used here.
The real motivation for this is that I need the ability to change values
of color registers after the framebuffer state is set. The PM4 state cannot
be modified easily after it's generated. With this, I can just change
r600_surface::cb_color_xxx and set framebuffer.atom.dirty=true and it's done.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes the following build error on OpenBSD:
./.libs/libglsl.a(builtin_functions.o)(.text+0x973): In function `mtx_lock':
../../include/c11/threads_posix.h:195: undefined reference to `pthread_mutex_lock'
./.libs/libglsl.a(builtin_functions.o)(.text+0x9a5): In function `mtx_unlock':
../../include/c11/threads_posix.h:248: undefined reference to `pthread_mutex_unlock'
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
Static and shared builds were possible in the good old days
of static makefiles. Currently the build system does not
distinguish nor does anything special when one requests a
static build.
Print a warning message for the packager that static builds
are not supported and continue building shared libs.
Currently only Debian and derivatives use static build, and
they use it for building a Xlib powered libGL. This patch
will only change the warning message they are seeing but
the binaries produced will be identical.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
- As of commit cb080a10b68(configure.ac: Don't require shared LLVM when
building OpenCL) opencl does not mandate using shared llvm.
- Add a warning message that building with static llvm may cause
compilation problems.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
- xf86dri.h is the old dri1 header, not required by dri2 nor dri3
- fold xf86drm.h inclusiong inside dri2.h
- dri3_glx does not have any drm specific dependencies
- glapi.h is not required by the dri2 and dri3 codepaths
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Recent commit fixed build issues in dri2_query_renderer.c by
wrapping in defined(direct_rendering) && !defined(applegl)
This patch targets the query_renderer tests, so that make check
passes on platforms such as hurd and cygwin.
v2: (Emil)
- Rebase and update commit message.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some platforms different library extension - dll, dylib, a.
Honor that when we are creating the required links.
Rename LIB_EXTENSION to LIB_EXT while we're here.
With libglapi linking aside, building classic drivers on
non-linux platforms should be possible now.
v2: Resolve conflicts.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
In the cases where one links against the static glapi.la there
is no need to create temporary variables only to explicitly
link agaist it.
Instead use SHARED_GLAPI_LIB to explicitly indicate when one
is building and linking with the shared glapi provider.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
All the variables were used before the automake conversion
and do not make sense (nor are used) currently.
Replace GL_LIB_NAME with lib$(GL_LIB).$(LIB_EXTENSION) for
apple-glx. The build has been broken for ages, but this will
ease the recovery process as it happens.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
All three (xvmc and omx) do not have an alternative loading
similar to the dri modules. Thus one needs to explicitly install
them in order to use/test them.
v2:
- Keep vdpau targets, as an equivalent of LIBGL_DRIVERS_PATH
is being worked on.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This helper script will be used to minimise the duplication
during link generation across all gallium targets.
v2:
- Handle vdpau_LTLIBRARIES. Requested by Christian König.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
There is little point in echoing everything that the script does
to stdout. Wrap it in AM_V_GEN so that a reasonable message is
printed as a indication of it's invocation.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
There is little gain in printing whenever a folder is created.
v2:
- Use $(AM_V_at) over @ to have control in verbose builds.
Suggested by Erik Faye-Lund.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jon TURNEY <jon.turney@dronecode.org.uk>
D3D10 allows setting of the internal offset of a buffer, which is
in general only incremented via actual stream output writes. By
allowing setting of the internal offset draw_auto is capable
of rendering from buffers which have not been actually streamed
out to. Our interface didn't allow. This change functionally
shouldn't make any difference to OpenGL where instead of an
append_bitmask you just get a real array where -1 means append
(like in D3D) and 0 means do not append.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The MESA_FORMAT_x enums in formats.h weren't declared in any sort
of reasonable order. Now it should be a little more logical.
This also required reordering tables in formats.c and s_texfetch.c
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Eric Anholt <eric@anholt.net>
There's no real reason to list all the formats in the comments.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Removes unnecessary MOV instructions in L4D2, TF2, Dota2, and many other
Steam games.
total instructions in shared programs: 1668126 -> 1657509 (-0.64%)
instructions in affected programs: 242235 -> 231618 (-4.38%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This keeps us from needing to reemit all the other stage state just
because a surface changed.
Improves unoptimized glamor x11perf -f8text by 1.10201% +/- 0.489869%
(n=296). [v1]
v2:
- Drop binding table packets from Gen8 unit state as well.
- Pass _3DSTATE_BINDING_TABLE_POINTERS_XS to brw_upload_binding_table,
cutting even more code.
v3: Don't forget to drop them from 3DSTATE_GS (botched refactor in v2).
Signed-off-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> [v2, v3]
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
This makes both the empty and non-empty binding table paths exit through
the bottom of the function, which gives us a place to share code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Explicitly add radeon_drm_winsys_create and nouveau_drm_screen_create to
the dynamic list. This will ensure vdpau interop still works even when
the user links with -Bsymbolic-functions in hardened builds.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Rachel Greenham <rachel@strangenoises.org>
Reported-by: Peter Frühberger <peter.fruehberger@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This reverts most of commit d80f0c34b7. Upon a
closer reading, having the presumed offsets written is not enough to set the
flag. EXEC_OBJECT_NEEDS_GTT and/or EXEC_OBJECT_WRITE of the reloc entries
must also be set appropriately.
Rename
intel_winsys_check_aperture_size() to intel_winsys_can_submit_bo(),
intel_bo_exec() to intel_winsys_submit_bo(), and
intel_winsys_decode_commands() to intel_winsys_decode_bo().
Make a semantic change to ignore intel_context when the ring is not the render
ring.
The only alloc flag is INTEL_ALLOC_FOR_RENDER, which can as well be expressed
by specifying the initial write domain. The change makes it obvious that we
failed to set INTEL_ALLOC_FOR_RENDER in several places.
Rename
intel_bo_emit_reloc() to intel_bo_add_reloc(),
intel_bo_clear_relocs() to intel_bo_truncate_relocs(), and
intel_bo_references() to intel_bo_has_reloc().
Besides, we need intel_bo_get_offset() only to get the presumed offset afer
adding a reloc entry. Remove the function and make intel_bo_add_reloc()
return the presumed offset. While at it, switch to gem_bo->offset64 from
gem_bo->offset.
Patch adds a remap table for uniforms that is used to provide a mapping
from application specified uniform location to actual location in the
UniformStorage. Existing UniformLocationBaseScale usage is removed as
table can be used to set sequential values for array uniform elements.
This mapping helps to implement GL_ARB_explicit_uniform_location so that
uniforms locations can be reorganized and handled in a more easy manner.
v2: small fixes + rename parameters for merge and split functions (Ian)
improve documentation, remove old check for location bounds (Eric)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Nothing uses this structure, removal fixes Klocwork error about
the possible oom condition in _mesa_symbol_table_iterator_ctor.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This logic is borrowed from the radeon code. The transfer logic will
only get called for PIPE_BUFFER resources, so it shouldn't be necessary
to worry about them becoming render targets.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
It's only used in this one file as far as I can tell, and exporting a
symbol named 'devices' from a shared library is a recipe for trouble.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When building out of tree, the file ends up dangling which
may result in a binary with the old git sha.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That structure member is a pointer, so the loop with
the Elements macro only freed up the first entry.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After preprocessing by glcpp all adjacent spaces were replaced by
single one and glsl parser received column-shifted shader source.
It negatively affected ast location set up and produced wrong error
messages for heavily-spaced shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this SCons build error introduced with commit
70e7905608.
build/linux-x86_64-debug/mesa/libmesa.a(driverfuncs.os): In function `_mesa_init_driver_functions':
src/mesa/drivers/common/driverfuncs.c:99: undefined reference to `_mesa_meta_GenerateMipmap'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
I don't know how many people care about this case, but it's easy enough
to do, so we may as well. The tricky part is that for some reason Mesa
stores the number of array slices in Height, not Depth.
I thought the easiest way to handle that here was to make Height = 1
(the actual height), and srcDepth = srcImage->Height. This requires
some munging when calling _mesa_prepare_mipmap_level, so I created a
wrapper that sorts it out for us.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is largely a matter of looping over the number of slices/layers,
and not minifying depth (presumably that code exists for the unfinished
3D texture support).
Normally, I would have made the loop over array slices the outermost
loop. I suspect that would make it trickier to support 3D textures
someday, though, so I didn't. The advantage is that we would only have
one BufferData call per slice, rather than one per miplevel and slice.
However, a GenerateMipmaps microbenchmark indicates that either way is
basically just as fast. So I'm not sure it's worth bothering.
Improves performance in a GenerateMipmaps microbenchmark by nearly 5x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Almost the exact same code appeared twice, and it needs to expand to
handle additional texture targets. Refactor it to tidy up the code and
avoid duplicating more work in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
KHR_debug
Also update dispatch sanity removing ARB_debug_output checks and
removing KHR_debug placeholders as the checks have already been added
V2: Make sure we exit case statements with conditional breaks rather than
just dropping through.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the a build breakage caused by
6974eb9076 on build configurations where
all the following are true:
1. radeonsi is not being built
2. r600g is being built
3. opencl is disabled
4. --enable-r600-llvm-compiler is not being used
5. libelf is not installed
v2:
- Add $(RADEON_CFLAGS) to libllvmradeon_la_CFLAGS
Tested-by: Brian Paul <brianp@vmware.com>
And move its definition into r600_pipe_common.h; This struct is a just
a container for shader code and has nothing to do with LLVM.
v2:
- Drop unrelated Makefile change
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
A user would have no idea what "_glthread_" is. This removes the
last remaining instance of the _glthread_ string in Mesa.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
To check that the st_mesa_format_to_pipe_format() and
st_pipe_format_to_mesa_format() functions correctly convert
all corresponding Mesa/Gallium formats.
This found that MESA_FORMAT_YCBCR_REV was missing in
st_mesa_format_to_pipe_format(). Fixed that too.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: rotate in gen_rect_verts instead
v3: clear rotate in vl_compositor_clear_layers,
update calc_drawn_area as well
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Christian König <christian.koenig@amd.com>
It's never used, and it's equivalent to ralloc_parent(ht) if you really
need it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
To match PIPE_FORMAT_R8G8B8A8_SRGB.
v2: fix component name copy&paste bugs
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We've had several problems now with FinishRenderTexture not getting called
enough, and we're ready to just give up on it ever doing what we need. In
particular, an upcoming Steam title had rendering bugs that could be fixed
by always_flush_cache=true.
Instead of hoping Mesa core can figure out when we need to flush our
caches, just track what BOs we've rendered to in a set, and when we render
from a BO in that set, emit a flush and clear the set.
There's some overhead to keeping this set, but most of that is just
hashing the pointer -- it turns out our set never even gets very large,
because cache flushes are so common (even on cairo-gl).
No statistically significant performance difference in cairo-gl (n=100),
despite spending ~.5% CPU in these set operations.
v1: (Original patch by Eric Anholt.)
v2: (Changes by Ken Graunke.)
- Rebase forward from May 7th 2013 -> March 4th 2014.
- Drop the FinishRenderTexture hook entirely; after rebasing the
patch, the hook was just an empty function.
- Move the brw_render_cache_set_clear() call from
intel_batchbuffer_emit_flush() to brw_emit_pipe_control_flush().
In theory, this could catch more cases where we've flushed.
- Consider stencil as a possible texturing source.
v3: (changes by anholt):
- Move set_clear() back to emit_mi_flush() -- it means we can drop
more forced flushes from the code. In the previous location, it
wouldn't have been called when we wanted pre-gen6.
- Move the set clear from batch init to reset -- it should be empty at
the start of every batch, since the kernel handled any inter-batch
flush for us.
v4: Drop the debug code in set.c that I accidentally committed.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Dylan Baker <baker.dylan.c@gmail.com> [v2]
With a non-debug build, gcc has two complaints:
1. 'found' var not used. Silence with '(void) found;'
2. 'id' not initialized. It's assigned by the UniformHash->get()
call, actually. But init it to zero to silence gcc.
Reviewed-by: Matt Turner <mattst88@gmail.com>
sizeof(scissor) returns the size of the full array rather than a single
element. Fix it to consider just the one element.
Fixes: 0705fa35 ("st/mesa: add support for GL_ARB_viewport_array (v0.2)")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The texture formats of winsys fbo are always linear becase the st manager
(st/dri for example) could not know the colorspace used. But it does not mean
that we cannot make the fbo sRGB-capable. By
- setting rb->Visual.sRGBCapable to GL_TRUE when the pipe driver supports the
format in sRGB colorspace,
- giving rb an sRGB internal format, and
- updating code to check rb->Format instead of strb->texture->format,
we should be good.
Fixed bug 75226 for at least llvmpipe and ilo, with no piglit regression.
v2: do not set rb->Visual.sRGBCapable for GLES contexts to avoid surprises
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75226
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
We need the header setup to not be predicated on which pixels are
undiscarded. I'm not sure originally if I had thought that the mask
disable implied predicate disable, or if I had just misread the mask
disable as predicate disable. Either way, I know I had spent more time
thinking about this in the gen8 generator than the gen7 generator.
Plus, it turns out that I had mis-implemented the "the GPU will use the
predicate unless this header is present" comment, by skipping setting up
the pixel mask when the header was present.
Fixes GPU hangs in piglit glsl-fs-discard-mrt, Trine, Trine 2 and
preusmably MLL.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75207
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It was really only used in the radeon driver for a debug printf.
And evidently, libGL.so referenced it just to work around some sort
of linker issue.
This patch removes the two calls to the function and the function
itself.
Fixes undefined _glthread_GetID symbol in libGL reported by 'nm'.
Though, the missing symbol doesn't cause any issues on my system but
it does cause glxinfo to fail on one of our test systems.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Before, it was kind of ugly to set the multisample fields with
assignments after we called _mesa_init_teximage_fields().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If scissor optimization is used (to avoid bringing scissored portions of
the render target into GMEM and then back out to system memory) in
combination with hw binning pass, the result would be a scissor mismatch
between binning pass and rendering pass. This would cause rendering
bugs in some scenarios with (for example) gnome-shell.
I would have expected that simply using the correct screen-scissor
during the binning pass would be enough, but seems like there is
something else missing. So for now disable binning pass if scissor
optimization is used.
Most of the logic refers to the local variable 'mt' directly but
a few cases use 'intelObj->mt' instead. These are the same for
now but will be different once stencil miptree gets used.
v2 (Ian): fixed also indentation in surrounding lines
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
On earlier hardware, we had to implement math in the shader to translate
Y-tiled or untiled coordinates to W-tiled coordinates (which is what
BLORP does today in order to texture from stencil buffers).
On Broadwell, we can simply state that it's W-tiled in SURFACE_STATE,
and adjust the pitch. This is much easier.
In the surface state code, I chose to handle the "should we sample depth
or stencil?" question separately from the setup for sampling from
stencil. This should make it work with the BindRenderbufferTexImage
hook as well, and hopefully be reusable for GL_ARB_texture_stencil8
someday.
v2: Update docs/GL3.txt (caught by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While the GL_ARB_stencil_texturing extension does not allow the creation
of stencil textures, it does allow shaders to sample stencil values
stored in packed depth/stencil textures.
Specifically, applications can call glTexParameter* with a pname of
GL_DEPTH_STENCIL_TEXTURE_MODE and value of either GL_DEPTH_COMPONENT or
GL_STENCIL_INDEX to select which component they wish to sample. The
default value is GL_DEPTH_COMPONENT (for traditional depth sampling).
Shaders should use an unsigned integer sampler (presumably usampler2D)
to access stencil data. Otherwise, results are undefined. Using shadow
samplers with GL_STENCIL_INDEX selected also is undefined behavior.
This patch creates a new gl_texture_object field, StencilSampling, to
indicate that stencil should be sampled rather than depth. (I chose to
use a boolean since I figured it would be more convenient for drivers.)
It also introduces the [Get]TexParameter code to get and set the value,
and of course the extension plumbing.
v2: Also consider textures incomplete when sampling stencil with
non-NEAREST min/mag filters (caught by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The loader infrastructure for everything but DRI2 requires that udev be
present, so we can figure out an appropriate driver from the fd. We don't
have a portable solution yet, but presumably it will have similar lookup
based on the device node.
It will also be even more required for krh's udev-based hwdb support,
which lets us have a loader that actually loads DRI drivers not included
in the loader's source distribution.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75212
Reviewed-by: Matt Turner <mattst88@gmail.com>
Because in draw we always inject position at slot 0 whenever
fragment shader would take the maximum number of inputs (32) it
meant that we had PIPE_MAX_ATTRIBS + 1 slots to translate, which
meant that we were crashing with fragment shaders that took
the maximum number of attributes as inputs. The actual max number
of attributes we need to translate thus is PIPE_MAX_ATTRIBS + 1.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
draw_current_shader_* functions return a final output when considering
both the geometry shader and the vertex shader. But when code generating
vertex shader we can not be using output slots from the geometry shader
because, obviously, those can be completely different. This fixes a
number of very non-obvious crashes.
A side-effect of this bug was that sometimes the vertex shading code
could save some random outputs as position/clip when the geometry
shader was writing them and vertex shader had different outputs at
those slots (sometimes writing garbage and sometimes something correct).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
From OpenGL 3.3 spec, page 141:
"Textures with a base internal format of DEPTH_COMPONENT or DEPTH_STENCIL
require either depth component data or depth/stencil component data.
Textures with other base internal formats require RGBA component data.
The error INVALID_OPERATION is generated if one of the base internal
format and format is DEPTH_COMPONENT or DEPTH_STENCIL, and the other
is neither of these values."
Fixes Khronos OpenGL CTS test failure: proxy_textures_invalid_size
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From OpenGL 4.0 spec, page 398:
"The initial internal format of a texel array is RGBA
instead of 1. TEXTURE_COMPONENTS is deprecated; always
use TEXTURE_INTERNAL_FORMAT."
Fixes Khronos OpenGL CTS test failure: proxy_textures_invalid_size
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Starting with llvm-3.5svn r202574, LLVM expects C+11 mode.
commit f8bc17fadc8f170c1126328d203f0dab78960137
Author: Chandler Carruth <chandlerc@gmail.com>
Date: Sat Mar 1 06:31:00 2014 +0000
[C++11] Turn off compiler-based detection of R-value references, relying
on the fact that we now build in C++11 mode with modern compilers. This
should flush out any issues. If the build bots are happy with this, I'll
GC all the code for coping without R-value references.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@202574 91177308-0d34-0410-b5e6-96231b3b80d8
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
`--enable-llvm-shared-libs` option was recently renamed as
`--with-llvm-shared-libs`, but several error messages still mention the
old option, causing confusing.
Trivial.
GetCurrentThread() returns a pseudo-handle (a constant which only makes
sense when used within the calling thread) and not a real handle.
DuplicateHandle() will return a real handle, but it will create a new
handle every time we call. Calling DuplicateHandle() here means we will
leak handles, which can cause serious problems.
In short, the Windows implementation of thrd_t needs a thorough make
over, and it won't be pretty. It looks like C11 committee
over-simplified things: it would be much better to have seperate objects
for threads and thread IDs like C++11 does.
For now, just comment out the thrd_current() implementation, so we get
build errors if anybody tries to use it.
Thanks to Brian Paul for spotting and diagnosing this problem.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
u_thread_self() expects thrd_current() to return a unique numeric ID
for the current thread, but this is not feasible on Windows.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
r600_translate_colorformat is rewritten to look like radeonsi.
r600_translate_colorswap is shared with radeonsi.
r600_colorformat_endian_swap is consolidated.
This adds some formats which were missing. Future "plain" formats will
automatically be supported.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
With commit 0432aa064b(configure: use shared-glapi when more than one
gl* API is used) we removed "disable shared-glapi when building without
dri" hunk.
In the good old days of classic mesa, dri and xlib-glx were mutually
exclusive thus the hunk made sense.
Currently enable-dri is used as a synonym for a range of things thus
it's more appropriate to handle xlib-glx explicitly.
Fixes a missing symbol '_glapi_Dispatch' in a xlib powered libGL,
build using the following
./autogen.sh --enable-xlib-glx --disable-dri --with-gallium-drivers=swrast
Cc: Brian Paul <brianp@vmware.com>
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The _glthread_LOCK/UNLOCK_MUTEX() macros are just wrappers around
the c11 mutex functions. Let's start getting rid of those wrappers.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Update the comments for the packed formats to accurately reflect the
layout of the bits in the pixel. For example, for the packed format
MESA_FORMAT_R8G8B8A8, R is in the least significant position while A
is in the most-significant position of the 32-bit word.
v2: also fix MESA_FORMAT_A1B5G5R5_UNORM, per Roland.
The spec incorrectly used void as return type, when it should have
been GLboolean. This has now been fixed. According to Nvidia, their
implementation always used GLboolean.
Reviewed-by: Christian König <christian.koenig@amd.com>
This is just a simple implementation that stores the extra values into the DRIimage
struct and just uses the fd importer. I haven't looked into what is required
to import YUV or deal with the extra parameters.
Signed-off-by: Dave Airlie <airlied@redhat.com>
i965 recently moved debug printfs to use stderr, including ones which
trigger on MESA_GLSL=dump. This resulted in scrambled output.
For drivers using ir_to_mesa, print_program was already using stderr,
yet all the code around it was using stdout.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
opt_algebraic was translating lrp(x, 0, a) into add(x, -mul(x, a)).
Unfortunately, this references "x" twice, which is invalid in the IR,
leading to assertion failures in the validator.
Normally, cloning IR solves this. However, "x" could actually be an
arbitrary expression tree, so copying it could result in huge piles
of wasted computation. This is why we avoid reusing subexpressions.
Instead, transform it into mul(x, add(1.0, -a)), which is equivalent
but doesn't need two references to "x".
Fixes a regression since d5fa8a9562, which isn't in any stable
branches. Fixes 18 shaders in shader-db (bastion and yofrankie).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The logic to count number of block outputs was out of sync with the
actual array construction. But to simplify / make things less fragile,
we can just allocate the arrays for worst case size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A value may be assigned on only one side of an if/else. In this case we
can simply substitute a mov.f32f32.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add option to generate fragment shader to emulate two sided color.
Additional inputs are added to shader for BCOLOR's (on corresponding to
each COLOR input). CMP instructions are used to select whether to use
COLOR or BCOLOR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
If vertex writes pointsize, there are a few extra bits we need to turn
on in the cmdstream here and there.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we have the infrastructure for shader variants, add support to
generate an optimized shader for hw binning pass (with varyings/outputs
other than position/pointsize removed). This exposes the possibility
that the shader uses fewer constants than what is bound, so we have to
take care to not emit consts beyond what the shader uses, lest we
provoke the wrath of the HLSQ lockup!
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Fixes anything that tries to use gl_FrontFacing/gl_FragCoord. Also,
face support is needed to emulate two sided color.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
An unused input might not have a register assigned. We don't want bogus
regid to result in impossibly high max_reg..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
BRW_MAX_TEX_UNIT is the static limit on the number of textures we
support per-stage, not in total.
Core's `Unit` array is sized by MAX_COMBINED_TEXTURE_IMAGE_UNITS, which
is significantly larger, and across the various shader stages, up to
ctx->Const.MaxCombinedTextureImageUnits elements of it may be actually
used.
Fixes invisible bad behavior in piglit's max-samplers test (although
this escalated to an assertion failure on HSW with texture_view, since
non-immutable textures only have _Format set by validation.)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes glGetTexImage() when converting from MESA_FORMAT_Z32_FLOAT_S8X24_UINT
to GL_UNSIGNED_INT_24_8. Hit by the piglit
ext_packed_depth_stencil-getteximage test.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Though it won't matter on Linux, use _mesa_align_free to release it.
Since i965 doesn't have sys_buffer, I overlooked this in the
GL_ARB_map_buffer_alignment work a few months ago. Fixes i915 (and
presumably i830) regressions in ARB_map_buffer_range tests and the
failure in arb_map_buffer_alignment-sanity_test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74960
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There's no reason to have more vertex texture units than fragment
texture units on this hardware. Since increasing the default maximum
number of texture units from 16 to 32, this has triggered some segfault
in i915 driver. There's probably some array or bitfield that isn't
properly sized now. This really papers over the bug, but I don't think
I'll lose any sleep over that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74071
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: vec4_visitor::pack_uniform_registers(): Use correct comparison in the
assert, this->uniforms is already adjusted. Compare the actual value used to
index uniform_size and uniform_vector_size instead.
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Don't add function parameters, pass the required size in
prog_data->nr_params.
v3:
- Use the name uniform_array_size instead of uniform_param_count.
- Round up when dividing param_count by 4.
- Use MAX2() instead of taking the maximum by hand.
- Don't crash if prog_data passed to vec4_visitor constructor is NULL
v4: Rebase for current master
v5 (idr): Trivial whitespace change.
Signed-off-by: Petri Latvala <petri.latvala@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71254
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLX can be either dri or xlib based, while enable_dri is
used in a variety of contexts.
With enable_dri_glx the context is clearly visible.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
All hardware drivers including the virtual vmwgfx require
the drm pipe-loader in order to be properly loaded by xa,
gbm and opencl.
Note this does _not_ add support for the above three it only
allows the pipe driver to be loaded by the library.
Eg. GBM will now properly open the pipe-i915 driver, should
one be working on the such hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75453
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Building to provide accelration using swrast does not make
sense.
Note: update your build script to explicitly mention svga
in the gallium drivers list, if you are building the vmwgfx
xa library.
v2: Update error message to provide more clarify, add an example.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Recent patch converted our logic to use test -n and test -z.
An emptry string variable (empty_str="") return true for both
thus making the check unreliable.
Fix this by correctly setting the variable when applicable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The issue is caused by a thinko that an empty string will be
considered of zero length by 'test'. This is not the case,
thus we were building the 'core' of megadrivers even when no
classic drivers were built.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
glGetTexImage(GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8) was just
using memcpy() instead of _mesa_unpack_uint_24_8_depth_stencil_row()
to convert texels from the hardware format to the GL format.
Fixes issue reported by David Meng at Intel. The new piglit
ext_packed_depth_stencil-getteximage test checks for this bug.
Also, add some format/type assertions. We don't yet handle the
GL_FLOAT_32_UNSIGNED_INT_24_8_REV type. That should be fixed in
a follow-on patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
API is always API_OPENGL_COMPAT (since commit 4e4a537ad5,
"meta: Push into desktop GL mode when doing meta operations."),
so most of these checks do nothing.
We could instead check save->API to only bother setting/restoring
relevant GL state, but I'm not sure saving a few _mesa_set_enable
calls is worth the complexity. My understanding is the point of
the ctx->API guards was to avoid raising GL errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_meta_begin(), we switch to API_OPENGL_COMPAT, then munge a lot
of state (including some that doesn't exist in the actual API - like
PolygonStipple in API_OPENGL_CORE).
It seems reasonable that in _mesa_meta_end(), we should restore it,
then switch back to the original API. This at least makes it symmetric.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Some formats can't be handled - in particular cannot handle ints/uints formats,
which lack the pack_rgba_float/unpack_rgba_float functions. Instead of trying
to call these (and crash) return an error (I'm not sure yet if we should try
to translate such formats too here might not make much sense).
v2: suggested by Jose, use separate checks for pack/unpack of rgba_8unorm and
rgba_float functions (right now if one exists the other should as well).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are currently only two VUE map layouts: one for Gen4-5, and one
for everything else. We keep having to add new "case N+1" labels for
every new hardware generation, and so far it's always been the same.
This patch makes it so we only have to do work in the case where
something actually changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the BSpec's 3D workarounds page, this is unnecessary on
shipping Haswell hardware, and was never necessary on Broadwell. It
unfortunately doesn't say anything about Baytrail.
The workaround database confirms those results for Ivybridge, Haswell,
and Broadwell. Baytrail is less clear - one page says it's necessary,
while the other says it isn't. For now, be conservative and leave it
enabled.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it easy to compare output between different cards, especially
for ones that you don't have (and/or not in the current machine).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This should pave the way to being able to use the compiler without a
context. Also leads to cleaner code.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
By adding "#define gvec4 %svec4" to the top of our fragment shader, we
can write generic code without needing to specialize it to vec4, ivec4,
or uvec4 via asprintf.
This also makes the INT and UNSIGNED_INT merge function code identical,
so I combined those two cases.
It's not a big savings, but a little bit tidier.
v2: Rebase on Vinson's MSVC build fixes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes fbo-clear-formats GL_ARB_depth_texture on Ironlake, which
regressed since commit f128bcc7c2
("i965: Drop mt->levels[].width/height.") intel_miptree_copy_slice was
calling minify(.., 7) on a 2x2 texture with mt->first_level == 7.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75292
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tt's kind of a trap---calling do_common_optimization() after
lower_instructions() may cause opt_algebraic() to reintroduce
ir_triop_lrp expressions that were lowered, effectively defeating the
point. Because of this, nobody uses it.
v2: Delete more code (caught by Ian Romanick).
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
When the vec4 backend encountered an ir_triop_lrp, it always emitted an
actual LRP instruction, which only exists on Gen6+. Gen4-5 used
lower_instructions() to decompose ir_triop_lrp at the IR level.
Since commit 8d37e9915a ("glsl: Optimize open-coded lrp into lrp."),
we've had an bug where lower_instructions translates ir_triop_lrp into
arithmetic, but opt_algebraic reassembles it back into a lrp.
To avoid this ordering concern, just handle ir_triop_lrp in the backend.
The FS backend already does this, so we may as well do likewise.
v2: Add a comment reminding us that we could emit better assembly if we
implemented the infrastructure necessary to support using MAC.
(Assembly code provided by Eric Anholt).
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75253
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Similar to u_blitter, u_upload_mgr is now a client of the pipe context. Its
creation needs to be delayed until the context has been (almost) initialized.
The sort priorites for GLX_SAMPLES and GLX_SAMPLE_BUFFERS are
not defined in GL_ARB_multisample, but they are defined in
the GLX 1.4 specification.
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The default values for GLX_DRAWABLE_TYPE and GLX_RENDER_TYPE are
GLX_WINDOW_BIT and GLX_RGBA_BIT respectively, as specified in
the GLX 1.4 specification.
This fixes the glx-choosefbconfig-defaults piglit test.
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This variable is no longer needed after the cleanup to the
code prior to the first arrays of array series
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On my gentoo system, llvm libs are in /usr/lib64/llvm, and llvm-config
--ldflags does not provide the rpath (it does, of course, provide a -L).
This adds the llvm dir to the rpath. It should be harmless if the path
is a system path, and should make things work when it's not.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
This will be used for changing texture properties without modifying
pipe_resource like r600g, but not in this series. For now, this change
allows consolidation of pipe_surface functions.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
db_z_info was unused. This just renames the variable to match the register
name.
Now, db_depth_info is unused on Evergreen.
Both variables will be needed on SI though.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
OpenGL allows a buffer to be mapped only once, but we also map buffers
internally, e.g. in the software primitive restart fallback, for PBOs,
vbo_get_minmax_index, etc. This has always been a problem, but it will
be a bigger problem with persistent buffer mappings, which will prevent
all Mesa functions from mapping buffers for internal purposes.
This adds a driver interface to core Mesa which supports multiple buffer
mappings and allows 2 mappings: one for the GL user and one for Mesa.
Note that Gallium supports an unlimited number of buffer and texture
mappings, so it's not really an issue for Gallium.
v2: fix unmapping in xm_dd.c, remove the GL errors there
v3: fix the intel driver (by Fredrik)
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
v2: dropped the error that DYNAMIC_STORAGE is required for MAP_WRITE_BIT,
the error is removed in the latest revision of GL 4.4
This adds support for GL_ARB_texture_gather, and one step of
support for GL_ARB_gpu_shader5.
This adds support for passing the TG4 instruction, along
with non-constant texture offsets, and tracking them for the
optimisation passes.
This doesn't support native textureGatherOffsets hw, to do that
you'd need to add a CAP and if set disable the lowering pass,
and bump the MAX offsets to 4, then do the i0,j0 sampling using
those.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to gallium for a TG4 instruction,
and two CAPs. The first CAP is required for GL_ARB_texture_gather.
The second CAP is required to expose GL_ARB_gpu_shader5.
However so far we haven't found any hardware that natively
exposes the textureGatherOffsets feature from GL, so just
lower it for now. If hardware appears for this we can add
another CAP to allow TG4 to take 4 offsets.
v2: add component selection src and a cap to say
hw can do it. (st can use to help control
GL_ARB_gpu_shader5/GLSL 4.00). Add docs.
v3: rename to SM5, add docs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This lowering pass will be useful for gallium drivers as well, in order to support
the GL TG4 oddity that is textureGatherOffsets.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The offsets will be stored in the handles parameter. This makes
it possible to use sub-buffers.
v2:
- Style fixes
- Add support for constant sub-buffers
- Store handles in device byte order
v3:
- Use endian helpers
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Current automake build does not try to resolve undefined
symbols thus we could end up with a broken library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With the introduction of the pipe_loader_sw_probe_dri helper we
require the sw/dri winsys during linking stage despite it being
unused by any of the targets. This will cause a minor increase
in the resulting library which will be cleaned up via linker
options with upcoming patches.
v2: Link with libswdri.la only when available.
Reported-and-tested-by: Tom Stellard <thomas.stellard@amd.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
While looking at bug 75356, I've noticed that the presence of
x11 egl platform pulls in sw/xlib as "needed" but fails to
report so at the end of configure.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The above function implies using the the xlib winsys, which
has additional library dependencies that should not be forced.
Make the software xlib pipe loader optional thus avoid all
the dependency hell. A user that wishes to use the particular
pipe-loader would need to set the following within configure.ac.
enable_gallium_xlib_loader=yes
v2:
- Wrap sw/xlib/xlib_sw_winsys.h to handle compilation on systems
lacking X11 headers. Spotted by Christian Prochaska.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75356
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The blitter is completely ignorant of MSAA buffer layouts, so any
attempt to use BLT paths with MSAA buffers is likely to break
spectacularly.
In most cases, BLORP handles MSAA blits, so we never hit this bug.
Until recently, it also wasn't worth fixing, since Meta couldn't handle
MSAA either, so there was nothing to fall back to. But now there is.
+143 piglit tests on Broadwell (which doesn't have BLORP support).
Surprisingly, three also start failing. Since non-IMS MSAA buffers
store samples in successive array slices, using the blitter ought to
access sample 0 and ignore the rest, which is apparently good enough for
a few not-very-picky Piglit tests. Presumably the meta replacement code
is still broken.
No Piglit changes on Ivybridge.
v2: Move the early return to the top of the function (suggested by
Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Using generic shaders caused a measurable fps drop, which was isolated to
use of full precision (vs half precision) output. This is an attempt to
regain that lost performance by using half precision solid/blit shaders
(when the output format is not float32).
Note: for the built-in shaders, I would not expect them to be register
starved. And in fact it is the solid frag shader that seems to have the
biggest impact. So I suspect you get double the pixel pipe units (or
half the cycles) when the output is half precision. So there may be
some gain to using half precision output for application shaders as
well, even though the rest of register usage is still full precision.
But for half precision to work for more complex shaders, we need to deal
with some constraints, like cat2 needing same precision for it's two src
registers. So for now it is not enabled by default except for the
built-in shaders.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Start putting in place infrastructure to deal with multiple shader
variants. Initially we'll use this for two sided color (frag) and
binning pass (vert) shaders. Possibly need for others later (such
as YUV vs RGB eglImage?).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Easier than making more extensive use of rpt, and the more compact
shaders seem to bring some bit of performance boost. (Perhaps repeat
flag benefits are more than just instruction cache, possibly it saves
on instruction decode as well?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Instead in the common code, construct these shaders from TGSI. For now
we let a2xx keep it's hand coded shaders, as it's compiler isn't quite
up to the job yet. All the same it is a net drop in code size and gets
rid of special cases.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Make things configurable, and tweak the API a bit to avoid an extra
tgsi_shader_scan(). Getting closer to something generic which can be
moved out of freedreno and shaderd by other drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the headers.
Currently both versions are identical, but that is not
guaranteed to be the case in the future.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the one provided by the spec.
Currently both versions are identical, but that is not
guaranteed to be the case in the future.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
... over the version number provided by the headers.
Explicitly set extension members to improve clarity.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note the member function releaseTexBuffer was added without
bumping spec version, and currently no drivers implement it.
v2: releaseTexBuffer was introduced by version 3
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While we want to be able to print to stdout for glsl_compiler, for
debugging drivers we want to be able to dump to stderr because that's
where other driver debug (like LIBGL_DEBUG) tends to go, and because some
apps actually close stdout to shut up their own messages (such as the X
Server, or NWN).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was only going to get worse when tesselation shows up, and was
causing too much extra duplication in my stderr changes coming up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Unfortunately there's only one RT_ARRAY_MODE setting for all
attachments, so clears were previously truncated to the minimum number
of layers any attachment had. Instead set the RT_ARRAY_MODE to 512 (the
max number of layers) before doing the clear. This fixes
gl-3.2-layered-rendering-clear-color-mismatched-layer-count.
Also fix clears of individual layered rt/zeta, in case it ever happens.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Use tex->bo_format instead of zs->format in ilo_blitter_rectlist_clear_zs()
because the latter may be combined depth/stencil format. hiz_can_clear_zs()
is no-op for GEN7+, but move the GEN check so that the assertions are tested.
Finally, call the fast depth clear function from ilo_clear().
Using this emit function implicitly creates three copies, which
is pointlessly inefficient.
1. Code creates the original instruction.
2. Calling emit(fs_inst) copies it into the function.
3. It then allocates a new fs_inst and copies it into that.
The second could be eliminated by changing the signature to
fs_inst(const fs_inst &)
but that wouldn't eliminate the third. Making callers heap allocate the
instruction and call emit(fs_inst *) allows us to just use the original
one, with no extra copies, and isn't much more of a burden.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
These functions (modulo emit_lrp, necessitating the small fix-up) pass
these arguments by value unmodified to other functions. No point in
making an additional copy.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The C99 spec says the type of an enum is implementation defined (but can
be char, signed int, or unsigned int). gcc appears to always give enums
four bytes, even when they can fit in less. It does so because this is
what other compilers seem to do [0] and therefore to maintain ABI
compatibility with them.
gcc has an -fshort-enum flag that tells the compiler to use only as much
space as needed for an enum. Adding __attribute__((__packed__)) to an
enum definition has the same behavior, but on a per-enum basis.
brw_reg_type and register_file are not part of the ABI, so we can safely
mark them as PACKED so that they'll take only a byte, rather than four.
[0] http://gcc.gnu.org/onlinedocs/gcc/Non-bugs.html#index-fshort-enums-3868
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this SCons build error introduced with commit
4f37e52f37.
Compiling src/gallium/targets/libgl-xlib/xlib.c ...
src/gallium/targets/libgl-xlib/xlib.c:35:42: fatal error: state_tracker/xlib_sw_winsys.h: No such file or directory
#include "state_tracker/xlib_sw_winsys.h"
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75347
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
v2: Handle null_sw_create failure, add missing function return type
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
Will be used in the following commits.
v2: Link gallium tests against the library.
v3: Handle dri_create_sw_winsys failure
v4: Rebase on top of the targets/xa changes
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v2)
Will be used in the upcoming patches.
v2: handle xlib_create_sw_winsys failure, drop unneeded header
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> (v1)
The sw pipe-loader implicitly handles winsys_create, thus we
it would make sense to implicitly destroy it upon releasing
the loader.
Currently we leak the sw_winsys when releasing the pipe-loader.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We don't need an array mapping the shader index to "sampler2DMS",
"isampler2DMS", and so on. We can simply do "%ssampler2DMS" and pass in
vec4_prefix, which is "", "i", or "u".
This eliminates the use of C99 array initializers and should fix the
MSVC build.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75344
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_texture_object's Target field is never a cube face enumeration, so
target_to_target is just the identity function. Aptly named, at least.
I verified this by putting an assert(!"ZOMG, CUBES!") in the cube face
case, and running Piglit. Nothing ever hit it. Beyond that, I
inspected the code in mesa/main.
This could probably also be deleted from i915, but I haven't tested
there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes this SCons build error.
build/linux-x86_64-debug/mesa/libmesa.a(context.os): In function `init_attrib_groups':
src/mesa/main/context.c:815: undefined reference to `_mesa_init_pipeline'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fix build error introduced with commit f4c13a890f.
CC pixeltransfer.lo
main/pipelineobj.c: In function '_mesa_delete_pipeline_object':
main/pipelineobj.c:59:4: error: unknown type name 'unsinged'
unsinged i;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr):
* Trivial reformatting.
* Remove GL_COMPUTE_SHADER. Compute shaders don't participate in pipeline
objects anyway. Suggested by Matt Turner.
v3 (idr):
* Use _mesa_has_geometry_shaders.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was originally included in another patch, but it was split out by
Ian Romanick.
v2 (idr): Return early from _mesa_ActiveShaderProgram if
_mesa_lookup_shader_program_err returns an error. Suggested by Jordan.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v2]
This will allow the guts of the implementation to be shared with
_mesa_CreateShaderProgramv.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement IsProgramPipeline based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement GenProgramPipelines based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Implement DeleteProgramPipelines based on the VAO code.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
V1:
* Extend gl_shader_state as pipeline object state
* Add a new container gl_pipeline_shader_state that contains
binding point of the previous object
* Update mesa init/free shader state due to the extension of
the attibute
* Add an init/free pipeline function for the context
V2:
* Rename gl_shader_state to gl_pipeline_object
* Rename Pipeline.PipelineObj to Pipeline.Current
* Formatting improvement
V3 (idr):
* Split out from previous uber patch.
* Remove '#if 0' debug printfs.
V4 (idr):
* Fix some errors in comments. Suggested by Jordan.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Future patches will use this function outside shaderapi.c.
This was originally included in another patch, but it was split out by
Ian Romanick.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Nothings implemented yet but glProgramUniform* which are mostly a
copy/paste of the older function glUniform*
I create dedicated pipelineobj.[ch] file that will contains function
related to the "new" pipeline container object.
V2: formatting improvement
V3:
* indentation fix
* Update copyright
* Add a comment on ProgramParameteri already present in another extension
* Remove TODO, will be readded on correct patch
V4 (idr):
* Fix dispatch_sanity unit test
* Make extension string available in core profiles (instead of just
compatibility).
* Trivial reformating
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
GL_ARB_separate_shader_objects adds the ability to specify location
layouts for interstage inputs and outputs.
In addition, this extension makes 'in' and 'out' generally available for
shader inputs and outputs. This mimics the behavior of
GL_ARB_explicit_attrib_location.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Current behaviour states that shared-glapi is usefull when building
with dri, which is not the case. Shared-glapi is used to dispatch
the gl* functions across the one or more gl api's which can be dri
based but do not need to be.
Fixed the following build
./configure --enable-gles2 --disable-dri --enable-gallium-egl \
--with-egl-platforms=fbdev --with-gallium-drivers=swrast
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75098
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit ee55500c22a(configure: cleanup classic dri drivers handling)
cleaned up the logic handling autodetection of dri drivers, but missed
the case when one can explicitly disable dri, and still request opengl.
Fixes build issues for the following
./autogen.sh --disable-dri --with-gallium-drivers=swrast
While we're here, explicitly clear with_dri_drivers whenever building
without such drivers to prevent choking later on.
v2: Simplify with_dri_drivers handling.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75126
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Compared to i965, the code generated doesn't use the AVG instruction. But
I'm not sure that multisampled integer resolves are really that important
to worry about.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are non-stretched, non-resolving blits, so it's just a matter of
sampling once from our gl_SampleID and storing that to our color/depth.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're disabling GL_MULTISAMPLE, so we didn't need to worry about a lot of
that state. But to do MSAA to MSAA blits, we need to start handling more
state.
v2: Fix pasteo caught by Kenneth.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Blending of values would occur when doing GL_LINEAR filtering with
scaling, and in an upcoming commit when doing MSAA resolves.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that this doesn't handle GL_EXT_multisample_scaled_blit yet. The
i965 code for that extension bakes in knowledge of the sample positions
(well, knowledge of the sample positions aligned to a lower-resolution
grid), which we would have to do at runtime somehow for meta.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We haven't been executing this code before the meta-blit case, because
we've been flagging the miptree as validated at texstorage time, and never
having to revalidate.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Identified by Valgrind memory check. Initialized block-opaque in a
different patch. This test seems unnecessary. If opaque must be true,
just set to true.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Define some additional convenience operators, clean up the
implementation slightly, and rename it to 'intrusive_ptr' for reasons
that will be obvious in the next commit.
Tested-by: Tom Stellard <thomas.stellard@amd.com>
I'd neglected to port these to Broadwell. Most of this code is copy
and pasted from Gen7, but instead of using F32TO16/F16TO32, we just
use MOV with HF register types.
Fixes fs-packHalf2x16 and fs-unpackHalf2x16 tests (both the ARB
extension and ES 3.0 variants).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell removed the F32TO16 and F16TO32 instructions. However, it has
actual support for HF values, so they're actually just MOV.
Fixes vs-packHalf2x16 and vs-unpackHalf2x16 tests (both the ARB
extension and ES 3.0 variants).
v2: Emulate F32TO16's align16 zeroing bug, since Chad's front end code
relies on it happening. We can probably refactor this code to be
better later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_init_state() calls brw_upload_initial_gpu_state(). If hardware
contexts are enabled (brw->hw_ctx != NULL), this will upload some
initial invariant state for the GPU. Without hardware contexts, we
rely on this state being uploaded via atoms that subscribe to the
BRW_NEW_CONTEXT bit.
Commit 46d3c2bf4d accidentally moved
the call to brw_init_state() before creating a hardware context.
This meant brw_upload_initial_gpu_state would always early return.
Except on Gen6+, we stopped uploading the initial GPU state via
state atoms, so it never happened.
Fixes a regression since 46d3c2bf4d.
Cc: "10.0 10.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To make sure that both the Gen4 and Gen7 style messages work, I
initially disabled the SHADER_OPCODE_GEN7_SCRATCH_READ optimization,
ran Piglit, re-enabled it, and ran Piglit again. Both worked fine.
Fixes 40 Piglit tests (most of the varying-packing category).
v2: Move num_regs assertion from gen8_fs_generator to
gen8_set_dp_scratch_message() (suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The new accessors will make it easy to do Gen7-style scratch messages.
v2: Move num_regs assertion from gen8_fs_generator into
gen8_set_dp_scratch_message() (suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the past, 3DSTATE_PS took an absolute number of threads. Conversely,
on Broadwell you always program 64, and it implicitly scales based on
the GT-level with no special programming. So, I stored 64 in
brw_device_info::max_wm_threads.
However, I didn't realize that we also use max_wm_threads to compute the
size of the scratch space buffer. In that case, we really need the
absolute number of threads.
This patch hardcodes 3DSTATE_PS to use the value it expects, and changes
max_wm_threads back to a (completely fake) absolute thread count (once
again copied from Haswell).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Broadwell, g0.5 contains the "Scratch Space Pointer"; using OR
puts some bits of that into "ignored" sections of our message header.
While this doesn't hurt, it's also not terribly /useful/. Using MOV
is sufficient to set the only interesting bits in this part of the
message header.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the latest documentation, any PIPE_CONTROL with the
"Command Streamer Stall" bit set must also have another bit set,
with five different options:
- Render Target Cache Flush
- Depth Cache Flush
- Stall at Pixel Scoreboard
- Post-Sync Operation
- Depth Stall
I chose "Stall at Pixel Scoreboard" since we've used it effectively
in the past, but the choice is fairly arbitrary.
Implementing this in the PIPE_CONTROL emit helpers ensures that the
workaround will always take effect when it ought to.
Apparently, this workaround may be necessary on older hardware as well;
for now I've only added it to Broadwell as it's absolutely necessary
there. Subsequent patches could add it to older platforms, provided
someone tests it there.
v2: Only flag "Stall at Pixel Scoreboard" when none of the other bits
are set (suggested by Ian Romanick).
v3: Prefix the function with "gen8" (requested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Eric Anholt <eric@anholt.net>
Grab the parsed invocation count, check for consistency
during linking, and finally save the result in
gl_shader_program Geom.Invocations.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
_mesa_glsl_parse_state in_qualifier->invocations will store the
invocations count.
v3:
* Use in_qualifier to allow the primitive to be specied
separately from the invocations count (merge_qualifiers)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We introduce a new merge_in_qualifier ast_type_qualifier
which allows specialized handling of merging input layout
qualifiers.
By merging layout qualifiers into state->in_qualifier, we
allow multiple input qualifiers. For example, the primitive
type can be specified specified separately from the
invocations count (ARB_gpu_shader5).
state->gs_input_prim_type is moved into state->in_qualifier->prim_type
state->gs_input_prim_type_specified is still processed separately
so we can determine when the input primitive is specified. This
is important since certain scenerios are not supported until after
the primitive type has been specified in the shader code.
v4:
* Merge with compute shader input layout qualifiers
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Improves performance of a dolphin emulator trace I had laying around by
3.60131% +/- 0.995887% (n=128).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We generate steaming piles of these for the centroid workaround, and this
quickly cleans them up.
total instructions in shared programs: 1591228 -> 1590047 (-0.07%)
instructions in affected programs: 26111 -> 24930 (-4.52%)
GAINED: 0
LOST: 0
(Improved apps are l4d2, csgo, and dolphin)
Reviewed-by: Matt Turner <mattst88@gmail.com>
The previous code relied on cpu denorm support for converting small float
formats (such r11g11b10_float and r16_float) to floats, otherwise denorms
are flushed to zero. We worked around that in llvmpipe blend code by
reenabling denorms, but this did nothing for texture sampling. Now it would
be possible to reenable it there too but I'm not really a fan of messing
with fpu flags (and it seems we can't actually do it reliably with llvm in
any case looking at some bug reports). (Not to mention if you actually have
a lot of denorms in there, you can expect some order-of-magnitude slowdown
with x86 cpus.)
So instead use code which adjusts exponents etc. directly hence not relying
on cpu denorm support for the rescaling mul.
(We still need the fpu flag handling as we can't do float-to-smallfloat
without using cpu denorms at least for now - I actually wanted to keep
both the old and new code and using one or the other depending on from where
it's called but that didn't work out as the parameter would have to be passed
through too many layers than I'd like.)
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
We need to advertise 8x, 4x, and 2x multisamples. Previously, we only
claimed to support 0/1 samples.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I can't find any documentation to explain what ought to be done here, so
I simply guessed based on the pattern I observed in the 4x/8x cases.
It appears to work, but it could be totally wrong.
I was able to find the Sandybridge PRM quote from the comments in the
latest documentation: Shared Functions > 3D Sampler > Multisampled
Surface Behavior. However, it only mentions 4x MSAA - not even 8x.
After a substantial amount more digging, I was able to find a second
page (incorrectly tagged) which confirmed the formulas in our code for
8x MSAA. However, that page didn't mention 2x MSAA at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
According to the "Point Multisample Rasterization" of the OpenGL
specification (3.0 or later), smooth points are supposed to be enabled
implicitly when multisampling, regardless of the GL_POINT_SMOOTH flag.
However, if GL_POINT_SPRITE is enabled, you get square points no matter
what. Core contexts always enable point sprites, so this effectively
makes smooth points go away, even in the case of multisampling.
Fixes Piglit's EXT_framebuffer_multisample/point-smooth tests.
(Yes, that's right folks, we actually have Piglit tests for this.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The meaning and effects of this bit are surprisingly complicated.
See Rasterization > Windower > Multisampling > Multisample ModesState.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This restriction carries forward from earlier platforms. The code is
ported straight from gen7_wm_state.c.
v2: Actually do it right.
v3: Add missing _NEW_MULTISAMPLE bit (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: Also set the "oMask Present to Render Target" bit, which is required
for shaders that write oMask. Otherwise the hardware won't expect
the extra data.
v3: Add missing _NEW_MULTISAMPLE (caught by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I made a few changes which I think simplify the code a bit compared to
the Gen7 implementation, but which are largely pointless.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We already set the number of samples, but were missing the MSAA layout
mode. Reusing gen7_surface_msaa_bits makes it easy to set both.
This also lets us drop the Gen8 surface_num_multisamples function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The enumerations are just log2(num_samples) shifted by 3, which we can
easily compute via ffs().
This also makes it reusable for Broadwell, which has 2x MSAA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These enumerations are simply log2 of the number of multisamples shifted
by a bit, so we can calculate them using ffs() in a lot less code.
Suggested by Eric Anholt.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The const in
const unsigned foo(void);
is meaningless. Removing it silences this warning:
src/glsl/ast_to_hir.cpp:1802:56: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
From page 14 (page 20 of the PDF) of the GLSL 1.10 spec:
"In addition, all identifiers containing two consecutive underscores
(__) are reserved as possible future keywords."
The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos. Names simply containing __ are dangerous to use, but should
be allowed.
Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
Section 3.3 (Preprocessor) of the GLSL 1.30 spec (and later) and the
GLSL ES spec (all versions) say:
"All macro names containing two consecutive underscores ( __ ) are
reserved for future use as predefined macro names. All macro names
prefixed with "GL_" ("GL" followed by a single underscore) are also
reserved."
The intention is that names containing __ are reserved for internal use
by the implementation, and names prefixed with GL_ are reserved for use
by Khronos. Since every extension adds a name prefixed with GL_ (i.e.,
the name of the extension), that should be an error. Names simply
containing __ are dangerous to use, but should be allowed. In similar
cases, the C++ preprocessor specification says, "no diagnostic is
required."
Per the Khronos bug mentioned below, a future version of the GLSL
specification will clarify this.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "9.2 10.0 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Cc: Tapani Pälli <lemody@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Bugzilla: Khronos #11702
Linking with LLVM static libraries is easily broken by changes to
the llvm-config program or when LLVM adds, removes, or changes library
components. Keeping up with these changes requires a lot of maintanence
effort to keep the build working on the master and stable branches.
Also, because of issues in the past LLVM static libraries, the release
manager is currently configuring with --with-llvm-shared-libs when
checking the build before release. Enabling shared libraries by
default would allow the release manager to run ./configure with
no arguments, and be reasonably confident that the build would succeed.
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Useful because the total number of uniform components might exceed
MAX_UNIFORMS * 4 in some cases because of the image metadata we'll be
passing as push constants.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Like the VEC4 back-end does. It will make dynamic allocation of the
param_size array easier in a future commit.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since we are now consuming two ringbuffers at a time, we probably want a
pool larger than 4.. but we don't need each individual ringbuffer to be
so large, so offset the pool size increase by reducing rb size.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
It seems the write-after-read hazard that applies to texture fetch
instructions, also applies to sfu instructions.
Also, cat5/cat6 instructions do not have a (ss) bit, so in these
cases we need to insert a dummy nop instruction with (ss) bit set.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
There doesn't seem to be any reason for it to be a method, and it's
surprising that the expression 'reg.retype(t)' doesn't retype its
object but rather it creates a temporary with the new type. Use
'retype(reg, t)' instead.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Add assertion that the register is not in the HW_REG or IMM file,
calculate the conjunction of the old and new mask instead of replacing
the old [consistent with the behavior of brw_writemask(), causes no
functional changes right now], make it static inline to let the
compiler do a slightly better job at optimizing things, and shorten
its name.
v2: Assert that the new writemask is not zero to avoid undefined
hardware behaviour.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And define non-mutating helper functions to retype fixed and normal
regs with a common interface. At some point we may want to get rid of
::fixed_hw_reg completely and have fixed regs use the normal register
data members (e.g. backend_reg::reg to select a fixed GRF number,
src_reg::swizzle to store the swizzle, etc.), I have the feeling that
this is not the last headache we're going to get because of the
multiple ways to represent the same thing and the different register
interface depending on the file a register is stored in...
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There doesn't seem to be any reason for nr_params, nr_pull_params,
param, and pull_param to be duplicated in the stage-specific
subclasses of brw_stage_prog_data. Moving their definition to the
common base class will allow some code sharing in a future commit, the
removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and
the simplification of the stage-specific brw_*_prog_data_compare.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Broadwell's 3DSTATE_WM_HZ_OP packet makes this much easier.
Instead of programming the whole pipeline, we simply have to emit the
depth/stencil packets, a state override, and a pipe control. Then
arrange for the state to be put back. This is easily done from a single
function.
v2: Use minify(mt->logical_{width,height}0, level) in 3DSTATE_WM_HZ_OP
instead of intel_mipmap_level's width/height fields. Those were
based on the physical width/height, and thus wrong for MSAA buffers.
Eric also deleted those fields.
v3: Use 0xFFFF as the sample mask regardless of what the user set (as
this operation is unrelated); set the drawing rectangle to the
miplevel being operated on, rather than the whole surface; remove
unnecessary MAX2(..., 1) around mt->logical_depth0 (all suggested
by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The existing code followed the vtable function signature, which is not a
great fit: many of the parameters are unused, and the function still
inspects global state, making it less reusable.
This patch refactors the depth buffer packet emission code into a new
function which takes exactly the parameters it needs, and which uses no
global state. It then makes the existing vtable function call the new
one.
Ideally, we would remove the vtable function, and clean up that
interface. But that can happen once HiZ is working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell's "HiZ Resolve" operation still has the restriction that the
rectangle primitive must be 8x4 aligned. So I believe we still need
this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
HiZ buffers still don't exist, but when they do, we'll set them up.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_depthbuffer_format is not very reusable at the moment, since it
uses global state (ctx->DrawBuffer) to access a particular depth buffer.
For HiZ on Broadwell, I need a function which simply converts the
formats. However, at least one existing user of brw_depthbuffer_format
really wants the existing interface. So, I've created a new function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 1456ed85f0.
_eglInitResource can and is supposed to be called on subclass objects.
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
This reverts commit 498d10e230.
_eglInitResource can and is supposed to be called on subclass objects.
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Even with the other limits raised, TestProxyTexImage would still reject
textures > 1GB in size. This is an artificial limit; nothing prevents
us from having a larger texture. I stayed shy of 2GB to avoid the
larger-than-aperture situation.
For 3D textures, this raises the effective limit:
- RGBA8: 645 -> 738
- RGBA16: 512 -> 586
- RGBA32F: 406 -> 465
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Gen4+ supports 8192x8192 cube maps. Ivybridge and later can actually
support 16384, but that would place GL_MAX_CUBE_MAP_TEXTURE_SIZE above
GL_MAX_TEXTURE_SIZE, which seems like a bad idea.
(Unfortunately, we can't bump GL_MAX_TEXTURE_SIZE to 16384 without
causing regressions due to awful W-tiled stencil buffer interactions.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74130
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_ARB_ES2_compatibility doesn't say anything about shader linking
when one of the shaders (vertex or fragment shader) is absent. So,
the extension shouldn't change the behavior specified in GLSL
specification.
Tested the behavior on proprietary linux drivers of NVIDIA and AMD.
Both of them allow linking a version 100 shader program in OpenGL
context, when one of the shaders is absent.
Makes following Khronos CTS tests to pass:
successfulcompilevert_linkprogram.test
successfulcompilefrag_linkprogram.test
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This drops the MOVs for header setup, which are totally mis-scheduled.
total instructions in shared programs: 1590047 -> 1589331 (-0.05%)
instructions in affected programs: 43729 -> 43013 (-1.64%)
GAINED: 0
LOST: 0
glb27-trex:
x before
+ after
+-----------------------------------------------------------------------------+
| + x xx + + + |
| ++ + xxx ++x xx + ** *x+ + + + x * |
|+x xx x* x+++xx*x*xx+++*+*xx++** *x* x+***x*+xx+* + * + + *|
| |__|__________MA___A___________|___| |
+-----------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 49 62.33 65.41 63.49 63.53449 0.62757822
+ 50 62.28 65.4 63.7 63.6982 0.656564
No difference proven at 95.0% confidence
Reviewed-by: Matt Turner <mattst88@gmail.com>
It often confused people because it was unclear on whether it was the
physical or logical, and people needed the other one as well. We can
recompute it trivially using the minify() macro, clarifying which value is
being used and making getting the other value obvious.
v2: Fix a pasteo in intel_blit.c's dst flip.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since only window system renderbuffers can have a singlesample_mt, this
lets us drop a bunch of sanity checking to make sure that we're just a
renderbuffer-like thing.
v2: Fix a badly-written comment (thanks Kenneth!), drop the now trivial
helper function for set_needs_downsample.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only DRI2 vs DRI3 delta was just how to decide about frontbuffer-ness
for doing the upsample.
v2: Fix missing singlesample_mt->region->name update in the merged code,
which would have broken the DRI2 don't-recreate-the-miptree
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pretty silly to pass in values dereferenced out of one of the arguments.
v2: Get the destination size from the dst, even though the callers are
always dealing with src size == dst size cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
So far it's happened to be that we're only ever calling
intel_miptree_blit() (up/downsampling) from the ReadBuffer, but I stumbled
over a null ReadBuffer case when debugging later parts of the series.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly mt->target == 2D_MS just results in a few checks that we don't try
to allocate multiple LODs and don't try to do slice copies with them. But
with the introduction of binding renderbuffers to textures, we need more
consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us simplify our shaders, and rely on GLES-prohibited
functionality (like ARB_texture_multisample) when writing these
driver-internal functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes:
xa_tracker.c: In function 'xa_tracker_create':
xa_tracker.c:147:5: error: implicit declaration of function 'pipe_loader_drm_probe_fd' [-Werror=implicit-function-declaration]
in some build configurations, as XA now implicitly depends on
gallium_drm_loader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Fixes radeonsi emitting command streams to the kernel even when there
have been no draw calls before a flush, potentially powering up the GPU
needlessly.
Incidentally, this also cuts the runtime of piglit gpu.py in about half
on my Kaveri system, probably because an X11 client going away no longer
always results in a command stream being submitted to the kernel via
glamor.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65761
Cc: "10.1" mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Required for libdrm 2.4.37 and earlier. Both scons and automake
require version 2.4.38 now so that guard is not longer needed.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
xorg-server and libkms is no longer required since the removal
of the xorg state-tracker.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is the first version that introduced DRM_CAP_PRIME, which is
implicitly required by egl/wayland.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
* Make sure that only drivers that are handled by configure.ac
are included in DRI_DIRS.
* Change with_dri_drivers default value to auto, and set enable
autodetection, when enable_opengl is on.
v2: Move "test" to the correct location.
v3: Squash DRI_DIRS handling before the switch statement.
Suggested by Ilia Mirkin
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Both x86_64|amd64 and *bsd, already set the full range of available
classic dri drivers. Drop the explicit assignment, and fall back to
the generic default.
Keep explicit list from plafroms/arches that do not handle the default
list.
Update help strings, to explicitly mention "classic" for applicable
DRI drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move all the cases within one switch statement and handle
i9{1,6}5 and r{adeon,200} independently.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If built without llvm, the following error occurs with mplayer:
Failed to open VDPAU backend .../libvdpau_r600.so: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
[vo/vdpau] Error when calling vdp_device_create_x11: 1
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
DRM_API_HANDLE_TYPE_SHARED is zero, so doesn't actually fix anything.
But we shouldn't rely on SHARED handle type being zero.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Build two versions of pipe-loader, with only the client version linking
in x11 client side dependencies. This will allow the XA state tracker
to use pipe-loader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Seems texture sample instructions don't immediately consume there
src(s). In fact, some shaders from blob compiler seem to indiciate that
it does not even count the texture sample instructions when calculating
number of delay slots to fill for non-sample instructions. (Although so
far it seems inconclusive as to whether this is required.)
In particular, when a src register of a previous texture sample
instruction is clobbered, the (ss) bit is needed to synchronize with the
tex pipeline to ensure it has picked up the previous values before they
are overwritten.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Was supposed to be a '+', otherwise we end up with a negative offset and
choosing registers below the assigned range.
This seems to fix the scheduling mystery "solved" by adding in extra
delay slots.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since 'kill' does not produce a result, the new compiler was happily
optimizing them out. We need to instead track 'kill's similar to
outputs. But since there is no non-predicated kill instruction,
(and for flattend if/else we do want them to be predicated), we need
to track the topmost branch condition on the stack and use that as src
arg to the kill. For a kill at the topmost level, we have to generate
an immediate 1.0 to feed into the cmps.f for setting the predicate
register.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Thanks to figuring out 32bit float render target, and adding regdump
test in fdre-a3xx, I can more easily play around with instructions to
figure out range of inputs/outputs/etc. And from this I can conclude
that cmps.f works more like expected and I can do something much more
simple in trans_cmp() (compared to before which was more closely
emulating the instruction sequence of the blob compiler).
And using sel.b32 (binary 0/1) often makes more sense than sel.f32
(+/- float) or sel.u32 (+/- uint) as it can use the output directly
from cmps.f without needing the 'add.s tmp0, tmp0, -1'.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
As documented, the _mesa_free_shader_program_data function:
"Frees all the data that hangs off a shader program object, but not
the object itself."
This means that this function may be called multiple times on the same object,
(and has been observed to). Meanwhile, the shProg->Label field was not being
set to NULL after its free(). This led to a second call to free() of the same
address on the second call to this function.
Fix this by setting this field to NULL after free(), (just as with all other
calls to free() in this function).
Reviewed-by: Brian Paul <brianp@vmware.com>
CC: mesa-stable@lists.freedesktop.org
The linux winsys needs to know whether a surface is shared.
For guest-backed surfaces we need this information to avoid allocating a
mob out of the mob cache for shared surfaces, but instead allocate a shared
mob, that is never put in the mob cache, from the kernel.
Also previously, all surfaces were given the "shareable" attribute when
allocated from the kernel. This is too permissive for client-local surfaces.
Now that we have the needed info, only set the "shareable" attribute if the
client indicates that it needs to share the surface.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
In some situations, it may be desirable to bypass the cache at buffer
creation but to insert the buffer in the cache at buffer destruction.
One such situation is where we already have a kernel representation of a
buffer that we want to use, but we also want to insert it in the cache when
it's freed up.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
In some situations it's important to restrict the sizes of buffers that the
cached buffer manager is allowed to return
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
This adds new interface functions for guest-backed surfaces and
adds a mobid parameter to the surface_relocation() function.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
The old svga3d_reg.h file is split into separate header files and we
add new items for guest-backed surfaces.
Plus some minor code fixes because of renamed symbols.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Cc: "10.1" <mesa-stable@lists.freedesktop.org>
Commit f4ebcd133b ("dri/nouveau: NV17_3D class is not available for
NV1a chipset") fixed this partially by using the correct 3d class.
However there were a lot of checks left over comparing against the
chipset.
Reported-and-tested-by: John F. Godfrey <jfgodfrey@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 9.2 10.0 10.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2 (chk): fix eos handling
v3 (leo): implement scaling configuration support
v4 (leo): fix bitrate bug
v5 (chk): add workaround for bug in Bellagio
v6 (chk): fix div by 0 if framerate isn't known,
user separate pipe object for scale and transfer,
always flush the transfer pipe before encoding
v7 (chk): make suggested changes, cleanup a bit more,
only advertise encoder on supported hardware
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
v2: add fw version query
v3: add README.VCE
v4: avoid error msg when kernel doesn't support it
Signed-off-by: Christian König <christian.koenig@amd.com>
Commit 246ca4b001 ("nv50: implement multiple viewports/scissors, enable
ARB_viewport_array") added dirty tracking to scissors/viewports. However
it neglected to mark them all as dirty on a context switch. This fixes
an apparent regression in webgl in chrome, but probably in any
application that switches contexts.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The bound range is disconnected from the viewport dimensions. This is
the relevant bit from glViewportArray:
"""
The location of the viewport's bottom left corner, given by (x, y) is
clamped to be within the implementaiton-dependent viewport bounds range.
The viewport bounds range [min, max] can be determined by calling glGet
with argument GL_VIEWPORT_BOUNDS_RANGE. Viewport width and height are
silently clamped to a range that depends on the implementation. To query
this range, call glGet with argument GL_MAX_VIEWPORT_DIMS.
"""
Just set it to +/-16384, as that is the minimum required by
ARB_viewport_array and the value that all current drivers provide.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This avoids a CopyTexImage() on Intel i965 hardware without blorp.
v2: Move the !readAtt check up higher.
v3: Rebase on idr's changes, plus readAtt check is totally gone, and also
fix a typo in a comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
This will let us use meta's acceleration from renderbuffers without having
to do a CopyTexImage first.
This is like what we do for TFP, but just taking an existing renderbuffer
and binding it to a texture with whatever its format was. The
implementation won't work for stencil renderbuffers, and it only does
non-texture renderbuffers (but then, if you're using a texture
renderbuffer, you can just pull the texture object/level/slice out of the
renderbuffer, anyway).
v2: Don't forget to propagate NumSamples to the teximage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function is only handling the color case. We can just unindent as
long as we're willing to do the check for the bit outside of the
function.
v2: Rebase on idr's changes, drop readAtt check that's always non-null
anyway (it's a pointer into to the statically-allocated attachments
array in the renderbuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
I want split some meta.c code off to a separate file, so these functions
can't be static any more.
v2: Rebase on idr's changes, also expose setup_blit_shader,
blit_shader_table_cleanup, setup_vertex_objects,
setup_ff_tnl_for_blit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
I'd like to split some of our code to separate files, since 4k lines and
growing is pretty unreasonable for all these separate operations.
v2: Rebase on idr's changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
There was this funny argument passed to setup for "did alloc decide we
need to allocate new texture storage?", which goes away if we don't have
the caller do alloc as a separate step.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GL_ARB_fbo spec:
If the source and destination buffers are identical, and the
source and destination rectangles overlap, the result of the blit
operation is undefined.
As far as I know, that's the only thing that would have been of concern
for this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I tripped over one of these when debugging meta, and it's a lot nicer to
just see the internalFormat being complained about.
v2: Drop a note in the other errors path that there is one early return.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While these structs are generated per GLSL sampler type, they're structs
of data-about-shaders (notably, the ID of a shader program), not
data-about-samplers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Everyone was just immediately calling it and doing nothing else with the
shader program id.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The only thing that wants to track the glsl_sampler structure is the
shader string generator.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of the VEC4 back-end agrees on src_reg::swizzle being one of the
BRW_SWIZZLE macros defined in brw_reg.h, except in two places where we
use Mesa's SWIZZLE macros. There is even a doxygen comment saying
that Mesa's macros are the right ones. They are incompatible swizzle
representations (3 bits vs. 2 bits per component), and the code using
Mesa's works by pure luck. Fix it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The same effect can be achieved using ::subreg_offset. Remove the
less flexible alternative and define a convenience function to keep
the fs_reg interface sane.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The same effect can be achieved using a combination of ::stride and
::subreg_offset. Remove the less flexible ::smear to keep the data
members of fs_reg orthogonal.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Some improvements for copy propagation with non-contiguous
register strides and mismatching types.
v3: Add example of the situation that the copy propagation changes are
intended to avoid. Clarify that 'fs_reg::apply_stride()' is expected
to work with zero strides too.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It would be nice if we could have a single 'reg_offset' field
expressed in bytes that would serve the purpose of both, but the
semantics of 'reg_offset' are quite complex currently (it's measured
in units of one, eight or sixteen dwords depending on the register
file and the dispatch width) and changing it to bytes would be a very
intrusive change at this stage. Add a separate 'subreg_offset' field
for now.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
To fix MSVC compile breakage. Evidently, _restrict is an MSVC keyword,
though the docs only mention __restrict (with two underscores).
Note: we may want to also rename _volatile to volatile_flag to be
consistent.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74900
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix formatting, add new comments, get rid of extraneous indentation.
Suggested by Ian in bug 74723.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Set driver-specified flag in NewDriverState when glUniform* is
used to bind an image unit.
v3: Abbreviate argument type check.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Because of the combinatorial explosion of different image built-ins
with different image dimensionalities and base data types, enumerating
all the 242 possibilities would be annoying and a waste of .text
space. Instead use a special path in the built-in builder that loops
over all the known image types.
v2: Generate built-ins on GLSL version 4.20 too. Rename
'_has_float_data_type' to '_supports_float_data_type'. Avoid
duplicating enumeration of image built-ins in create_intrinsics()
and create_builtins().
v3: Use a more orthodox approach for passing image built-in generator
parameters.
v4: Cosmetic changes.
Acked-by: Paul Berry <stereotype441@gmail.com>
No opaque types may be statically initialized in the shader, all
opaque variables must be declared uniform or be part of an "in"
function parameter declaration, no opaque types may be used as the
return type of a function.
v2: Add explicit check for opaque types in interface blocks. Check
for opaque types in ir_dereference::is_lvalue().
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Aggregating images inside uniform blocks is explicitly disallowed by
the standard, aggregating them inside structures is not (as of GL
4.4), but there is a similar problem as with atomic counters: image
uniform declarations require either a "writeonly" memory qualifier or
an explicit format qualifier, which are explicitly forbidden in
structure member declarations. In the resolution of Khronos bug
#10903 the same wording applied to atomic counters was decided to mean
that they're not allowed inside structures -- Rejecting image member
declarations within structures seems the most reasonable option for
now.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Add comment next to the read_only and write_only qualifier flags.
Change temporary copies of the type qualifier mask to use uint64_t
too.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Add predicates to query if a GLSL type is or contains an image.
Rename sampler_coordinate_components() to coordinate_components().
v2: Use assert instead of unreachable.
v3: No need to use a separate code-path for images in
coordinate_components() after merging image and sampler fields in
the glsl_type structure.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Reuse the glsl_sampler_dim enum for images. Reuse the
glsl_type::sampler_* fields instead of creating new ones specific
to image types. Reuse the same constructor as for samplers adding
a new 'base_type' argument.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This fixes bug 73200 "vdpau-GL interop fails due to different screen
objects" in the same way radeon does.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Only export __driDriverExtensions by default, and radeon_drm_winsys_create on radeons.
Remove -Bsymbolic which should no longer be needed.
As a side effect, it ought to fix a manifestation of bug 73200 on radeon.
Signed-off-by: Maarten Lankhorst<maarten.lankhorst@canonical.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixed piglit test getteximage-targets S3TC CUBE_ARRAY on systems that
don't have libtxc_dxtn installed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will be necessary to support cubemap array textures because they
use all four components.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rectangle textures were not necessary for mipmap generation (because
they cannot have mipmaps), but all of the future users of this common
code will need to support rectangle textures.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is quite like code we want for blits. Pull it out so that it can
be shared by other paths.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will allow the same table of shader-per-sampler-type to be used for
paths in meta other than just mipmap generation. This is also the
reason the declarations of the structures was moved towards the top of
the file.
v2: Code formatting change suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also... glOrtho(-1.0, 1.0, -1.0, 1.0, -1.0, 1.0) *is* the identity
matrix, so drop the unnecessary call to _mesa_Ortho.
v2: Rename setup_ff_TNL_for_blit() to setup_ff_tnl_for_blit(). Seems
silly to capitalize one out of two to three acronyms in the name
(change by anholt, acked by idr).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
I set the "address modify enable" bit in the wrong DWord. The first
DWord is the high 16 bits of the address, while the second is the low
32-bits and enable bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We never set it on previous generations, but I had to set it in
3DSTATE_PS for correct behavior. For symmetry, I set it in 3DSTATE_VS
as well, but there's no actual need to do so. Piglit works fine either
way. The documentation also remarks that there should never be a need
to program this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
My earlier patch (i965: Reserve space for "Vertex Count" in GS outputs.)
incremented Global Offset for most URB writes to make room for the new
"Vertex Count" field, but failed to shift the URB writes used for
writing control bits.
Confusingly, Global Offset must be incremented by 2 here, rather than 1.
The URB writes we use for actual data are HWord writes, which treat
Global Offset as a 256-bit offset. These are OWord writes, so it's
treated as a 128-bit offset instead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I added support for these on Haswell, but forgot to update the Broadwell
code before landing it. Fixes Piglit's max-samplers test.
v2: Use get_element_ud() for the destination as well as the source.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I added support for these on Haswell, but forgot to update the Broadwell
code before landing it. Partially fixes Piglit's max-samplers test.
v2: Use get_element_ud() consistently, rather than using it for the
source but using brw_vec1_grf for the destination..
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
MOV_RAW disables masking, but doesn't force the instruction to be
uncompressed. That needs to be done by hand.
Fixes textureGather and texture offset tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Almost every driver already supported it. All current and future
Gallium drivers always support it, and most existing classic drivers
support it.
This only changes radeon and nouveau.
This extension only adds data types that can be passed to, for example,
glTexImage2D. It does not add internal formats. Since you can already
pass GL_FLOAT to glTexImage2D this shouldn't pose any additional issues
with those drivers. Note that r200 and i915 already supported this
extension, and they don't support floating-point textures either.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Half-float TexBOs should require both GL_ARB_half_float_pixel and
GL_ARB_texture_float. This doesn't matter much in practice. Every
driver that supports GL_ARB_texture_buffer_object already supports
GL_ARB_half_float_pixel. We only expose the TexBO extension in core
profiles, and those require GL_ARB_texture_float.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
drivers/common/meta.c: In function '_mesa_meta_CopyTexSubImage':
drivers/common/meta.c:3744:52: warning: unused parameter 'rb' [-Wunused-parameter]
Unfortunately, the parameter can't just be removed because it is part of
the dd_function_table::CopyTexSubImage interface.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
drivers/common/meta.c: In function 'setup_drawpix_texture':
drivers/common/meta.c:1572:30: warning: unused parameter 'texIntFormat' [-Wunused-parameter]
setup_drawpix_texture has never used this paramater. Before the
refactor commit 04f8193aa it was used in several locations. After that
commit, texIntFormat was only used in alloc_texture.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
All of the other meta routines have a particular pattern for creating
and tracking the VAO and VBO. This one function deviated from that
pattern for no apparent reason.
Almost all of the code added in this patch will be removed shortly.
v2: Drop glDeleteBuffers() of the old, now-uninitialized vbo variable.
Fixes getteximage-formats and fbo-mipmap-copypix regression when "2"
landed in the variable (change by anholt).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Final intermediate step leading to some code sharing. Note that the new
GemerateMipmap and decompress vertex structures are the same as the new vertex
structure in BlitFramebuffer and the others.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new DrawPixels
vertex structure is the same as the new vertex structure in BlitFramebuffer
and the others.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new Clear
vertex structure is the same as the new BlitFramebuffer and CopyPixels
vertex structure.
The "sizeof(float) * 7" hack is temporary. It will magically disappear
in a just a couple more patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step leading to some code sharing. Note that the new CopyPixels
vertex structure is the same as the new BlitFramebuffer vertex
structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Initial step of cleaning the exported symbols from targets/omx
- Mark omx_component_library_Setup as public
v2: Keep export-symbols-regex
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
There is no cpp files during the build process, thus we
can safely drop the unused cxxflags.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The function pointer is retrieved via VdpGetProcAddress just
like all the other vdpau functions and should not be exported.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Add VISIBILITY_CFLAGS to automake build, so that
only required symbols are exported.
v2: Rebase
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This symbol is internal and was never part of the API.
Unused by any of the gbm backends, it makes sense to
simply not export it.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
VISIBILITY_CFLAGS
Currently the library exports every symbol imaginable,
rather than the ones defined by the API.
Note: This may cause issues for libraries that are linking
agaist libgbm's internals.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Currently we create a OPENGL_COMPAT context regardless of
what was requested by the program. Correct that by retaining
the program's request and passing it into _mesa_initialize_context.
Based on a similar commit for radeon/r200 by Ian Romanick.
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Add the explicit note about the required version during configure.
Require the same version (151) of udev when building the pipe-loader.
Mention the udev version requirement in GBM Requires.private.
v2: Resolve a couple of silly typos. Spotted by Ilia
v3: Cleanup platfrom/platform typo. Spotten by Stefan
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
As of recently we dlopen the library, additionally the only
code that is including the libudev.h header, is the loader.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously the linking was required due to dependency of udev in the
pipe-loader. Now this is no longer the case, as we dlopen the library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously the linking was required due to dependency of udev in the
pipe-loader. Now this is no longer the case, as we dlopen the library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
--enable-gallium-llvm is required by radeonsi. Currently we
check only for LLVM_VERSION_INT which is 0, whenever gallium-llvm
is disabled explicitly.
./configure --with-gallium-drivers=r600,radeonsi --disable-gallium-llvm
v2: Correct typo in error message. Spotted by Tom Stellard
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Matt Turner noted the incorrect order, but I somehow forgotten to
change it before pushing upstream. The other one is a typo during rebase.
Signed-off-by: Christian König <christian.koenig@amd.com>
If we don't recognize the PCI ID, we can't reasonably load the driver.
However, calling abort() is quite rude - it means the application that
tried to initialize us (possibly the X server) can't continue via
fallback paths. We already have a more polite mechanism - failing to
create the context. So, just use that.
While we're at it, improve the error message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73024
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Lu Hua <huax.lu@intel.com>
Consider a multithreaded program with two contexts A and B, and the
following scenario:
1. Context A calls initialize(), which allocates mem_ctx and starts
building built-ins.
2. Context B calls initialize(), which sees mem_ctx != NULL and assumes
everything is already set up. It returns.
3. Context B calls find(), which fails to find the built-in since it
hasn't been created yet.
4. Context A finally finishes initializing the built-ins.
This will break at step 3. Adding a lock ensures that subsequent
callers of initialize() will wait until initialization is actually
complete.
Similarly, if any thread calls release while another thread is still
initializing, or calling find(), the mem_ctx/shader would get free'd while
from under it, leading to corruption or use-after-free crashes.
Fixes sporadic failures in Piglit's glx-multithread-shader-compile.
Bugzilla: https://bugs.freedesktop.org/69200
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.1 10.0" <mesa-stable@lists.freedesktop.org>
In the first case, we can simply call stride(mask, 16, 8, 2) rather than
creating a new register with a different stride, then immediately
changing it a second time.
In the second case, the stride was already what we wanted, so we can
just use mask without any changes at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
stride(brw_vec1_reg(...) ...) takes some register, changes the strides,
then changes the strides again. Let's do it once.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
this just ties the mesa code to the pre-existing gallium interface,
I'm not sure what to do with the CSO stuff yet.
0.2: fix min/max bounds
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds GS output and FS input support, even though FS input
support isn't supported until GLSL 4.30 from what I can see.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There are only two sensible placements for 2x MSAA samples - and one is
the mirror image of the other. I chose (0.25, 0.25) and (0.75, 0.75).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The 4x and 8x cases contained identical code for extracting the X and
Y sample offset values and converting them from U0.4 back to float.
Without this refactoring, we'd have to duplicate it a third time in
order to support 2x MSAA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Apparently some players are ill-prepared for us claiming that a decoder
exists only to have creating it fail, and express this poor preparation
with crashes (e.g. flash). Check that firmware is there to increase the
chances of there being a high correlation between reported capabilities
and ability to create a decoder.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.0 10.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
In commit eeed49f5f2, Mark accidentally
renamed MESA_FORMAT_S8_Z24 to MESA_FORMAT_Z24_UNORM_X8_UINT and
MESA_FORMAT_X8_Z24 to MESA_FORMAT_Z24_UNORM_S8_UINT, reversing their
sense. The commit message was correct, but what sed commands actually
got run didn't match that.
This patch swaps the two enum names, reversing them. This should undo
the damage, but might break things if people have manually fixed a few
instances in the meantime...
Mark's commit also failed to mention renames:
s/MESA_FORMAT_ARGB2101010_UINT\b/MESA_FORMAT_B10G10R10A2_UINT/g
s/MESA_FORMAT_ABGR2101010\b/MESA_FORMAT_R10G10B10A2_UNORM/g
but those seem okay.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
nvfx_fragprog_assign_generic only allows for up to 10/8 texcoords for
nv40/nv30. This fixes compilation of the varying-packing tests.
Furthermore it appears that the last 2 inputs on nv4x don't seem to
work in those tests, so just report 8 everywhere for now.
Tested on NV42, NV44. NV4B appears to have additional problems.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 9.1 9.2 10.0 10.1 <mesa-stable@lists.freedesktop.org>
We don't need to allocate all the state related to GL_ARB_debug_output
until some aspect of that extension is actually needed.
The sizeof(gl_debug_state) is huge (~285KB on 64-bit systems), not even
counting the 54(!) hash tables and lists that it contains. This change
reduces the size of gl_context alone from 431KB bytes to 145KB bytes on
64-bit systems and from 277KB bytes to 78KB bytes on 32-bit systems.
Reviewed-by: Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The WHILE instruction doesn't have UIP. It only has JIP.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The bits which normally contain the source register descriptions
actually contain the JIP/UIP jump targets, which we already printed.
Interpreting JIP/UIP as source registers results in some really creepy
looking output, like IF statements with acc14.4<0,1,0>UD sources.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell's 3DSTATE_CLEAR_PARAMS packet expects a floating point value
regardless of format. This means we need to stop converting it to
UNORM.
Storing the value as float would make sense, but since we already have a
uint32_t field, this patch continues shoehorning it into that. In a
sense, this makes mt->depth_clear_value the DWord you emit in the
packet, rather than the clear value itself.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix pasteo of an extra abs being inserted (caught by many). Rewrite
to drop the silly switch statement.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
We've had various bug reports over the years where miptrees are missing,
and when I screwed it up while adding DRI2 to the modesetting driver, I
figured I should put the info necessary for debug here.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes it work on Broadwell, too.
v2: Drop bogus double write to 3DPRIM_BASE_VERTEX register
(caught by Chris Forbes).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This saves some boilerplate and hides the OUT_RELOC/OUT_RELOC64
distinction.
Placing the function in intel_batchbuffer.c is rather arbitrary; there
wasn't really an obvious place for it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Since commit 9cee3ff562, INTEL_DEBUG=vs
has caused a NULL pointer dereference for fixed-function/ARB programs.
In the vec4 generators, "prog" is a gl_program, and "shader_prog" is the
gl_shader_program. This is different than the FS visitor.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Mesa fails to retain the precision qualifier when parsing:
#version 300 es
centroid in mediump vec2 v;
Consider how the parser's type_qualifier production is applied.
First, the precision_qualifier rule creates a new ast_type_qualifier:
<precision: mediump>
Then the storage_qualifier rule creates a second one:
<flags: in>
and calls merge_qualifier() to fold in any previous qualifications,
returning:
<flags: in, precision: mediump>
Finally, the auxiliary_storage_qualifier creates one for "centroid":
<flags: centroid>
it then does $$ = $1 and $$.flags |= $2.flags, resulting in:
<flags: centroid, in>
Since precision isn't stored in the flags bitfield, it is lost. We need
to instead call merge_qualifier to combine all the fields.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If st_GetTexImage() is to decompress the texture, avoid the fallback
path even if prefer_blit_based_texture_transfer = false. For drivers
that returned PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER = 0, we
were always taking the fallback path for texture decompression rather
than rendering a quad. The later is a lot faster.
Cc: "10.0" "10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In the specification text of NV_vertex_program1_1, the upper
limit of the RCC instruction is written as 1.884467e+19 in
scientific notation, but as 0x5F800000 in binary. But the binary
version translates to 1.84467e+19 rather than 1.884467e+19 in
scientific notation.
Since the lower-limit equals 2^-64 and the binary version equals
2^+64, let's assume the value in scientific notation is a typo
and implement this using the value from the binary version
instead.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_eglInitResource() was used to memset entire _EGLSurface by
writing more than size of pointed target. This does work
as long as Resource is the first element in _EGLSurface,
this patch fixes such dependency.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_eglInitResource() was used to memset entire _EGLContext by
writing more than size of pointed target. This does work
as long as Resource is the first element in _EGLContext,
this patch fixes such dependency.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
If a driver enables ARB_gpu_shader5 and sets Const.MaxVertexSteams >= 4,
then piglit's arb_gpu_shader5-minmax test should now pass.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In the tests they were the same so it didn't matter, but indications are
that this is the correct behaviour. Also take this opportunity to
(trivially) support using gl_Layer in fp.
Cc: 10.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
GLX_ARB_create_context allows making a GLX context current with None
drawable and readables, but this was never implemented correctly in GLX.
We would create a __DRIdrawable for the None GLX drawable and pass that
to the DRI driver and that would somehow work. Now it's somehow broken.
The way this should have worked is that we pass a NULL DRI drawable
to the DRI driver when the GLX user calls glXMakeContextCurrent()
with None for drawable and readables.
https://bugs.freedesktop.org/show_bug.cgi?id=74143
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Flush the context when we unmap a buffer, otherwise VDPAU might
start rendering the next frame while we still reference that buffer.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: StrangeNoises (rachel@strangenoises.org)
Bugzilla (bug needs XBMC changes as well):
https://bugs.freedesktop.org/show_bug.cgi?id=73191
When VL uploads vertex buffers, it uses PIPE_TRANSFER_DONTBLOCK, which always
flushes the context in the winsys if the buffer being mapped is busy. Since
I added handling of DISCARD_RANGE, DONTBLOCK has had no effect when combined
with DISCARD_RANGE and I think the context isn't flushed anywhere else,
so no commands are submitted to the GPU until the IB is full, which takes
a lot of frames.
Using DISCARD_RANGE is not the only way to trigger this bug. The other way
is to reallocate the vertex buffer before every upload.
BTW, I'm not sure if this is the right place for flushing, but it does fix
the bug.
v2 (chk): move the flush to the right place.
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: StrangeNoises (rachel@strangenoises.org)
v2: This doesn't change the behavior. It only moves the tiling check
to r600_init_resource and removes the usage parameter.
Reviewed-by: Christian König <christian.koenig@amd.com>
Featuring a full grown MPEG2 and H264 decoder and a couple of hundred bugs.
v2 (Leo): fix an error for pic_order_cnt_type 1
v3 (Leo): implement support for field decoding
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
dri2_dup_image was not copying the dri_format field.
This was causing some bugs, for example:
. we create an gbm_bo.
. we get an EGLImage from the gbm_bo.
. Bug: impossible to get again the gbm_bo from the EGLImage by
importing. (gbm dri2 backend)
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This regressed when I converted BRW_REGISTER_TYPE_* to be an abstract
type that doesn't match the hardware description. dump_instruction()
was using reg_encoding[] from brw_disasm.c, which no longer matches
(and was incorrect for Gen8+ anyway).
This patch introduces a new function to convert the abstract enum values
into the letter suffix we expect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Mesa now has a real, feature-rich EGL implementation on X11 via xcb.
Therefore I believe there is no longer a practical need for the egl_glx
driver.
Furthermore, egl_glx appears to be unmaintained. The most recent
nontrivial commit to egl_glx was 6baa5f1 on 2011-11-25.
Tested by running weston-smoke in windowed Weston on X with i965.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
ureg_program is allocated on the heap so we can just bump the
number of immediates that it can handle. It's needed for d3d10.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We need to handle a lot more immediates and in order to do that
we also switch from allocating this structure on the stack to
allocating it on the heap.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We only supported up to 256 immediates, which isn't enough. We had
code which was allocating immediates as an allocated array, but it
was always used along a statically backed array for performance
reasons. This commit adds code to skip that performance optimization
and always use just the dynamically allocated immediates if the
number of them is too great.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The number of allowed temporaries increases almost with every
iteration of an api. We used to support 128, then we started
increasing and the newer api's support 4096+. So if we notice
that the number of temporaries is larger than our statically
allocated storage would allow we just treat them as indexable
temporaries and allocate them as an array from the start.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If multiple color outputs are written, this shader is unlikely to be
useful with a winsys framebuffer.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The driver is supposed to ensure buffers before any drawing operation, but in
do_blit_drawpixels() and do_blit_copypixels() we inspect the buffer format
before calling intel_prepare_render(). That was covered up by the
unconditional call to intel_prepare_render() in intelMakeCurrent(), but we
now only do this on the initial intelMakeCurrent call for a context
(to get the size for the initial viewport values).
https://bugs.freedesktop.org/show_bug.cgi?id=74083
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Alexander Monakov <amonakov@gmail.com>
This will allow testing of compute shader functionality before it is
completed.
To enable ARB_compute_shader functionality in the i965 driver, set
INTEL_COMPUTE_SHADER=1.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Document that the 3-element array MaxComputeWorkGroupCount is
indexed by dimension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Document that the 3-element array MaxComputeWorkGroupSize is
indexed by dimension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch adds MESA_SHADER_COMPUTE to the gl_shader_stage enum.
Also, where it is trivial to do so, it adds a compute shader case to
switch statements that switch based on the type of shader. This
avoids "unhandled switch case" compiler warnings.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Linker loops that iterate through all the stages in the pipeline need
to use MESA_SHADER_FRAGMENT as a bound, so that we can add an
additional MESA_SHADER_COMPUTE stage, without it being erroneously
included in the pipeline.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we were really doing F2I. And also move it to generic section.
(Note that for llvmpipe the code generated is definitely bad, due to lack
of unsigned conversions with sse. I think though what llvm does (using scalar
conversions to 64bit signed either with x87 fpu (32bit) or sse (64bit)
including lots of domain changes is quite suboptimal, could do something like
is_large = arg >= 2^31
half_arg = 0.5 * arg
small_c = fptoint(arg)
large_c = fptoint(half_arg) << 1
res = select(is_large, large_c, small_c)
which should be much less instructions but that's something llvm should do
itself.)
This fixes piglit fs/vs-float-uint-conversion.shader_test (maybe more, needs
GL 3.0 version override to run.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This moves the value from the GS shader to the copy shader so the registers
are setup correctly.
fixes tests/spec/glsl-1.50/execution/geometry/clip-distance-out-values.shader_test
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
attempt to calculate a better value for array size to avoid breaking apps.
v2: use 0xfff like streamout, suggested by Grigori
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
cayman has a different end of program bit, so do that properly.
fixes hangs with geom shader tests on cayman.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
set the correct values so the misc out register is setup correctly
for the copy shader.
This also updates the state for the gs copy shader so the hw
gets programmed correctly.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just makes r600 and evergreen do what the radeonsi codepaths do
for layered rendering. This makes the 2d amd_vertex_shader_layer test
pass on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just adds support for emitting the proper value in the VS out misc.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This just enables the workarounds we have for vertex/pixel shaders
for geom shaders as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This follows what fglrx does, it unpacks the input we are
going to indirect into a bunch of registers and indirects
inside them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We need to be able to write to the ring using a base register
for when we emit vertices in a loop, in theory the SB compiler
could collapse these indirect writes to direct writes if the
register value is constant and known, but that is outside my
pay grade.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Vadim's code derived it from the info.mode, but it needs
to be takes from the geometry shader output primitive.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The instance cnt register was missing for a few kernels,
with a new enough kernel we can output it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
If the shader has no CF clauses at all emit an nop
If the last instruction is an ENDLOOP add a NOP for the LOOP to go to
if the last instruction is CALL_FS add a NOP
These fix a bunch of hangs in the geometry shader tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
SB needs fixes for three GS instructions it seems to raise
them outside loops etc despite my best efforts.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Although we don't use SB on geom shaders, the VS copy shader will use it
so we might as well implement MEM_RING support in sb.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This can happen in normal operation, so don't report an error on it,
just continue.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This is Vadim's initial work with a few regression fixes squashed in.
v2: (airlied)
fix regression in glsl-max-varyings - need to use vs and ps_dirty
fix regression in shader exports from rebasing.
whitespace fixing.
v2.1: squash fix assert
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
For geometry shaders we need to call this code from a second place.
Just move it out for now to keep future patches cleaner.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From the GLSL 4.40 spec, section 6.4 (Jumps):
The continue jump is used only in loops. It skips the remainder of
the body of the inner most loop of which it is inside. For while
and do-while loops, this jump is to the next evaluation of the
loop condition-expression from which the loop continues as
previously defined.
Previously, we incorrectly treated a "continue" statement as jumping
to the top of a do-while loop.
This patch fixes the problem by replicating the loop condition when
converting the "continue" statement to IR. (We already do a similar
thing in "for" loops, to ensure that "continue" causes the loop
expression to be executed).
Fixes piglit tests:
- glsl-fs-continue-inside-do-while.shader_test
- glsl-vs-continue-inside-do-while.shader_test
- glsl-fs-continue-in-switch-in-do-while.shader_test
- glsl-vs-continue-in-switch-in-do-while.shader_test
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In addition to making it public, we also need to change its first
argument from an ir_loop * to an exec_list *, so that it can be used
to insert the condition anywhere in the IR (rather than just in the
body of the loop).
This will be necessary in order to make continue statements work
properly in do-while loops.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is really not needed as blorp blit programs already sample
XRGB normally and get alpha channel set to 1.0 automatically by
the sampler engine. This is simply copied directly to the payload
of the render target write message and hence there is no need for
any additional blending support from the pixel processing pipeline.
The blending formula is anyway broken for color components, it
multiplies the color component with itself (blend factor is the
component itself).
Alpha blending in turn would not fix the alpha to one independent
of the source but simply used the source alpha as is instead
(1.0 * src_alpha + 0.0 * dst_alpha).
Quoting Eric:
"If we want to actually make the no-alpha-bits-present thing work,
we need to override the bits in the surface state or in the
generated code. In the normal draw path, it's done for sampling
by the swizzling code in brw_wm_surface_state.c, and the blending
overrides is just to fix up the alpha blending stage which
doesn't pay attention to that for the destination surface."
If one modifies piglit test gl-3.2-layered-rendering-blit to use
color component values other than zero or one, this change will
kick in on IVB. No regressions on IVB.
This is effectively revert of c0554141a9:
i965/blorp: Support overriding destination alpha to 1.0.
Currently, Blorp requires the source and destination formats to be
equal. However, we'd really like to be able to blit between XRGB and
ARGB formats; our BLT engine paths have supported this for a long time.
For ARGB -> XRGB, nothing needs to occur: the missing alpha is already
interpreted as 1.0. For XRGB -> ARGB, we need to smash the alpha
channel to 1.0 when writing the destination colors. This is fairly
straightforward with blending.
For now, this code is never used, as the source and destination formats
still must be equal. The next patch will relax that restriction.
NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This moves the intel_batchbuffer_flush before the drm_intel_bo_busy
call, which is a change in behavior. However, the old behavior was
broken.
In the future, we may want to only flush in the batchbuffer references
the BO being mapped. That's certainly more typical.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This additionally measures the time stalled, while also simplifying the
code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Mapping a buffer is a common place where we could stall the CPU.
In a few places, we've added special code to check whether a buffer is
busy and log the stall as a performance warning. Most of these give no
indication of the severity of the stall, though, since measuring the
time is a small hassle.
This patch introduces a new brw_bo_map() function which wraps
drm_intel_bo_map, but additionally measures the time stalled and reports
a performance warning. If performance debugging is not enabled, it
simply maps the buffer with negligable overhead.
We also add a similar wrapper for drm_intel_gem_bo_map_gtt().
This should make it easy to add performance warnings in lots of places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Hw binning pass doesn't seem to have broken anything. And optimizing
compiler fixes a lot of shaders and doesn't seem to break anything. So
re-org slightly FD_MESA_DEBUG params and make both hw binning and
optimizer enabled by default.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The new compiler generates a dependency graph of instructions, including
a few meta-instructions to handle PHI and preserve some extra
information needed for register assignment, etc.
The depth pass assigned a weight/depth to each node (based on sum of
instruction cycles of a given node and all it's dependent nodes), which
is used to schedule instructions. The scheduling takes into account the
minimum number of cycles/slots between dependent instructions, etc.
Which was something that could not be handled properly with the original
compiler (which was more of a naive TGSI translator than an actual
compiler).
The register assignment is currently split out as a standalone pass. I
expect that it will be replaced at some point, once I figure out what to
do about relative addressing (which is currently the only thing that
should cause fallback to old compiler).
There are a couple new debug options for FD_MESA_DEBUG env var:
optmsgs - enable debug prints in optimizer
optdump - dump instruction graph in .dot format, for example:
http://people.freedesktop.org/~robclark/a3xx/frag-0000.dot.pnghttp://people.freedesktop.org/~robclark/a3xx/frag-0000.dot
At this point, thanks to proper handling of instruction scheduling, the
new compiler fixes a lot of things that were broken before, and does not
appear to break anything that was working before[1]. So even though it
is not finished, it seems useful to merge it in it's current state.
[1] Not merged in this commit, because I'm not sure if it really belongs
in mesa tree, but the following commit implements a simple shader
emulator, which I've used to compare the output of the new compiler to
the original compiler (ie. run it on all the TGSI shaders dumped out via
ST_DEBUG=tgsi with various games/apps):
163b6306b1
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For the time being, keep old compiler as fallback for things that the
new compiler does not support yet. Split out as it's own commit to make
the later new-compiler commits easier to follow.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Not really used for anything anymore. So strip it out and avoid
conflicting symbols with upcoming new-compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When we clipped a line weren't copying the provoking vertex
color to the second vertex. We also weren't checking for
first vs. last provoking vertex.
Fixes failures found with the new piglit line-flat-clip-color test.
Cc: "10.0, 10.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This has been wrong for many years. It was originally 0x000FFFFF and long
ago there was discussion about whether GL_ALL_ATTRIB_BITS should include
the then-new GL_MULTISAMPLE_BIT bit. Eventually the ARB decided that
glPushAttrib(GL_ALL_ATTRIB_BITS) should save all current and future
attribute groups (hence ~0). Unfortunately, Mesa's gl.h was never updated.
This was just recently spotted by Eric Anholt and reported as a bug to the
ARB. Ian, Jon Leech and I discussed it at the ARB meeting and decided to
change Mesa's value to reflect the ARB's decision.
Acked-by: Eric Anholt <eric@anholt.net>
If the shader is too large, plug in a dummy shader. This patch also
reworks the existing dummy shader code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gallivm soa code supported only a single level of nesting for
control flow opcodes (if, switch, loops...) but the d3d10 spec
clearly states that those are nested within functions. To support
nesting of conditionals inside functions we need to store the
nesting data inside function contexts and keep a stack of those.
Furthermore we make sure that if nesting for subroutines is deeper
than 32 then we simply ignore all subsequent 'call' invocations.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
DirectX and most hardware documentation use the term "Index Buffer" to
refer to a buffer containing indexes into arrays of vertex data, which
allows random access to vertex data, rather than sequential access.
OpenGL uses a different term for this concept: "Element Array Buffer".
However, "Index Buffer" has become much more widespread. A quick
Google search shows 29,300 hits for "Element Array Buffer" vs.
82,300 hits for "Index Buffer."
Arguably, "Index Buffer" is clearer: an "element of an array" (or list)
usually refers to an actual item stored in the array, not the index used
to refer to it.
The terminology is also already used in Mesa: some VBO module code for
dealing with ElementArrayBufferObj names local variables "ib".
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/ElementArrayBufferObj/IndexBufferObj/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For consistency with the previous renames.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/_mesa_lookup_arrayobj/_mesa_lookup_vao/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_update_vao_client_arrays() is less of a mouthful than
_mesa_update_array_object_client_arrays(), and generally clearer.
Generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/_mesa_\([^_]*\)_array_object/_mesa_\1_vao/g'
with manual whitespace and indentation fixes applied.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I considered replacing it with "gl_vao", but spelling it out seemed to
fit better with Mesa's traditional style. Mesa doesn't shy away from
long type names - consider gl_transform_feedback_object,
gl_fragment_program_state, gl_uniform_buffer_binding, and so on.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
's/gl_array_object/gl_vertex_array_object/g'
v2: Rerun command to resolve conflicts with Ian's meta patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that the field is named "VAO" instead of "ArrayObj", it makes sense
to call the local variables "vao" instead of "arrayObj".
Completely generated by:
$ find . -type f -print0 | xargs 0 sed -i 's/arrayObj/vao/g'
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When reading through the Mesa drawing code, it's not immediately obvious
to me that "ArrayObj" (gl_array_object) is the Vertex Array Object (VAO)
state. The comment above the structure explains this, but readers still
have to remember this and translate accordingly.
Out of context, "array object" is a fairly vague. Even in context,
"array" has a lot of meanings: glDrawArrays, vertex data stored in user
arrays, gl_client_arrays, gl_vertex_attrib_arrays, and so on.
Using the term "VAO" immediately associates these fields with the OpenGL
concept, clarifying the situation and aiding programmer sanity.
Completely generated by:
$ find . -type f -print0 | xargs -0 sed -i \
-e 's/ArrayObj;/VAO;/g' \
-e 's/->ArrayObj/->VAO/g' \
-e 's/Array\.ArrayObj/Array.VAO/g' \
-e 's/Array\.DefaultArrayObj/Array.DefaultVAO/g'
v2: Rerun command to resolve conflicts with Ian's meta patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Silences many GCC warnings of the form:
drivers/common/meta.c: In function 'cleanup_temp_texture':
drivers/common/meta.c:1208:41: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'setup_ff_blit_framebuffer':
drivers/common/meta.c:1453:46: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_blit_cleanup':
drivers/common/meta.c:1998:43: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_clear_cleanup':
drivers/common/meta.c:2287:44: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'setup_ff_generate_mipmap':
drivers/common/meta.c:3365:45: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_generate_mipmap_cleanup':
drivers/common/meta.c:3556:54: warning: unused parameter 'ctx' [-Wunused-parameter]
There are a couple other similar warnings, but they are less trivial. I
want to investigate these further before axing them.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Array textures can't be used with fixed-function, so don't. Instead,
just drop the decompress request on the floor. This is no worse than
what was done previously because generating the GL error (in
_mesa_set_enable) broke everything anyway.
A later patch will get GL_TEXTURE_2D_ARRAY targets working.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no need to use pixel coordinates, and using NDC directly will
simplify the GLSL paths.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For these objects, meta was already using the non-Apple function to
delete the objects. Everywhere else in the file uses
_mesa_GenVertexArrays and _mesa_BindVertexArrays.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
The hardware decompression path isn't even close to being able to handle
this. This converts the crash (assertion failure) in
"EXT_texture_compression_s3tc/getteximage-targets S3TC CUBE_ARRAY" to a
plain old failure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
_mesa_meta_DrawPixels creates a VAO and (potentially) two fragment
programs, but none of them are ever released. Leaking piles of memory
is generally frowned upon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
decompress_texture_image creates an FBO, an RBO, a VBO, a VAO, and a
sampler object, but none of them are ever released. Later patches will
add program objects, exacerbating the problem. Leaking piles of memory
is generally frowned upon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
TEXTURE_BUFFER_INDEX has to be specially called out because it is not
allowed in any of the glTexParameter or glGetTexParameter functions.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The next patch will use this function in another file.
v2: Rename _mesa_target_enum_to_index to _mesa_tex_target_to_index.
Suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glTexSubImage(), glCopyTexSubImage() and glCompressedTexSubImage()
only change the texel data, not other state like texture size or format.
If a driver really needs do something special it can hook into the
corresponding driver functions or Map/UnmapTextureImage().
This should avoid some needless state validation effort.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The _mesa_get_current_tex_object() function is now used everywhere that
_mesa_select_tex_object() was formerly used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As far as I know, this should be safe. If not, we have to decide whether
to have variable lookup of the functions, or just drop support for .so.0
(which is a year and a half old it looks like)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74127
Reviewed-by: Matt Turner <mattst88@gmail.com>
Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there
is potentially pending rendering. Track this better, and add fd_wfi()
calls everywhere that might potentially need CP_WAIT_FOR_IDLE.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Gallium can leave const buffers bound above what is used by the current
shader. Which can have a couple bad effects:
1) write beyond const space assigned, which can trigger HLSQ lockup
2) double emit of immed consts, first with bound const buffer vals
followed by with actual immed vals. This seems to be a sort of
undefined condition.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Currently lowers the following instructions:
DST, XPD, SCS, LRP, FRC, POW, LIT, EXP, LOG, DP4,
DP3, DPH, DP2
translating these into equivalent simpler TGSI instructions.
This probably should be moved to util so other drivers can use
it, but just adding under freedreno for now so that I can clear
out a lot of the lowering code in a3xx compiler before beginning
to add new compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The ctx should hold ref to dev to avoid problems if screen is destroyed
before ctx. Doesn't really fix the egl/glx issues, but at least it
prevents things from getting much worse.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In the final revision of my gen8_generator patch, I updated the MATH
instruction's assertion from (dst.hstride == 1) to check that source and
destination hstride matched. Unfortunately, I didn't test this enough,
and many Piglit tests fail this test.
The documentation indicates that "scalar source is also supported",
which we believe means <0,1,0> access mode (hstride == 0). If hstride
is non-zero, then it must match the destination register.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This puts the PCI IDs in place so it's easy to enable support. However,
it doesn't actually enable support since it's very preliminary still,
and a few crucial pieces (such as BLORP) are still missing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Eric believes this to be wrong and unnecessary, as the command is
supposed to emit an implicit rectangle primitive. However, empirically
the pixel pipeline is completely unreliable without it. So for now, it
stays until someone comes up with a better solution.
We'll need to do better than this when we implement multisampling, HiZ,
or fast clears...but for now, this will do.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is quite similar to the Gen7 code. The main changes:
- 48-bit relocations
- Thread count is specified as U/2-1 instead of U-1.
- An extra DWord (DW9) with clip planes, URB entry output length/offsets
- We need to program the "Expected Vertex Count" (VerticesIn)
v2: Set the number of binding table entries so they can be prefetched
(requested by Eric Anholt).
v3: Add a WARN_ONCE for a missing workaround.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On previous platforms, 3DSTATE_MULTISAMPLE contained the number of
samples, pixel location, and the positions of each sample within a pixel
for each multisampling mode (4x and 8x). It was also a non-pipelined
command, presumably since changing the sample positions is fairly
drastic.
Broadwell improves upon this by splitting the sample positions out into
a separate non-pipelined state packet, 3DSTATE_SAMPLE_PATTERN. With
that removed, 3DSTATE_MULTISAMPLE becomes a pipelined state packet.
Broadwell also supports 2x and 16x multisampling, in addition to the 4x
and 8x supported by Gen7. This patch, however, does not implement 2x
and 16x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The amount of cut and paste from Gen7 is rather ugly, and should
probably be cleaned up in the future. Even the Gen7 code is in need of
some tidying though; many of the function parameters aren't used on
platforms that use level/layer rather than tile offsets. Tidying both
can be left to a future patch series. This at least gets things going.
v2: Rebase on Paul's rename of NumLayers -> MaxNumLayers.
v3: Shift QPitch by 2 when storing it in the packet. Bits 14:0 store
bits 16:2 of the actual value. Fixes tests.
v4: Add missing stencil buffer QPitch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
v2: Allow logic ops on all surface types. The UNORM restriction was
lifted with Haswell and I simply hadn't noticed. Also, add missing
BRW_NEW_STATE_BASE_ADDRESS dirty bit. Both caught by Eric Anholt.
v3: Fix swapped per-RT DWord pairs. Eliminates bizarre hacks.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It has additional fields to support clipping to the viewport even if
guardband clipping is enabled.
v2: Update for viewport array changes.
v3: No, seriously, update for viewport array changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
v2: Add missing SCS setting in gen8_emit_buffer_surface_state (caught by
Eric Anholt).
v3: Use stored QPitch rather than recomputing it.
v4: Shift QPitch by 2 when setting it in the packet; bits 14:0 store
bits 16:2 of the actual value (fixes myriads of cube and array
texturing tests). Also, only enable cube face bits for cubemaps
(matches Chris Forbes' commit on master). Port to use offset64.
v5: s/gl_format/mesa_format/g
v6: Fix DW5 of renderbuffer state, which neglected to subtract
irb->mt->first_level. Use vertical_alignment() rather than
hardcoding 4. Use ffs for multisample counts rather than a
large switch statement (all caught/suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unlike on Gen7, we can directly set the offset via the state packet.
We also -have- to: the kernel SOL reset code won't work anymore.
v2: Fix copy and paste mistake in buffer stride setup; drop stale
comment (caught by Eric Anholt). Add a perf_debug for missing
MOCS setup.
v3: Rebase on Paul Berry's changes to CurrentVertexProgram.
v4: Fix SO Write Offset handling. We need to set bits 20 and 21 so the
hardware both loads and saves the offset. There's also a
restriction that 3DSTATE_SO_BUFFER can only be programmed once per
buffer between primitives, so the "reset to zero" code needed
reworking. Fixes most of the transform feedback Piglit tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
v2: Also disable 3DSTATE_WM_CHROMAKEY for safety.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Broadwell's winding order, polygon fill, and viewport Z test fields have
moved to DWord 1 of 3DSTATE_RASTER.
v2: Add a perf_debug for a future optimization and improve commit
message (both suggested by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Emit a dummy 3DSTATE_VF_SGVS packet when not needed.
v3: Add WARN_ONCE and perf_debugs requested by Eric Anholt.
v4: Program 3DSTATE_SGVS even in the no-elements case so gl_VertexID
continues working. Fix 3DSTATE_VF_INSTANCING to not use an
element index to access the buffers array. Some ARB_draw_indirect
prep work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix missing "change" bit on instruction state base address
(caught by Haihao Xiang).
v3: Add a perf_debug for missing MOCS setup, requested by Eric.
v4: Fix buffer sizes. The value, specified at bit 12 and up, is
actually measured in 4k pages. We need to round up to the
next multiple of 4k.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v4]
v2: Fix setting of GEN8_PSX_ATTRIBUTE_ENABLE after rebases.
v3: Add missing binding table entry counts. Don't worry about alpha
testing or alpha to coverage when setting the "Kill Pixel" bit;
those are specified in 3DSTATE_PS_BLEND (caught by Eric Anholt).
Drop unused _NEW_BUFFERS. Tidy comments.
v4: Rebase on Paul Berry's changes to CurrentFragmentProgram.
v5: Re-enable line stippling. It doesn't crash or anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
v2: Remove incorrect MOCS shifts; rename urb_entry_write_offset to
urb_entry_output_offset to closer match the documentation.
v3: Only emit a non-zero constant buffer read length when active.
v4: Add missing binding table counts (caught by Eric).
v5: Rebase on Paul Berry's changes to CurrentVertexProgram.
v6: Drop bogus SBE read length/offset field code. We were programming
the wrong values, and our 3DSTATE_SBE code overrides any value we
put here anyway with the correct one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v4]
v2: Only set GEN8_PS_BLEND_HAS_WRITEABLE_RT if color buffer writes are
enabled (caught by Eric Anholt).
v3: Set non-blending flags (writeable RT, alpha test, alpha to coverage)
for integer formats too. +14 Piglits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
v2: Use stencil->_WriteEnabled instead of setting
GEN8_WM_DS_STENCIL_BUFFER_WRITE_ENABLE twice (suggested by Eric).
v3: Mask stencil->WriteMask and stencil->ValueMask with 0xff. The field
is only 8-bits, so we'd trip the new SET_FIELD assertion when core
Mesa gave us a value like 0xFFFFFFFF. The Gen7 code uses structure
field widths to implicitly do this truncation. Fixes Piglit tests.
v4: Use uint32_t for dw1/dw2, not uint8_t. Worst. Typo. Ever.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
The attribute override portion of 3DSTATE_SBE was split out into
3DSTATE_SBE_SWIZ; various bits of 3DSTATE_SF were split out into
3DSTATE_RASTER.
v2: Set Force URB Read Offset bit. Eventually the URB read offset
should be set in 3DSTATE_VS, but that will require some refactoring.
v3: Rebase on viewport array changes.
v4: Improve comments about URB read length/offset overrides.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I haven't investigated whether these are necessary on Broadwell or not,
but for paranoia's sake, we may as well continue doing them for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
It's going to diverge significantly. Starting out with a copy allows
future patches to change atoms one by one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The code re-enabling denorms for small float formats did not recognize
this format due to format handling hacks (mainly, the lp_type doesn't have
the floating bit set).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The -p option we now use when calling bison means that this variable will be
named glcpp_parser_debug not yydebug. This was not caught when the -p option
was added because this variable isn't used in the code as committed. (I prefer
the declaration to remain since it allows a developer to easily find this
variable name to enable debugging.)
This is the innocent-looking but killer test case to verify the bug fixed in
the preceding commit.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In commit 6005e9cb28 a new start state of NEWLINE_CATCHUP was added to the
lexer. This start state is used whenever the lexer is emitting a NEWLINE token
to emit additional NEWLINE tokens for any newline characters that were skipped
by an immediately preceding multi-line comment.
However, that commit erroneously entered the NEWLINE_CATCHUP state for
single-line comments. This is not desired since in the case of a single-line
comment, the lexer is not emitting any NEWLINE token. The result is that the
lexer will remain in the NEWLINE_CATCHUP state and proceed to fail to emit a
NEWLINE token for the subsequent newline character, (since the case to match \n expects only the INITIAL start state).
The fix is quite simple, remove the "BEGIN NEWLINE_CATCHUP" code from the
single-line comment case, (preserving it only in exactly the cases where the
lexer is actually emitting a NEWLINE token).
Many thanks to Petri Latvala for reporting this bug and for providing the
minimal test case to exercise it. The bug showed up only with a multi-line
comment which was followed immediately by a single-line comment (without any
intervening newline), such as:
/*
*/ // Kablam!
Since 6005e9cb28, and before this commit, that very innocent-looking
combination of comments would yield a parse failure in the compiler.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This automatically adjusts the number of buffers that we want based on
what swapping mode the X server is using and the current swap interval:
swap mode interval buffers
copy > 0 1
copy 0 2
flip > 0 2
flip 0 3
Note that flip with swap interval 0 is currently limited to twice the
underlying refresh rate because of how the kernel manages flipping. Moving
from 3 to 4 buffers would help, but that seems ridiculous.
v2: Just update num_back at the point that the values that change num_back
change. This means we'll have the updated value at the point that the
freeing of old going-to-be-unused backbuffers happens, which might not
have been the case before (change by anholt, acked by keithp).
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
The __DRIimage createImageFromFds function takes a fourcc code, but there was
no fourcc code that match __DRI_IMAGE_FORMAT_SARGB8. This adds a define for
that format, adds a translation in DRI3 from __DRI_IMAGE_FORMAT_SARGB8 to
__DRI_IMAGE_FOURCC_SARGB8888 and then adds translations *back* to
__IMAGE_FORMAT_SARGB8 in both the i915 and i965 drivers.
I'll refrain from comments on whether I think having two separate sets of
format defines in dri_interface.h is a good idea or not...
Fixes piglit glx-tfp and glx-visuals-depth
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
XCB doesn't flush the output buffer automatically, so we have to call
xcb_flush ourselves before waiting.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we're tracking SBC values correctly, and the X server has the
ability to send the GLX swap events from a PresentPixmap request, enable
this extension.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric figured out that glXWaitForSbcOML wanted to block until the requested
SBC had been completed, which means to wait until the
PresentCompleteNotify event for that SBC had been received.
This replaces the simple sleep(1) loop (which was bogus) with a loop that
just checks to see if we've seen the specified SBC value come back in a
PresentCompleteNotify event yet.
The change is a bit larger than that as I've broken out a piece of common
code to wait for and process a single Present event for the target
drawable.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tracking the full 64-bit SBC values makes it clearer how those values are
being used, and simplifies the wait_msc code. The only trick is in
re-constructing the full 64-bit value from Present's 32-bit serial number
that we use to pass the SBC value from request to event.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This consolidates how we link the libraries into the build directory.
It works for lib_LTLIBRARIES but not custom shared libraries like DRI
drivers or gallium state trackers which needs special casing (cf dri
mega drivers, for example)
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Spamming the pci id is not beneficial. Make sure it's printed
only when needed.
v2: Change severity to _LOADER_DEBUG, rather than removing
the message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The former symbol is never defined within mesa. Based on the code
it seems that the original intent was to use NDEBUG.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Makefiles need hard tabs, let's not make that harder than it needs to be.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
v2: Making GL_ARB_map_buffer_alignment a desktop OpenGL extension only.
v3: Squash two commits together.
v4 (idr): MIN_MAP_BUFFER_ALIGNMENT queries don't have any dependencies.
In previous versions of the patch it depended on EXTRA_API_GL which
would prevent the query from working in core profile contexts.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This driver does not support GL_ARB_map_buffer_range, so no special
treatment is needed for unaligned offsets in the mapping.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These drivers do not support GL_ARB_map_buffer_range, so no special
treatment is needed for unaligned offsets in the mapping.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Ian manually ran the map_buffer_range* tests and the
arb_map_buffer_alignment-* tests, but he did not do a full piglit run.
v2 (idr): Use 64 instead of 4096
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Move parameter loads out of loops, and use the instruction offset
instead of a VGPR for the vertex attribute offset when writing to the
ESGS ring buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Just always bind the current states before drawing.
Besides the simplification, as a bonus this makes sure the VS hardware
shader stage always uses the GS copy shader when a geometry shader is
active, fixing a number of GS related piglit tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It needs to increment at shader runtime, not at shader compile time, as
the geometry shader can emit vertices in loops. LLVM automagically
converts the ID back to an immediate value if its value can be
determined at compile time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Transforms, for example,
mul vgrf3, vgrf2, vgrf1
mov.sat vgrf4, vgrf3
into
mul.sat vgrf3, vgrf2, vgrf1
mov vgrf4, vgrf3
which gives register_coalescing an opportunity to remove the MOV
instruction.
total instructions in shared programs: 1515039 -> 1504634 (-0.69%)
instructions in affected programs: 798586 -> 788181 (-1.30%)
GAINED: 0
LOST: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
OpenGL 3.3 spec expects GL_INVALID_OPERATION:
"For both the default framebuffer and framebuffer objects, the
constants FRONT, BACK, LEFT, RIGHT, and FRONT AND BACK are not
valid in the bufs array passed to DrawBuffers, and will result
in the error INVALID OPERATION."
But OpenGL 4.0 spec changed the error code to GL_INVALID_ENUM:
"For both the default framebuffer and framebuffer objects, the
constants FRONT, BACK, LEFT, RIGHT, and FRONT_AND_BACK are not
valid in the bufs array passed to DrawBuffers, and will result
in the error INVALID_ENUM."
This patch changes the behaviour to match OpenGL 4.0 spec
Fixes Khronos OpenGL CTS draw_buffers_api.test.
V2: Update the comment in code.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I sometimes build without EGL just for speed purposes, however
it no longer finds my drivers when I do due to the HAVE_LIBUDEV
defines being wrong.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the tests for the depth parameter for TexImage3D calls when the
target type is GL_TEXTURE_2D_ARRAY or GL_TEXTURE_CUBE_MAP_ARRAY
so that a depth value of 0 is accepted. Previously, the check
incorrectly required the depth argument to be atleast 1.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code depending on the definitions is already wrapped
in the same conditional so go ahead and wrap the include.
Otherwise we'll brake compilation on platforms that are
missing the header. Add assert.h in there as well, as it
is introduced and used in the same fashon.
Cc: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74122
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We have code generation paths that carry out swizzles of AoS vectors via
bitwise shifts, as these tend to generate more efficient code than
straightforward byte shuffles. But when the input is a constant the
additional bitwise arithmetic operations somehow don't really get
constant propagated properly, evenutally causing assertion failure in
InstCombine pass.
Therefore avoid the bug by using the trivial shuffles for constant
inputs.
Although the sample LLVM IR can cause a crash with any LLVM version,
this was only seen in practice with LLVM 3.2.
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The check was in the wrong place, such that if a shader incorrectly put
a preprocessor token before the #version declaration, the version would
be resolved twice, leading to a segmentation fault when attempting to
redefine the __VERSION__ macro.
#extension GL_ARB_sample_shading: require
#version 130
void main() {}
Also, rename glcpp_parser_resolve_version to
glcpp_parser_resolve_implicit_version to avoid confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
radeonsi now reports PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE = true if UVD support
isn't available. It's what all the other drivers do.
Also, some #include directives were missing in radeon_uvd.h.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Implemented by the common code. You can now visualize the statistics
with the HUD, see GALLIUM_HUD=help for all available queries. For example:
GALLIUM_HUD=clipper-primitives-generated
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Replace Type A _INT formats names with _SINT to match naming spec,
and update type C formats as follows:
s/MESA_FORMAT_R_INT8\b/MESA_FORMAT_R_SINT8/g
s/MESA_FORMAT_R_INT16\b/MESA_FORMAT_R_SINT16/g
s/MESA_FORMAT_R_INT32\b/MESA_FORMAT_R_SINT32/g
s/MESA_FORMAT_RG_INT8\b/MESA_FORMAT_RG_SINT8/g
s/MESA_FORMAT_RG_INT16\b/MESA_FORMAT_RG_SINT16/g
s/MESA_FORMAT_RG_INT32\b/MESA_FORMAT_RG_SINT32/g
s/MESA_FORMAT_RGB_INT8\b/MESA_FORMAT_RGB_SINT8/g
s/MESA_FORMAT_RGB_INT16\b/MESA_FORMAT_RGB_SINT16/g
s/MESA_FORMAT_RGB_INT32\b/MESA_FORMAT_RGB_SINT32/g
s/MESA_FORMAT_RGBA_INT8\b/MESA_FORMAT_RGBA_SINT8/g
s/MESA_FORMAT_RGBA_INT16\b/MESA_FORMAT_RGBA_SINT16/g
s/MESA_FORMAT_RGBA_INT32\b/MESA_FORMAT_RGBA_SINT32/g
s/\bMESA_FORMAT_RED_RGTC1\b/MESA_FORMAT_R_RGTC1_UNORM/g
s/\bMESA_FORMAT_SIGNED_RED_RGTC1\b/MESA_FORMAT_R_RGTC1_SNORM/g
s/\bMESA_FORMAT_RG_RGTC2\b/MESA_FORMAT_RG_RGTC2_UNORM/g
s/\bMESA_FORMAT_SIGNED_RG_RGTC2\b/MESA_FORMAT_RG_RGTC2_SNORM/g
s/\bMESA_FORMAT_L_LATC1\b/MESA_FORMAT_L_LATC1_UNORM/g
s/\bMESA_FORMAT_SIGNED_L_LATC1\b/MESA_FORMAT_L_LATC1_SNORM/g
s/\bMESA_FORMAT_LA_LATC2\b/MESA_FORMAT_LA_LATC2_UNORM/g
s/\bMESA_FORMAT_SIGNED_LA_LATC2\b/MESA_FORMAT_LA_LATC2_SNORM/g
Update comments. Replace format names containing SIGNED with
SNORM appended w/decoration per the format name spec:
s/MESA_FORMAT_SIGNED_R8\b/MESA_FORMAT_R_SNORM8/g
s/MESA_FORMAT_SIGNED_RG88_REV\b/MESA_FORMAT_R8G8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBX8888\b/MESA_FORMAT_X8B8G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBA8888\b/MESA_FORMAT_A8B8G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBA8888_REV\b/MESA_FORMAT_R8G8B8A8_SNORM/g
s/MESA_FORMAT_SIGNED_R16\b/MESA_FORMAT_R_SNORM16/g
s/MESA_FORMAT_SIGNED_GR1616\b/MESA_FORMAT_R16G16_SNORM/g
s/MESA_FORMAT_SIGNED_RGB_16\b/MESA_FORMAT_RGB_SNORM16/g
s/MESA_FORMAT_SIGNED_RGBA_16\b/MESA_FORMAT_RGBA_SNORM16/g
s/MESA_FORMAT_SIGNED_A8\b/MESA_FORMAT_A_SNORM8/g
s/MESA_FORMAT_SIGNED_I8\b/MESA_FORMAT_I_SNORM8/g
s/MESA_FORMAT_SIGNED_L8\b/MESA_FORMAT_L_SNORM8/g
s/MESA_FORMAT_SIGNED_A16\b/MESA_FORMAT_A_SNORM16/g
s/MESA_FORMAT_SIGNED_I16\b/MESA_FORMAT_I_SNORM16/g
s/MESA_FORMAT_SIGNED_L16\b/MESA_FORMAT_L_SNORM16/g
s/MESA_FORMAT_SIGNED_AL88\b/MESA_FORMAT_L8A8_SNORM/g
s/MESA_FORMAT_SIGNED_RG88\b/MESA_FORMAT_G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RG1616\b/MESA_FORMAT_G16R16_SNORM/g
Change all 4 color component unsigned byte formats to meet spec for P
Type formats:
s/MESA_FORMAT_RGBA8888\b/MESA_FORMAT_A8B8G8R8_UNORM/g
s/MESA_FORMAT_RGBA8888_REV\b/MESA_FORMAT_R8G8B8A8_UNORM/g
s/MESA_FORMAT_ARGB8888\b/MESA_FORMAT_B8G8R8A8_UNORM/g
s/MESA_FORMAT_ARGB8888_REV\b/MESA_FORMAT_A8R8G8B8_UNORM/g
s/MESA_FORMAT_RGBX8888\b/MESA_FORMAT_X8B8G8R8_UNORM/g
s/MESA_FORMAT_RGBX8888_REV\b/MESA_FORMAT_R8G8B8X8_UNORM/g
s/MESA_FORMAT_XRGB8888\b/MESA_FORMAT_B8G8R8X8_UNORM/g
s/MESA_FORMAT_XRGB8888_REV\b/MESA_FORMAT_X8R8G8B8_UNORM/g
v2: Note that Fredrik Höglund is working on GL_ARB_multi_bind, not
Maxence Le Doré. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The define was only available if
gl_extensions::AMD_shader_trinary_minmax was set, but no driver set it.
Since the extension is advertised by default, remove that field too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Maxence Le Doré <maxence.ledore@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dd_function_table::BlitFramebuffer is already initialized to
_mesa_meta_BlitFramebuffer, so it should just work.
Tested on a Radeon 7500 (OpenGL renderer string: Mesa DRI R100 (RV200
5157) TCL DRI2). I couldn't do a full piglit run because it would tank
the system with or without this patch. I just ran all the blit tests
(-t blit to piglit-run.py). Only fbo-sys-sub-blit failed. All of the
other tests that weren't skipped (i.e., all the multisample and sRGB
tests skip) passed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dd_function_table::BlitFramebuffer is already initialized to
_mesa_meta_BlitFramebuffer, so it should just work.
Tested on a FireGL 8800 (OpenGL renderer string: Mesa DRI R200 (R200
5148) TCL DRI).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the glTexStorage3D failure in
ext_packed_depth_stencil-depth-stencil-texture and
oes_packed_depth_stencil-depth-stencil-texture_gles2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need almost identical code in the glTexStorage path.
v2: Fix typo in a comment noticed by Topi.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All versions of the OpenGL spec are quite clear that
GL_INVALID_OPERATION should be generated. I added a quotation from the
3.3 core profile spec.
Fixes the glTexImage3D subcases of
ext_packed_depth_stencil-depth-stencil-texture and
oes_packed_depth_stencil-depth-stencil-texture_gles2. The same subtests
of oes_packed_depth_stencil-depth-stencil-texture_gles1 fail, but they
fail with a different wrong error code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cover both loader and glx/dri_glx
Drop \n from the default loader logger
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the loader changes, there has been a compiler warning that the
prototype didn't match. It turns out that if a loader error message was
ever thrown, you'd segfault because of trying to use the warning level as
a format string.
Reviewed-by: Keith Packard <keithp@keithp.com>
Tested-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This allows Mesa to choose to rename driver .sos (or split drivers),
without needing a flag day with the corresponding 2D driver.
v2: Undo the loader-only-for-dri3 change.
Reviewed-by: Keith Packard <keithp@keithp.com> [v1]
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
I want to stop trusting the server for the driver name, and instead decide
on our own based on the fd, so I needed this code motion.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Steam links against libudev.so.0, while we're linking against
libudev.so.1. The result is that the symbol names (which are the same in
the two libraries) end up conflicting, and some of the usage of .so.1
calls the .so.0 bits, which have different internal structures, and
segfaults happen.
By using a dlopen() with RTLD_LOCAL, we can explicitly look for the
symbols we want, while they get the symbols they want.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Some of the hardware support is missing. The NVIDIA-provided driver,
which claims seamless cube map support fails the relevant tests as well.
As this is the last extension before we can have OpenGL 3.2, doing this
allows us to expose geometry shaders without doing the additional
work involved in supporting ARB_geometry_shader4.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Creates two areas in the AUX constbuf:
- Sample offsets for MS textures
- Per-texture MS settings
When executing a texelFetch with a MS sampler, looks up that texture's
settings and adjusts the parameters given to the texfetch instruction.
With this change, all the ARB_texture_multisample piglits pass, so turn
on PIPE_CAP_TEXTURE_MULTISAMPLE.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Each code BO is a heap that allocates at the end first, and so GPs are
allocated at the very end of the allocated space. When executing, we see
PAGE_NOT_PRESENT errors for the next page. Just over-allocate to make
sure that there's something there.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Make sure that we never try to use a 0-sized map. This can happen when
using a gp, so add a dummy mapping when computing vp_gp_mapping in that
case.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marks gl_Layer as only having one component, and makes sure to keep
track of where it is and emit it in the output map, since it is not an
input to the FP.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Note that the primitive id is stored in a[0x18], while usually the
geometry instructions are of the form a[$a1 + 0x4] which gets mapped to
p[] space. We need to avoid the change from a[] to p[] here, so it's
keyed on whether the access is indirect or not.
Note that there's also a use-case for accessing e.g. a[$r1], however
that's not supported for now. (Could be added by checking the register
file of the indirect parameter.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Layer output probably doesn't work yet, but other than that everything seems
to be working.
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
[calim: fix up minor bugs, code formatting]
Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of emitting an SHL 4 io an address register on the TGSI ARL and UARL
instructions, emit the shift when the loaded address is actually used. This
is necessary because input vertex and attribute indices in geometry shaders on
nv50 need to be shifted left by 2 instead of 4.
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
[calim: various updates to the indirect address logic]
Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
[imirkin: remove OP_MAD change that calim made, add OP_RESTART handling
same as OP_EMIT for code flow analysis]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is needed since commit 9baa45f78b (st/mesa: bind NULL colorbuffers
as specified by glDrawBuffers).
This implementation is highly based on a larger commit by
Christoph Bumiller <e0425955@student.tuwien.ac.at> in his gallium-nine
branch.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
It doesn't make sense to do an OP_NEG from U32 to U32. This was
manifested on nv50 in glsl-fs-atan-3 which was generating a
UMAD TEMP[0].x, TEMP[0].xxxx, -TEMP[5].xxxx, TEMP[0].xxxx
instruction. (For some reason, nvc0 causes a different shader to be
generated.) This led to a
cvt neg u32 $r1 u32 $r1
Which did not yield the desired result. This changes the final output to
cvt neg s32 $r1 u32 $r1
which produces the desired output and the piglit tests passes. My
assumption is that this is also what we want on nvc0, but could not test
as there was no suitable shader that generated the problem instruction.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
When the min_index is very large (or very negative), the multipliation
can overflow 32 bits and result in an incorrect map pointer
modification.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This was discovered as a result of the draw-elements-base-vertex-neg
piglit test, which passes very negative offsets in, followed up by large
indices. The nouveau code correctly adjusts the pointer, but the
translate code needs to do the proper inverse correction. Similarly fix
up the SSE code to do a 64-bit multiply to compute the proper offset.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Broadwell requires software to specify QPitch in a bunch of packets,
so we decided to store it in the miptree. However, when I did that
refactoring, I missed a subtlety: the hardware expects QPitch to be
"in units of rows in the uncompressed surface".
This is the value we originally compute. However, for compressed
surfaces, we then divided it by 4 (the block height), to obtain the
physical layout. This is no longer the QPitch Broadwell expects.
So, store the original undivided value in mt->qpitch, but continue to
use the divided value in brw_miptree_layout_texture_array(). For
non-Broadwell platforms, this should have no impact at all.
Helps fix Piglit's "getteximage-targets S3TC CUBE" test on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes the NetBSD build.
NetBSD does not have pthread_mutex_timedlock.
CC glapi_dispatch.lo
threads_posix.h: In function 'mtx_timedlock':
threads_posix.h:216:5: error: implicit declaration of function 'pthread_mutex_timedlock'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
The type of all three parameters are identical, so we don't need to
specify it three times. The predicate is always identical too, so we
don't need to make it a parameter, either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Simple shaders such as:
void splat(vec2 v, float f) {
v[0] = v[1] = f;
}
failed to compile with the following error:
error: value of type vec2 cannot be assigned to variable of type float
First, we would process v[1] = f, and transform:
LHS: (expression float vector_extract (var_ref v) (constant int (1)))
RHS: (var_ref f)
into:
LHS: (var_ref v)
RHS: (expression vec2 vector_insert (var_ref v) (constant int (1))
(var_ref f))
Note that the LHS type is now vec2, not a float. This is surprising,
but not the real problem.
After emitting assignments, this ultimately becomes:
(declare (temporary) vec2 assignment_tmp)
(assign (xy)
(var_ref assignment_tmp)
(expression vec2 vector_insert (var_ref v) (constant int (1))
(var_ref f)))
(assign (xy) (var_ref v) (var_ref assignment_tmp))
We would then return (var_ref assignment_tmp) as the rvalue, which has
the wrong type---it should be float, but is instead a vec2.
To fix this, we simply return (vector_extract (var_ref assignment_temp)
<the appropriate channel>) to pull out the desired float value.
Fixes Piglit's chained-assignment-with-vector-constant-index.vert and
chained-assignment-with-vector-dynamic-index.vert tests.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74026
Reported-by: Dan Ginsburg <dang@valvesoftware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Transform feedback may come from either the geometry shader or the
vertex shader, so we can't use
ctx->Shader.CurrentProgram[MESA_SHADER_VERTEX] to find the current
post-link transform feedback information. Fortunately we can use
ctx->TransformFeedback.CurrentObject->shader_program.
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previous to this patch, the _mesa_{Begin,Resume}TransformFeedback
functions were using ctx->Shader.CurrentProgram[MESA_SHADER_VERTEX] to
find the program that would be the source of transform feedback data.
This isn't correct--if there's a geometry shader present it should be
ctx->Shader.CurrentProgram[MESA_SHADER_GEOMETRY]. (These might be
different if separate shader objects are in use).
This patch creates a function get_xfb_source(), which figures out the
correct program to use based on GL state, and updates
_mesa_{Begin,Resume}TransformFeedback to call it. get_xfb_source() is
written in terms of the gl_shader_stage enum, so it should not need
modification when we add tessellation shaders in the future. It also
creates a new driver flag, NewTransformFeedbackProg, which is flagged
whenever this program changes.
To reduce future confusion, this patch also rewords some comments and
error message text to avoid referring to vertex shaders.
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
v2: make the for loop in get_xfb_source() clearer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The "shader" field in fs_generator, vec4_generator, and gen8_generator
was only used for one purpose; to figure out if we were compiling an
assembly program or a GLSL shader (shader is NULL for assembly
programs). And it wasn't being used properly: in vec4 shaders we were
always initializing it based on
prog->_LinkedShaders[MESA_SHADER_FRAGMENT], regardless of whether we
were compiling a geometry shader or a vertex shader.
This patch simplifies things by using the "prog" field instead; this
is also NULL for assembly programs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of defining preprocessor macros in glcpp_parser_create based on
the GL API, wait until the shader version has been resolved. Doing this
allows us to correctly set (and not set) preprocessor macros for
extensions allowed by the API but not the shader, as in the case of
ARB_ES3_compatibility.
The shader version has been resolved when the preprocessor encounters
the first preprocessor token, since the GLSL spec says
"The #version directive must occur in a shader before anything else,
except for comments and white space."
Specifically, if a #version token is found the version is known
explicitly, and if any other preprocessor token is found then the GLSL
version is implicitly 1.10.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71630
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes glean fragProg1 regression caused by commit b9f68d927e
(implement TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS). This bug
only appears when the fragment shader emits fragment.Z before
color outputs. The bug was caused by confusion between register
indexes and semantic indexes.
Also added some comments to better explain register indexing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Otherwise we pull libudev as a dependency and crash
games/programs that ship their own version of libudev.
Either way we should link the loader lib only when needed.
This fixes a regression caused by
commit eac776cf77
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date: Sat Jan 11 02:24:43 2014 +0000
glx: use the loader util lib
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73854
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Leaving it set to zero isn't really correct since every allocation has
at least an alignment of 1 byte. It also caused a problem in the i965
driver after I removed the MAX(64, ...) from the alignment calculation.
That's what I get for changing a patch without retesting it. :(
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73907
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Lu Hua <huax.lu@intel.com>
Sed job:
grep -lr BEGIN_BATCH_NO_AUTOSTATE src/mesa/drivers/dri/ | while read f
do
cat $f | sed 's/BEGIN_BATCH_NO_AUTOSTATE/BEGIN_BATCH/g' > x
mv x $f
done
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
This parameter hasn't been used since January 2010 (commit 29e02c7).
Fixes the following warning in both radeon and r200:
radeon_common.c: In function 'r200_rcommonBeginBatch':
radeon_common.c:762:14: warning: unused parameter 'dostate' [-Wunused-parameter]
Note that now BEGIN_BATCH and BEGIN_PATCH_NO_AUTOSTATE are identical.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
radeon_common.c: In function 'radeon_draw_buffer':
radeon_common.c:237:3: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
When parameters were removed from dd_function_table::Viewport (commit
065bd6ff), radeon_viewport (in both radeon and r200) started generating
a warning.
radeon_common.c: In function 'r200_radeon_viewport':
radeon_common.c:415:15: warning: assignment from incompatible pointer type [enabled by default]
radeon_common.c:419:23: warning: assignment from incompatible pointer type [enabled by default]
I didn't notice this initially, and it's harmless because the function is
never called through the incorrectly typed pointer.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Otherwise they will be NULL when stage destroy is invoked prematurely,
(i.e, on out of memory).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Use some new helper functions to make the code much more readable.
And fix wrong value for XPD's w result.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In some cases we were converting generic formats to sized formats
and vice versa. The point is to simply convert sRGB formats to
corresponding linear formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Became apparent with the C11 thread changes. Unfortunately I didn't
have all dependencies to build the driver, and only noticed
this issue on build server.
Implementation is based of https://gist.github.com/2223710 with the
following modifications:
- inline implementatation
- retain XP compatability
- add temporary hack for static mutex initializers (as they are not part
of the stack but still widely used internally)
- make TIME_UTC a conditional macro (some system headers already define
it, so this prevents conflict)
- respect HAVE_PTHREAD macro
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Previously the reason we needed is_array was because we used array_size == NULL to
represent both non-arrays and unsized arrays. Now that we use a non-NULL
array_specifier to represent an unsized array, is_array is redundant.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We need to insert outermost dimensions in the correct spot otherwise
the dimension order will be backwards
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
array
This change does not help fix or prevent any bugs
it just seems reasonable to do
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
No regressions on IVB (piglit quick + unit tests).
v2 (Paul):
- no need to patch the unit tests anymore. Original logic
was altered and unit tests updated to match the
fs-generator
- lrp emission moves from the blorp compiler core into the
emitter here (previously there was a separate refactoring
patch which is not really needed anymore as the lrp logic
got refactored when the original lrp logic got fixed).
- pass 'BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX' to the
generator in fs_inst::target instead of hardcoding it
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The compiler for blorp programs likes to emit instructions for
the message construction itself meaning that the generator needs
to skip any such when blorp programs are translated for the hw.
In addition, the binding table control is special for blorp
programs and the generator does not need to update the binding
tables associated with the compiler bookkeeping (this in fact
gets thrown away as the blorp compiler sets the program data
in its own way).
v2 (Paul): do not hardcode the binding table index but use
fs_inst::target instead.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Unit tests comparing generated blorp programs to known good need
to have the dump in designated file instead of in default
standard output. The comparison also expects the jump counters
of if-else-instructions to be correctly set and hence the dump
needs to be taken _after_ 'patch_IF_ELSE()' is run (the default
dump of the fs_generator does this before).
v2 (Paul): dropped the redundant 'dump_enabled' argument
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In addition, the special case requiring explicit execution size
control is wrapped manually.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In addition, the two special cases requiring explicit execution
size control are wrapped manually.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2 (Paul): pass the combining opcode as an argument to emit_combine().
This keeps manual_blend_average() selfcontained
documentation wise.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Resolving of the hardware message type is moved into the
emitter also in preparation for switching to use fs_generator.
The generator wants to translate the high level op-code into
the message type and hence the emitter needs to know the
original op-code.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Prepares for the introduction of non-compressed multi-sampled
lookup used in the blorp programs.
v2: now also taking into account gen8
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The combination of four separate comparison operations and
and the masked "and" require special treatment when moving
to FS LIR.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Prepares for presenting blorp blit programs using FS IR that
allows EU-assembly generation using i965 glsl-compiler
backend (fs_generator).
v2: rebased on top of endif-jump counter fix (moving the
added brw_set_uip_jip() into the emitter)
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The Intel closed source OpenGL driver recently began supporting 32
texture image units on Haswell. This makes the open source driver
support 32 as well.
Earlier generations don't have the message header field required to
support more than 16 sampler states, so we continue to advertise 16
there.
On Haswell, this causes us to advertise:
- GL_MAX_TEXTURE_IMAGE_UNITS = 32
- GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS = 32
- GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
instead of the old values of 16, 16, and 48.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
BRW_MAX_TEX_UNIT is about to grow, but only Gen7+ will be able to
support the new larger value. On older platforms, we don't want to
allocate the extra space - it would just be a waste.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Like the scalar backend, we add an offset to the "Sampler State Pointer"
field to select a group of 16 samplers, then use the "Sampler Index"
field to select within that group.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The next patch adds an additional case where the message header is
necessary. So we want to do the g0 copy if inst->header_present is set,
rather than inst->texture_offset.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
In theory, a shader might use textureOffset() but set all the texel
offsets to zero. In that case, we don't actually need to set up the
message header - zero is the implicit default.
By moving the texture_offset setup before the header_present setup, we
can easily only set header_present when there are non-zero texel offset
values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The message descriptor's "Sampler Index" field is only 4 bits (on all
generations of hardware), so it can only represent indices 0 through 15.
Haswell introduced a new field in the message header - "Sampler State
Pointer". Normally, this is copied straight from g0, but we can also
add a byte offset (as long as it's a multiple of 32).
This patch uses a "Sampler State Pointer" offset to select a group of
16 sampler states, and then uses the "Sampler Index" field to select
the state within that group.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, the code to copy g0 to the message header existed in two
places - one for the texture offset case, and one for any other case.
By treating texture_offset as a special case of header_present, we can
remove this duplication and shorten the code. Future patches which add
new header fields also won't have to add additional duplication.
This also clarifies a confusing construct. The old code contained:
} else if (inst->header_present) {
if (brw->gen >= 7) {
...explicit copy from g0 to the message header...
} else {
/* Set up an implied move from g0 to the MRF. */
}
}
This looks like it might set up an implied move on Sandybridge, which
doesn't support those. However, Sandybridge only uses a message header
for texture offsets, so it would never hit this code path. The new code
avoids this implicit knowledge by only setting up an implied move on
Gen4-5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Use the WINDOW and VPORT scissors for the framebuffer and scissor test,
respectively. The other two scissors are disabled (they cover the max fb size).
We actually have 16 VPORT scissors, which will map well to ARB_viewport_array.
Also, we don't need to write SC_WINDOW_OFFSET with this commit, because it's
disabled everywhere.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Also set the unsynchronized flag if the whole resource was discarded
to avoid doing buffer-busy checks again.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's unused and shouldn't be used at all in my opinion.
If some driver doesn't support the unsynchronized flag, u_upload_mgr should
avoid the synchronization by other means, e.g. by using the DONTBLOCK flag.
This is the recommended way for streaming vertices. Always use this if you
need to upload vertices every frame.
Reviewed-by: Christian König <christian.koenig@amd.com>
Commit 05da4a7a5e attempts to eliminate the
call to intel_update_renderbuffer() in the case where we already have a
drawbuffer for the drawable. Unfortunately this only checks the
back left renderbuffer, which breaks in case of single buffer drawables.
This means that the initial viewport will not be set in that case. Instead,
we now check whether the initial viewport has not been set, in which case
we call out to intel_update_renderbuffer().
https://bugs.freedesktop.org/show_bug.cgi?id=73862
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Most of the time it is not necessary to perform type inference to
compile GLSL; the type of every expression can be inferred from the
contents of the expression itself (and previous type declarations).
The exception is aggregate initializers: their type is determined by
the LHS of the variable being assigned to. For example, in the
statement:
mat2 foo = { { 1, 2 }, { 3, 4 } };
the type of { 1, 2 } is only known to be vec2 (as opposed to, say,
ivec2, uvec2, int[2], or a struct) because of the fact that the result
is being assigned to a mat2.
Previous to this patch, we handled this situation by doing some type
inference during parsing: when parsing a declaration like the one
above, we would call _mesa_set_aggregate_type(), which would infer the
type of each aggregate initializer and store it in the corresponding
ast_aggregate_initializer::constructor_type field. Since this
happened at parse time, we couldn't do the type inference using
glsl_type objects; we had to use ast_type_specifiers, which are much
more awkward to work with. Things are about to get more complicated
when we add support for ARB_arrays_of_arrays.
This patch simplifies things by postponing the call to
_mesa_set_aggregate_type() until ast-to-hir time, when we have access
to glsl_type objects. As a side benefit, we only need to have one
call to _mesa_set_aggregate_type() now, instead of six.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Specs say "If the argument is a buffer object, the arg_value
pointer can be NULL or point to a NULL value in which case a NULL
value will be used as the value for the argument declared as a
pointer to __global or __constant memory in the kernel."
So don't crash when somebody does that.
v2: Insert NULL into input buffer instead of buffer handle pair
Fix constant_argument too
Drop r600 driver changes
v3: Fix inserting NULL pointer
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
No known bugs fixed but this is now in line with fs-generator.
No regresssions on IVB.
Eric further explained that:
"The endif jump, since it's forward, is just an optimization to
have set right -- otherwise, the GPU will just step forward
instruction by instruction until it hits something else that
updates the per-channel PC."
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is possible now that ctx->Shader.CurrentProgram is an array.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is possible now that ctx->Shader.CurrentProgram is an array.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that we have a ctx->Shader.CurrentProgram array, we can just use
it directly.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since ctx->Shader.Current{Vertex,Geometry,Fragment}Program is an
array, this allows some meta code to be rolled up into loops.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are replaced with
ctx->Shader.CurrentProgram[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}].
In patches to follow, this will allow us to replace a lot of ad-hoc
logic with a variable index into the array.
With the exception of the changes to mtypes.h, this patch was
generated entirely by the command:
find src -type f '(' -iname '*.c' -o -iname '*.cpp' ')' \
-print0 | xargs -0 sed -i \
-e 's/\.CurrentVertexProgram/.CurrentProgram[MESA_SHADER_VERTEX]/g' \
-e 's/\.CurrentGeometryProgram/.CurrentProgram[MESA_SHADER_GEOMETRY]/g' \
-e 's/\.CurrentFragmentProgram/.CurrentProgram[MESA_SHADER_FRAGMENT]/g'
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than maintain separately named arrays and counts for vertex,
geometry, and fragment shaders, just maintain these as arrays indexed
by the gl_shader_type enum.
v2: When there is neither a vertex nor a geometry shader, set
prog->LastClipDistanceArraySize = 0, and clarify that the values is
not used.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch replaces code in _mesa_new_shader() and delete_shader_cb()
that checks the type of a shader with calls to
_mesa_validate_shader_target(). This has two advantages: it allows
for a more thorough check (since _mesa_validate_shader_target()
doesn't permit shader targets that aren't supported by the back-end),
and it reduces the amount of code that will need to be modified when
adding new shader stages.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow this function to be used in circumstances where there
is no context available, such as when building built-in GLSL
functions.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
In my recent zeal to refactor Mesa's handling of the gl_shader_stage
enum, I accidentally wound up with two functions that do the same
thing: _mesa_program_index_to_target(), and
_mesa_shader_stage_to_program().
This patch keeps _mesa_shader_stage_to_program(), since its name is
more consistent with other related functions. However, it changes the
signature so that it accepts an unsigned integer instead of a
gl_shader_stage--this avoids awkward casts when the function is called
from C++ code.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
We have to make the functions available to work around a GLEW bug (see
comments already in the code), but if an application calls one of these
functions we should still generate GL_INVALID_OPERATION.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch handles the use of 'centroid' qualifier with 'in' variables
in a fragment shader when persample shading is enabled. Per sample
shading for the whole fragment shader can be enabled by:
glEnable(GL_SAMPLE_SHADING) or using {gl_SamplePosition, gl_SampleID}
builtin variables in fragment shader. Explaining it below in more
detail.
/* Enable sample shading using OpenGL API */
glEnable(GL_SAMPLE_SHADING);
glMinSampleShading(1.0);
Example fragment shader:
in vec4 a;
centroid in vec4 b;
main()
{
...
}
Variable 'a' will be interpolated at sample location. But, what
interpolation should we use for variable 'b' ?
ARB_sample_shading recommends interpolation at sample position for
all the variables. GLSL 400 (and earlier) spec says that:
"When an interpolation qualifier is used, it overrides settings
established through the OpenGL API."
But, this text got deleted in later versions of GLSL.
NVIDIA's and AMD's proprietary linux drivers (at OpenGL 4.3)
interpolates at sample position. This convinces me to use
the similar approach on intel hardware.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Current implementation of arb_sample_shading doesn't set 'Barycentric
Interpolation Mode' correctly. We use pixel barycentric coordinates
for per sample shading. Instead we should select perspective sample
or non-perspective sample barycentric coordinates.
It also enables using sample barycentric coordinates in case of a
fragment shader variable declared with 'sample' qualifier.
e.g. sample in vec4 pos;
A piglit test to verify the implementation has been posted on piglit
mailing list for review.
V2: Do not interpolate all the 'in' variables at sample position
if fragment shader uses 'sample' qualifier with one of them.
For example we have a fragment shader:
#version 330
#extension ARB_gpu_shader5: require
sample in vec4 a;
in vec4 b;
main()
{
...
}
Only 'a' should be sampled at sample location, not 'b'.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be useful in my next patch which depends on a functionality
of _mesa_get_min_invocations_per_fragment() to ignore the sample
qualifier (prog->IsSample) based on a flag passed to it.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reduces vertex shader instruction counts in DOTA2 by 6.42%, L4D2 by
4.61%, and CS:GO by 5.71%.
total instructions in shared programs: 1500153 -> 1498191 (-0.13%)
instructions in affected programs: 59919 -> 57957 (-3.27%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Only implemented for ir_swizzles currently, but perhaps will be useful
for other IR types in the future.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This flag was really just a proxy for determining whether the backend
was vector (AOS) or scalar (SOA). It will be used to apply a future
optimization only for vector backends.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Dumping the number of live registers at each IP allows us to see
register pressure and identify any local maxima. This should
aid in debugging passes designed to reduce register pressure, as
well as optimizations that suddenly trigger spilling.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Calling it after value numbering (added in the next commit) prevents
some instruction count regressions.
total instructions in shared programs: 1524387 -> 1523905 (-0.03%)
instructions in affected programs: 13112 -> 12630 (-3.68%)
GAINED: 0
LOST: 3
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously we simply considered two registers whose live ranges
overlapped to interfere. Cases such as
set A ------
... |
mov B, A -- |
... | B | A
use B -- |
... |
use A ------
would be considered to interfere, even though B is an unmodified copy of
A whose live range fit wholly inside that of A.
If no writes to A or B occur between the mov B, A and the use of B then
we can safely coalesce them.
Instead of removing MOV instructions, we make them NOPs and remove them
at once after the main pass is finished in order to avoid recomputing
live intervals (which are needed to perform the previous step).
total instructions in shared programs: 1543768 -> 1513077 (-1.99%)
instructions in affected programs: 951563 -> 920872 (-3.23%)
GAINED: 46
LOST: 22
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
mov takes only a single source argument. Example instruction
inexplicably changed from add to mov in commit f10f5e49.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Unnamed record types are assigned to separate types per stage, e.g. if
uniform struct { ... } a;
is defined in both vertex and fragment shader, two separate types will
result with different names. When linking the shader, this results in a
type conflict. However, there is no reason why this should not be
allowed according to GLSL specifications. Compare and match record types
when linking shader stages to avoid this conflict.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes several colorbuffer tests, including piglit "fbo-drawbuffers-none"
for "gl_FragColor" and "glDrawPixels" cases.
v2: rework patch to only avoid creating extra shader variants when
TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS is not specified. Per Jose.
Use a write_color0_to_n_cbufs key field to replicate color0 to N
color buffers only when N > 0 and WRITES_ALL_CBUFS is set.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The new TYPE_DOUBLEN_2 type was added in 0e60d850 but the code to
return values of that type wasn't completed.
Fixes conform's default state test. glGetFloatv(GL_DEPTH_RANGE)
wasn't returning anything.
v2: remove stray 'break' statements.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These messages are in code that is shared between the VS and GS
back-ends, so use the terminology "vec4" to avoid confusion.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even with depth clipping disabled, vertices which have negative w coords
must be discarded. And since we don't have a proper guardband implementation
yet (relying on driver to handle all values except infs/nans in rasterization
for such points) we need to kill them off manually (as they can end up with
coordinates inside viewport otherwise).
v2: use 0.0f instead of 0 (spotted by Brian).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Also increment ir->offset in the GS visitor, rather than at the
final assembly generation stage (requested by Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
On Broadwell, PIPE_CONTROL needs an extra DWord to accomodate the
48-bit addressing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we have a helper function that handles the PIPE_CONTROL
variations between the various platforms, these are basically the same.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are a lot of places that use PIPE_CONTROL to write a value to a
buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT).
Creating a single function to do this seems convenient.
As part of this refactor, we now set the PPGTT/GTT selection bit
correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms.
This is correct for Sandybridge, but actually part of the address on
Ivybridge and later!
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to adjust that in substantially fewer
places, giving us confidence that we've hit them all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I believe that PIPE_CONTROL uses the length field to decide whether to
do 32-bit or 64-bit writes. A length of 4 would do a 32-bit write,
while a length of 5 would do a 64-bit write. (I haven't verified this,
though.)
For workaround writes, we don't care what value gets written, or how
much data. We're only writing something because hardware bugs mandate
that do so. So using a 64-bit write should be fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The PIPE_CONTROL packet actually has 5 DWords on Gen6+:
1. Header
2. Flags
3. Address
4. Immediate Data: Lower DWord
5. Immediate Data: Upper DWord
We just never emitted the last one. While it appears to work, it's
probably safer to emit the entire thing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These days, we need to emit PIPE_CONTROL flushes all over the place.
Being able to do that via a single function call seems convenient.
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to do this in substantially fewer places.
v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by
Eric Anholt). Drop unlikely() from BLT_RING check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell uses 48-bit addresses. The first DWord is the low 32 bits,
and the second DWord is the high 16 bits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to
replace the old 'unsigned long offset' field. To preserve ABI, libdrm
continues to store the presumed offset in both locations.
On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses.
However, with a 32-bit userspace, the 'unsigned long offset' field will
only be 32-bit, which is not large enough to hold this value. We need
to use a proper uint64_t (like the kernel does).
Technically, a lot of this code doesn't affect Broadwell, so we could
leave it using the old field. But it makes sense to just switch to the
new, properly typed field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
intel_buffer_objects.c: In function 'old_intel_bufferobj_buffer':
intel_buffer_objects.c:471:17: warning: unused parameter 'flag' [-Wunused-parameter]
The parameter hasn't been used since the i915 and i965 drivers had their
breakup. i965 got the flags, and i915 got to cry itself to sleep.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Not actually tested, but the changes are identical to the i965 changes
that are tested.
v2: Remove MAX2(64, ...). Suggested by Ken (in the i965 version of this
patch).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Siavash Eliasi <siavashserver@gmail.com>
No piglit regressions on IVB.
With minor tweaks to the arb_map_buffer_alignment-map-invalidate-range
test (disable the extension check, set alignment to 64 instead of
querying), the i965 driver would fail the test without this patch (as
predicted by Eric). With this patch, it passes.
v2: Remove MAX2(64, ...). Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Siavash Eliasi <siavashserver@gmail.com>
v2 (idr): Only enable the extension on GEN7+ w/core profile because it
requires geometry shaders.
v3 (idr): Add some casting to fix setting of ViewportBounds.Min.
Negating an unsigned value, then casting to float doesn't do what you
might think it does.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
noop_scissor (correctly) only examines the scissor rectangle for
viewport 0. Therefore, it should only be called when that scissor
rectangle is enabled.
v2: Remove spurious change to radeon code. Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently MaxViewports is still 1, so this won't affect any change.
v2: Minor code reformatting suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers that currently use _Xmin and friends to set their scissor
rectangle will need to use this code directly once they are updated for
GL_ARB_viewport_array.
v2: Use different bit-test idiom and fix mixed tabs and spaces. Both
were suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At various stages the hardware clamps the gl_ViewportIndex to these
values. Setting them to zero effectively makes gl_ViewportIndex be
ignored. This is acutally useful in blorp (so that we don't have to
modify all of the viewport / scissor state).
v2: Use INTEL_MASK to create GEN6_CLIP_MAX_VP_INDEX_MASK. Suggested by
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Define API connections to extension entry points added in previous
commits. Update entry points to use floating point arguments as
required by the extension.
Add get tokens for ARB_viewport_array state.
v2: Include review feedback.
v3 (idr): Fix 'make check'. Add missing Get infrastructure (some was
culled from other pathces).
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Use set_viewport_no_notify / set_depth_range_no_notify (and
manually notify the driver) instead of calling _mesa_set_viewporti /
_mesa_set_depthrangei. Refactor bodies of _mesa_ViewportIndexed and
_mesa_ViewportIndexedv into a shared function. Remove spurious CLAMP
calls in _mesa_DepthRangeArrayv and _mesa_DepthRangeIndexed.
v3 (idr): Add some missing return-statements after calls to _mesa_error.
v4 (idr): Only perform the ViewportBounds.Min / ViewportBounds.Max
clamping in set_viewport_no_notify if GL_ARB_viewport_array is enabled.
Otherwise the driver may not have set ViewportBounds, and the clamping
will do bad things.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Use set_scissor_no_notify (and manually notify the driver)
instead of calling _mesa_set_scissori. Refactory bodies of
_mesa_ScissorIndexed and _mesa_ScissorIndexedv into a shared function.
Perform parameter validation in the same order in all three functions.
Pull MaxViewports comparison fix (in _mesa_ScissorArrayv) from the next
patch to this patch.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that the scissor enable state is a bitfield need a custom function
to extract the correct value from gl_context. Modeled
Scissor.EnableFlags after Color.BlendEnabled.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Fix several "comparison between signed and unsigned integer
expressions" warnings.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This matches the expectations of GL_ARB_viewport_array and the storage
type where the values will land.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the restore code would enable all scissor rectangles if any
scissor rectangles were enabled on entry to meta. When there is only
one scissor rectangle, this is fine. As soon as a driver supports
multiple viewports, this will be a problem.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_Scissor, make sure that ctx->Driver.Scissor is only called once
instead of once per scissor rectangle.
v2: Use MAX_VIEWPORTS instead of ctx->Const.MaxViewports because the
driver may not set ctx->Const.MaxViewports yet.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_Viewport and _mesa_DepthRange, make sure that
ctx->Driver.Viewport is only called once instead of once per viewport or
depth range.
v2: Make _mesa_DepthRange actually set all of the depth ranges (instead
of just index 0). Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use MAX_VIEWPORTS instead of ctx->Const.MaxViewports because the
driver may not set ctx->Const.MaxViewports yet.
v3: Handle all viewport entries in update_viewport_matrix and
_mesa_copy_context too. This was previously in an earlier patch.
Having the code in the earlier patch could cause _mesa_copy_context to
access a matrix that hadn't been constructed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v2]
Create an internal function that just writes data into the scissor
rectangle. In future patches this will see more use because we only
want to call dd_function_table::Scissor once after setting all of the
scissor rectangles instead of once per scissor rectangle.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create an internal function that just writes data into the viewport. In
future patches this will see more use because we only want to call
dd_function_table::Viewport once after setting all of the viewport
instead of once per viewport.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create an internal function that just writes data into the depth range.
In future patches this will see more use because we only want to call
dd_function_table::DepthRange once after setting all of the depth ranges
instead of once per depth range.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only element 0 of the array is used anywhere at this time, so there
should be no changes.
v4: Split out from a single megapatch. Suggested by Ken.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v4: Split out from a single megapatch. Suggested by Ken. Also make
meta's save_state::ViewportX, ::ViewportY, ::ViewportW, and ::ViewportH
to match gl_viewport_attrib.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used when the viewport near and far plane are stored as
doubles instead of as floats.
v4 (idr): Split out from a single megapatch. Suggested by Ken. Also
drop value_double_4. It's never used anywhere in the patch series.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Update Mesa and drivers to access updated gl_scissor_attrib.
Now have an enable bitfield and array of gl_scissor_rects.
Drivers have been updated to the new scissor enable state
attribute (gl_context.scissor.EnableFlags) but still treat it
as a single boolean which is okay as mesa will only use
bit 0 when communicating with a driver that does not support
ARB_viewport_array.
v2 (idr): Rebase fixes.
v3 (idr): Small code formatting fix suggsted by Ken.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These limits will be queryable by GL_MAX_VIEWPORTS,
GL_VIEWPORT_SUBPIXEL_BITS, and GL_VIEWPORT_BOUNDS_RANGE. Drivers that
actually implement the extension must set values for these constants
that comply with the minimum-maximums from the spec.
Most of these changes were part of other patches. They were separated out
because it make reordering of later patches easier. Also, MaxViewports wasn't
set by that patch, and I completely overlooked it in review. It's now obvious
that it's set. :)
v2 (idr): Split these changes out from the original patches. Keep
MaxViewportWidth and MaxViewportHeight as GLuint.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Split these changes out from the original patch. Only
advertise GL_ARB_viewport_array in a core profile because it requires
geometry shaders.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were calling draw_total_vs_outputs() too early. The call to
draw_pt_emit_prepare() could result in the vertex size changing.
So call draw_total_vs_outputs() after draw_pt_emit_prepare().
This fix would seem to be needed for the non-LLVM code as well,
but it's not obvious. Instead, I added an assertion there to
try to catch this problem if it were to occur there.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72926
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of skipping x/y clipping completely if there's point_tri_clip points
use guard band clipping. This should be easier (previously we could not disable
generating the x/y bits in the clip mask for llvm path, hence requiring custom
clip path), and it also allows us to enable this for tris-as-points more easily
too (this would require custom tri clip filtering too otherwise). Moreover,
some unexpected things could have happen if there's a NaN or just a huge number
in some tri-turned-point, as the driver's rasterizer would need to deal with it
and that might well lead to undefined behavior in typical rasterizers (which
need to convert these numbers to fixed point). Using a guardband should hence
be more robust, while "usually" guaranteeing the same results. (Only "usually"
because unlike hw guardbands draw guardband is always just twice the vp size,
hence small vp but large points could still lead to different results.)
Unfortunately because the clipmask generated is completely unaffected by guard
band clipping, we still need a custom clip stage for points (but not for tris,
as the actual clipping there takes guard band into account).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These are components which were originally developed by Tungsten Graphics,
which was in turn acquired by VMware, but are de facto now being maintained
by third-party contributors of the Mesa open-source community.
This matches what's reported by swrast driver and a few other components.
Suggested by Ian Romanick.
Currently the MSVC build is broken because of conflicting definitions of
'log' function. I didn't investigate thoroughly, but I suspect the
it is conflicting standard math.h's log.
log_ is admittedly not a great name, but it is better than a broken build.
A better one can be used in a follow-on build.
By highlighting these special cases makes it clearer to switch
to the fs-generator as the wider scoped compression control
settings used in the current implementation can be simply
dropped.
No regressions on IVB (piglit quick + unit tests).
v2 (Ian): typo in a comment
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Effectively only the mask control bit gets altered for the single
addition in question and hence there is no real need to use a
fresh state control level for it -- that is more useful when
multiple intructions share the same mask and compression settings.
This is a preparation step for removing the explicit compression
control modifiers in the blit compiler. After this patch there
are no nested state control levels making the constant nature of
the compression settings more apparent.
No regressions on IVB (piglit quick + unit tests).
v2 (Matt, Ian): use temporary variable instead of assigning
directly on the same line with a function call.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We call intel_prepare_render() in intelMakeCurrent() to make sure we have
renderbuffers before calling _mesa_make_current(). The only reason we
do this is so that we can have valid defaults for width and height.
If we already have buffers for the drawable we're making current, we
don't need this step.
In itself, this is a small optimization, but it also avoids a round trip
that could block on the display server in a unexpected place.
https://bugs.freedesktop.org/show_bug.cgi?id=72540https://bugs.freedesktop.org/show_bug.cgi?id=72612
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
NV3x cards don't support NPOT textures. Technically this restriction
could be worked around, but since it also doesn't expose any video
decoding hw, just turn it off entirely.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
pipe_loader_drm.c: In function 'pipe_loader_drm_probe_fd':
pipe_loader_drm.c:120:4: error: implicit declaration of function 'loader_get_pci_id_for_fd' [-Werror=implicit-function-declaration]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Mesa provides the flexibility of building without the
need to have libdrm present on the system. The situation
has regressed with the recent commit
commit 8c2e7fd846
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date: Fri Jan 10 23:36:16 2014 +0000
loader: introduce the loader util lib
By isolating libdrm code by #ifndef __NOT_HAVE_DRM_H we
can have libdrm-less builds on across all build systems.
This patch converts Android's _EGL_NO_DRM to __NOT_HAVE_DRM_H
to provide consistency with the other cases within mesa, allows
compilation of libloader on libdrm-less scons and conditionally
links against libdrm if present under automake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73776
BUgzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73777
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only difference is that STATE_SIP takes a 48-bit address, so we need
to output two zeroes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This replaces the old fs_generator backend.
v2: Port to the C-based representation of assembly instructions.
Fix texturing after the texture-grf merge.
v3: Add high quality derivative support. Fix SET_SIMD4X2_OFFSET.
v4: Pass brw_context to gen8_instruction functions as required.
v5: Fixes for MRT, as well as zero render targets (alpha test only).
v6: Replace n-wide with SIMDn in comments and messages; port over
Topi's blorp-generator changes; add missing TXF_MCS opcode,
fix missing high quality derivatives for DDX; fix typo (all caught
by Eric). Simplify ADDC/SUBB handling; drop "Used only on Gen6+"
comment (caught by Matt). Emit SIMD16 versions of three source
instructions (caught by both Eric and Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the old vec4_generator backend.
v2: Port to use the C-based instruction representation. Also, remove
Geometry Shader offset hacks - the visitor will handle those instead
of this code.
v3: Texturing fixes (including adding textureGather support).
v4: Pass brw_context to gen8_instruction functions as required.
v5: Add SHADER_OPCODE_TXF_MCS support; port DUAL_INSTANCED gs fixes
(caught by Eric). Simplify ADDC/SUBB handling; add comments to
gen8_set_dp_message calls (suggested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the brw_eu_emit.c layer for Broadwell. It will be
used by both the vector and scalar shader backends.
v2: Port to use the C-based instruction representation.
v3: Fix destination register type for CMP.
v4: Pass brw to gen8_instruction functions (required by rebase).
v5: Remove bogus assertion on math instructions (caught by Piglit).
v6: Remove more restrictions on math instructions (caught by Eric).
Make ADDC and SUBB helpers set accumulator writes, like MAC and
MACH (caught by Matt).
v7: Don't implicitly force ALU3 operations to SIMD8 (we've been able
to do SIMD16 versions since Haswell, but didn't when I originally
wrote this code).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Heavily based on Keith Packard's existing brw_disasm.c code. I've tried
to go through most of the pieces (like SFIDs) and update the lists to
include features added in recent generations.
v2: Port to use the C-based instruction emitters. This allows us to use
C99 array initializers, which tidies up some of the code.
v3: Improve decoding of render target write messages.
v4: Update for BRW_REGISTER_TYPE becoming an abstraction.
v5: Rebase on Chris Forbes' SFID message defines.
v6: Fix disassembly of UV immediates; remove silly casts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell significantly changes the EU instruction encoding. Many of
the fields got moved to different bit positions; some even got split
in two.
With so many changes, it was infeasible to continue using struct
brw_instruction. We needed a new representation.
This new approach is a bit different: rather than a struct, I created a
class that has four DWords, and helper functions that read/write various
bits. This has several advantages:
1. We can create several different names for the same bits. For
example, conditional modifiers, SFID for SEND instructions, and the
MATH instruction's function opcode are all stored in bits 27:24.
In each situation, we can use the appropriate setter function:
set_sfid(), set_math_function(), or set_cond_modifier(). This
is much easier to follow.
2. Since the fields are expressed using the original 128-bit numbers,
the code to create the getter/setter functions follows the table in
the documentation very closely.
To aid in debugging, I've enabled -fkeep-inline-functions when building
gen8_instruction.c. Otherwise, these functions cannot be called by
gdb, making it insanely difficult to print out anything.
Kenneth Graunke wrote most of this code. Damien Lespiau ported it to
C99. Xiang Haihao added media fields. Zhao Yakui added indirect
addressing support. Eric Anholt added an assertion to make sure that
values fit in the alloted number of bits.
v2: Update for brw_reg_type_to_hw_type(), which necessitates passing
brw_context pointers around everywhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
While we probably won't ever use these, having them makes it easy to
share disassembler code between intel-gpu-tools and Mesa.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is not observed to actually fix anything, but the PRM says this
field must be zero for other surface types.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously this was missing many interesting fields. Having them decoded
makes debugging views much easier.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The index passed to the function is already unsigned, and internally
we threat it as unsigned.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The textures array is defined as a number of NV50_MAX_PIPE_CONSTBUFS
per shader stage. Currently the nv50 driver handles only 3 shader
stages, thus we wreck chaos when accessing array-out-of-bounds.
Cc: 9.1 9.2 10.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The textures array is defined as a number of PIPE_MAX_SAMPLERS per shader stage.
Currently nv50 driver handles only 3 shader stages, thus we wreck chaos when
accessing array-out-of-bounds.
Fixes a segfault in piglit/bin/arb_texture_buffer_object-data-sync -fbo -auto
Cc: 9.1 9.2 10.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use the kernel driver name are returned by drmGetVersion() for
non-pci(platform) devices.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
v2 (Emil): Rebased and weaked commit message.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Culled out of the "loader: refactor duplicated code into loader util lib"
patch by Rob Clark.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
As per original approach by Rob, each user of the loader lib should include
loader.h and the pci_id_driver_map.h header will be used exclusively by the
loader.
Add back the include guard __IS_LOADER and remove no longer needed include
folder in the scons build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Additionally this commit removes the following exported functions
_gbm_udev_device_new_from_fd()
_gbm_fd_get_device_name()
_gbm_log()
All three were erroneously marked as exported since their inception.
Neither of them has ever been a part of the API thus there should be
no users of them.
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All the various window system integration layers duplicate roughly the
same code for figuring out device and driver name, pci-id's, etc. Which
is sad. So extract it out into a loader util lib.
v2 (Emil)
* Separate the introduction of libloader from the code de-duplication.
* Strip out non-pci devices support.
* Add scons + Android build system support.
* Add VISIBILITY_CFLAGS to avoid exporting the loader funcs.
v3 (Emil)
* PIPE_OS_ANDROID is undefined at this scope, use ANDROID
* Make sure we define _EGL_NO_DRM when building only swrast
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Using an unoptimized variant of glamor spending 50% of its CPU time in
brw_draw_prims() (and hitting the cache *very* frequently):
N Min Max Median Avg Stddev
x 200 29200 40500 34900 34750 958.43256
+ 200 31000 40300 34700 34622 916.35941
No difference proven at 95.0% confidence
Similarly, no difference on GLB2.7:
N Min Max Median Avg Stddev
x 63 64.1 71.36 70.69 70.113175 1.6782026
+ 63 63.6 71.18 70.75 70.223651 1.6044186
No difference proven at 95.0% confidence
v2: Rebase on master (by anholt)
v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH
didn't have the asserts about batchbuffer usage that ADVANCE_BATCH
does, so we started assertion failing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Those are the terms used in the docs, and think "n-wide" was something I
just happened to say. Note that shader-db needs updating for the
INTEL_DEBUG=fs parsing.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The original intent was that we'd keep a driver-private copy, and there
would be the normal copy for swrast to make use of without the tuning (or
anything more invasive we might do) specific to i965. Only, we don't
generate swrast code any more, because swrast can't render current shaders
anyway. Thus, our private copy is rather a waste, and we can just do our
backend-specific operations on the linked shader.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the
old copyright name is creating unnecessary confusion, hence this change.
This was the sed script I used:
$ cat tg2vmw.sed
# Run as:
#
# git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
#
# Rename copyrights
s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
/Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
s/TUNGSTEN GRAPHICS/VMWARE/g
# Rename emails
s/alanh@tungstengraphics.com/alanh@vmware.com/
s/jens@tungstengraphics.com/jowen@vmware.com/g
s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g
s/keithw\?@tungstengraphics.com/keithw@vmware.com/g
s/michel@tungstengraphics.com/daenzer@vmware.com/g
s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
s/zack@tungstengraphics.com/zackr@vmware.com/
# Remove dead links
s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g
# C string src/gallium/state_trackers/vega/api_misc.c
s/"Tungsten Graphics, Inc"/"VMware, Inc"/
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes regression since 9baa45f78b
but some of the piglit fbo-drawbuffers-none tests still don't
pass.
v2: use the right pointer type for 'h'
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes regression from 9baa45f78b
v2: incorporate a few small changes suggested by Roland.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The whole round-pointsize-to-int stuff must only be done with GL legacy
rules (no point_quad_rasterization) or all the wrong edges are lit up.
This was previously in a private branch (d3d pointsprite test complains
loudly otherwise) and got lost in a merge. However, it should certainly
apply to GL point sprite rasterization as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
OpenGL does whole-point clipping, that is a large point is either fully
clipped or fully unclipped (the latter means it may extend beyond the
viewport as long as the center is inside the viewport). d3d9 (d3d10 has
no large points) however requires points to be clipped after they are
expanded to a rectangle. (Note some IHVs are known to ignore GL rules at
least with some hw/drivers.)
Hence add a rasterizer bit indicating which way points should be clipped
(some drivers probably will always ignore this), and add the draw interaction
this requires. Drivers wanting to support this and using draw must support
large points on their own as draw doesn't implement vp clipping on the
expanded points (it potentially could but the complexity doesn't seem
warranted), and the driver needs to do viewport scissoring on such points.
Conflicts:
src/gallium/drivers/llvmpipe/lp_context.c
src/gallium/drivers/llvmpipe/lp_state_derived.c
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit c13970808 (mesa: GL_EXT_secondary_color is not optional) changed
CHECK_EXTENSION2(EXT_secondary_color, ARB_vetex_program, cap)
to
CHECK_EXTENSION(ARB_vertex_program, cap)
However CHECK_EXTENSION2 checks that either extension is available, not
both. Remove the extension check entirely since the intent was for it to
always be enabled.
v2: Fix glGet*(GL_COLOR_SUM) too. Suggested by Ian.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 9.2 10.0 <mesa-stable@lists.freedesktop.org>
It's possible to bind a smaller buffer as a constant buffer, than
what the shader actually uses/requires. This could cause nasty
crashes. This patch adds the architecture to pass the maximum
allowable constant buffer index to the jit to let it make
sure that the constant buffer indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow
should always return all zeros, and overflow is only defined
as access beyond the size of the currently bound buffer. Accesses
beyond the declared shader constant register size are not
considered an overflow and expected to return garbage but consistent
garbage (we follow the behavior which some wlk tests expect which
is to return the actual values from the bound buffer).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 95bf222603 (cso_context: Fix cso_context::sample_mask initial
value.) fixed the cso sample mask to be initialized to ~0. The cso code
is also careful not to needlessly call set_sample_mask, so we ended up
with the ctx->sample_mask never being set. This broke a number of
EXT_framebuffer_multisample piglit tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
ctx->DrawIndirectBuffer wasn't being free'd in _mesa_free_buffer_objects
With this patch, "valgrind --leak-check=full glxgears" on evergreen (CEDAR)
now shows:
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 70,228 bytes in 651 blocks
suppressed: 0 bytes in 0 blocks
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The radeonsi code was not cleaning up either of these items leading to
leaked memory.
v2: Move cleanup to r600_common_context_cleanup instead of duplicating
the logic for SI
CC: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The i830 and i915 drivers used them, but they didn't really need to.
They will just be annoying in future patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A future patch will rename some of the fields of gl_viewport_attrib, and
I don't want to update dead code that I can't test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Dave Airlie <airlied@redhat.com>
The ES and desktop GL specs diverge here. Yay!
In desktop OpenGL, the driver can perform online compression of
uncompressed texture data. GL_NUM_COMPRESSED_TEXTURE_FORMATS and
GL_COMPRESSED_TEXTURE_FORMATS give the application a list of formats
that it could ask the driver to compress with some expectation of
quality. The GL_ARB_texture_compression spec calls this "suitable for
general-purpose usage." As noted above, this means
GL_COMPRESSED_RGBA_S3TC_DXT1_EXT is not included in the list.
In OpenGL ES, the driver never performs compression.
GL_NUM_COMPRESSED_TEXTURE_FORMATS and GL_COMPRESSED_TEXTURE_FORMATS give
the application a list of formats that the driver can receive from the
application. It is the *complete* list of formats. The
GL_EXT_texture_compression_s3tc spec says:
"New State for OpenGL ES 2.0.25 and 3.0.2 Specifications
The queries for NUM_COMPRESSED_TEXTURE_FORMATS and
COMPRESSED_TEXTURE_FORMATS include COMPRESSED_RGB_S3TC_DXT1_EXT,
COMPRESSED_RGBA_S3TC_DXT1_EXT, COMPRESSED_RGBA_S3TC_DXT3_EXT,
and COMPRESSED_RGBA_S3TC_DXT5_EXT."
Note that the addition is only to the OpenGL ES specification!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
See-also: http://lists.freedesktop.org/archives/mesa-dev/2013-October/047439.html
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Returning a reference is incorrect if the specified pair was a
temporary -- Instead of that, use decltype() to deduce the correct
return type qualifiers. Fixes a crash in clCreateProgramWithBinary().
Reported-and-tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
According to the spec it's allowed to call clBuildProgram() on a
program created from a user-specified binary. We don't need to do
anything to build the program in that case.
Reported-and-tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
This avoids the inefficient multiple evaluation of the map result in
the code below. It should cause no functional changes.
Tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
From ARB_shader_image_load_store:
If a texture object bound to one or more image units is deleted by
DeleteTextures, it is detached from each such image unit, as though
BindImageTexture were called with <unit> identifying the image unit
and <texture> set to zero.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And uncomment the relevant lines of the dispatch sanity test.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Name image format classes consistently, fix array and 3D teximage
selection with layered = GL_FALSE, make sure that the
user-specified layer is less than the number of texture layers,
add some asserts.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Including pack/unpack and texstore code. ARB_shader_image_load_store
requires support for the GL_RG8_SNORM and GL_RG16_SNORM formats, which
map to MESA_FORMAT_SIGNED_GR88 and MESA_FORMAT_SIGNED_GR1616 on
little-endian hosts, and MESA_FORMAT_SIGNED_RG88 and
MESA_FORMAT_SIGNED_RG1616 respectively on big-endian hosts -- only the
former were already present, add support for the latter.
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Including pack/unpack and texstore code. This texture format is a
requirement for ARB_shader_image_load_store.
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Increase MAX_IMAGE_UNITS to what i965 wants and add a separate
MAX_IMAGE_UNIFORMS define, clarify a couple of comments.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And to check if it can have layers at all. This will be used by the
implementation of ARB_shader_image_load_store.
v2: Fix constness of texobj argument, use assert and return reasonable
default rather than calling unreachable() in default switch case.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The temporary variable used to store _ColorDrawBufferIndexes must be
signed (GLint), otherwise the following conditional will be incorrectly
evaluated. Leading to crashes in the driver/mesa or accessing/writing
to arbitrary memory location. The bug dates back to 2009.
Cc: 10.0 9.2 9.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
_ColorDrawBufferIndexes is defined as GLint* and using a GLuint*
will result in the first part of the conditional to be evaluated to
true always.
Unintentionally introduced by the following commit, this will result
in a driver segfault if one is using an old version of the piglit test
bin/clearbuffer-mixed-format -auto -fbo
commit 03d848ea10
Author: Marek Olšák <marek.olsak@amd.com>
Date: Wed Dec 4 00:27:20 2013 +0100
mesa: fix interpretation of glClearBuffer(drawbuffer)
This corresponding piglit tests supported this incorrect behavior instead of
pointing at it.
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: 10.0 9.2 9.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If it wasn't necessary for Haswell, it's likely not to be necessary for
Broadwell either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to disable HiZ for non-8x4 aligned levels, except for level 0, layer
0. For the very first layer we can adjust Width and Height fields of
3DSTATE_DEPTH_BUFFER to make it aligned.
Specifically, add ILO_TEXTURE_HIZ and set the flag only for properly aligned
levels. ilo_texture_can_enable_hiz() is updated to check for the flag.
In tex_layout_validate(), align the depth bo to 8x4 so that we can adjust
Width/Height of 3DSTATE_DEPTH_BUFFER without introducing out-of-bound access.
Finally in rectlist blitter, add the ability to adjust 3DSTATE_DEPTH_BUFFER.
Add tex_layout_init_hiz() before tex_layout_init_format() to decide whether
HiZ should be enabled.
On GEN6, because of layer offsetting, HiZ is enabled only when the texture is
non-mipmapped and non-array. PIPE_USAGE_STAGING is also taken as a hint to
disable HiZ.
These can't use foreach_list since they want to skip over the first few
list elements. Just doing the ad-hoc list walking isn't too bad.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When handling function calls, we often want to walk through the list of
formal parameters and list of actual parameters at the same time.
(Both are guaranteed to be the same length.)
Previously, we used a pattern of:
exec_list_iterator 1st_iter = <1st list>.iterator();
foreach_iter(exec_list_iterator, 2nd_iter, <2nd list>) {
...
1st_iter.next();
}
This was awkward, since you had to manually iterate through one of
the two lists.
This patch introduces a foreach_two_lists macro which safely walks
through two lists at the same time, so you can simply do:
foreach_two_lists(1st_node, <1st list>, 2nd_node, <2nd list>) {
...
}
v2: Rename macro from foreach_list2 to foreach_two_lists, as suggested
by Ian Romanick.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Formal function parameters are always ir_variable objects, not an
arbitrary ir_instruction. So there's no need to dynamically cast here.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A function call's parameters are always rvalues. ir_rvalue may not
always be a subclass of ir_instruction in the future, so we should use
the right one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
foreach_iter and exec_list_iterators have been deprecated for some time now;
we just hadn't ever bothered to convert code to the newer foreach_list
and foreach_list_safe macros.
In these cases, we aren't editing the list, so we can use foreach_list
rather than foreach_list_safe.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to this patch, if we ran out of aperture space during
brw_try_draw_prims(), we would rewind the batch buffer pointer
(potentially throwing some state that may have been emitted by
brw_upload_state()), flush the batch, and then try again. However, we
wouldn't reset the dirty bits to the state they had before the call to
brw_upload_state(). As a result, when we tried again, there was a
danger that we wouldn't re-emit all the necessary state. (Note: prior
to the introduction of hardware contexts, this wasn't a problem
because flushing the batch forced all state to be re-emitted).
This patch fixes the problem by leaving the dirty bits set at the end
of brw_upload_state(); we only clear them after we have determined
that we don't need to rewind the batch buffer.
Cc: 10.0 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
An example why it is required:
Let's say there's a fragment shader writing to gl_FragData[0..1].
The user calls: glDrawBuffers(2, {GL_NONE, GL_COLOR_ATTACHMENT0});
That means gl_FragData[0] is unused and gl_FragData[1] is written
to GL_COLOR_ATTACHMENT0.
st/mesa was skipping the GL_NONE draw buffer, therefore gl_FragData[0]
was written to GL_COLOR_ATTACHMENT0, which was wrong.
This commit fixes it, but drivers must also be fixed not to crash when
binding NULL colorbuffers. There is also a new set of piglit tests for this.
The MSAA state also had to be fixed not to crash when reading fb->cbufs[0].
Reviewed-by: Brian Paul <brianp@vmware.com>
yInverted is used by EGL_NOK_texture_from_pixmap to indicate that
window system rendering is y-inverted compared to OpenGL texture
representation. This extension is only known to be used with X11
window system where sane default is GL_TRUE.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73371
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
dri2_add_config makes decisions based on NOK_texture_from_pixmap so
it needs to be enabled before calling dri2_add_configs_for_visuals.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Piglit was recently changed to expect the correct error code (piglit
commit 271b998), so it started failing on Mesa. This corrects that
failing and adds some spec quotations to justify the errrors set.
The code was rearranged a little bit to match the order listed in the
spec.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_queryobj.c needs a version of write_timestamp that works on all
generations for the QueryCounter() driver hook. So there's no point in
duplicating it in gen6_queryobj.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, Mesa enforced the following rule (from
ARB_geometry_shader4's list of criteria for framebuffer completeness):
* If any framebuffer attachment is layered, all attachments must have
the same layer count. For three-dimensional textures, the layer count
is the depth of the attached volume. For cube map textures, the layer
count is always six. For one- and two-dimensional array textures, the
layer count is simply the number of layers in the array texture.
{ FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_ARB }
However, when ARB_geometry_shader4 was adopted into GL 3.2, this rule
was dropped; GL 3.2 permits different attachments to have different
layer counts. This patch brings Mesa in line with GL 3.2.
In order to ensure that layered clears properly clear all layers, we
now have to keep track of the maximum number of layers in a layered
framebuffer.
Fixes the following piglit tests in spec/!OpenGL 3.2/layered-rendering:
- clear-color-all-types 1d_array mipmapped
- clear-color-all-types 1d_array single_level
- clear-color-mismatched-layer-count
- framebuffer-layer-count-mismatch
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
From section 4.4.4 (Framebuffer Completeness) of the GL 3.2 spec:
If any framebuffer attachment is layered, all populated
attachments must be layered. Additionally, all populated color
attachments must be from textures of the same target.
We weren't checking that the attachments were from textures of the
same target.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Commit 1a92881 added extra flushes to fix a HiZ hang in
WebGL Google Maps. With the extra flushes emitted by the previous two
patches, the flushes added by 1a92881 are redundant.
Tested with the same criteria as in 1a92881: by zooming in and out
continuously for 2 hours on Sandybridge Chrome OS (codename
Stumpy) without a hang.
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Unconditionally set brw->need_workaround_flush at the top of gen6 blorp
state emission.
The art of emitting workaround flushes on Sandybridge is mysterious and
not fully understood. Ken and I believe that
intel_emit_post_sync_nonzero_flush() may be required when switching from
regular drawing to blorp. This is an extra safety measure to prevent
undiscovered difficult-to-diagnose gpu hangs.
I verified that on ChromeOS, pre-patch, need_workaround_flush was not
set at the top of blorp, as Paul expected. To verify, I inserted the
following debug code at the top of gen6_blorp_exec(), restarted the ui,
and inspected the logs in /var/log/ui. The abort gets triggered so early
that the browser never appears on the display.
static void
gen6_blorp_exec(...)
{
if (!brw->need_workaround_flush) {
fprintf(stderr, "chadv: %s:%d\n", __FILE__, __LINE__);
abort();
}
...
}
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch makes the workaround code in gen6 blorp follow the pattern
established in the regular draw path. It shouldn't result in any
behavioral change.
On gen6, there are two places where we emit 3D_CMD_PRIM: brw_emit_prim()
and gen6_blorp_emit_primitive(). brw_emit_prim() sets
need_workaround_flush immediately after emitting the primitive, but
blorp does not. Blorp sets need_workaround_flush at the bottom of
brw_blorp_exec().
This patch moves the need_workaround_flush from brw_blorp_exec() to
gen6_blorp_emit_primitive(). There is no need to set
need_workaround_flush in gen7_blorp_emit_primitive() because the
workaround applies only to gen6.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
We weren't handling the LUMINANCE_SNORM, LUMINANCE_ALPHA_SNORM and
INTENSITY_SNORM cases. Note that adding these cases here does not
require a driver to support rendering to these surface types. If
the driver can't do it we'll report an incomplete framebuffer.
NVIDIA doesn't support GL_EXT_texture_snorm but their driver
accepts these formats in glRenderBufferStorage().
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This allows the caller to execute it in a loop rather than
hand-rolling a separate call for each stage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are replaced with
ctx->Const.Program[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}]. In
patches to follow, this will allow us to replace a lot of ad-hoc logic
with a variable index into the array.
With the exception of the changes to mtypes.h, this patch was
generated entirely by the command:
find src -type f '(' -iname '*.c' -o -iname '*.cpp' -o -iname '*.py' \
-o -iname '*.y' ')' -print0 | xargs -0 sed -i \
-e 's/Const\.VertexProgram/Const.Program[MESA_SHADER_VERTEX]/g' \
-e 's/Const\.GeometryProgram/Const.Program[MESA_SHADER_GEOMETRY]/g' \
-e 's/Const\.FragmentProgram/Const.Program[MESA_SHADER_FRAGMENT]/g'
Suggested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit eda21d2a30 fixed the rasterization
of points for Direct3D but ended up breaking the rasterization of OpenGL
non-sprite points, in particular conform's pntrast.c test.
The only way to get both working is to properly honour
pipe_rasterizer::point_quad_rasterization, and follow the weird OpenGL
rule when it is false.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We definitely want to fall through to the unsynchronized map case, instead
of wasting bandwidth on a copy. Prevents a -43.2407% +/- 1.06113% (n=49)
performance regression on aa10perf when teaching glamor to provide the
GL_INVALIDATE_RANGE_BIT information.
This is a performance fix, which I usually wouldn't cherry-pick to stable.
But this was really was just a bug in the code, its presence would
discourage developers from giving us the best information they can, and I
think we've got fairly high confidence in the unsynchronized map path
already.
Cc: 10.0 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While incorrect, it probably wouldn't affect anyone ever: You'd have to do
an appropriately-formatted readpixels into a PBO, then overwrite the tail
end of the updated area of the PBO with glBufferSubData(), and you
wouldn't get appropriate synchronization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The earlier assert made sure that our math didn't exceed our bounds, but
this makes sure that we don't overflow from the high bits X into the low
bits of Y. We've already put checks in intel_miptree_blit(), but I've
wanted to expand the type in our protoype from short to uint32_t, and we
could get in trouble with intel_emit_linear_blit() if we did.
v2: Add Ken's comment about the funny language extension used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
This was one of the things we always wanted to do to this, to make it more
useful than just (value << FIELD_MASK).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
With all of the flipping and pitch twiddling and miptree layout involved
in our blits, there are lots of ways for us to scribble outside of a
buffer. Put in a check that we're not about to do so.
This catches a bug that glamor was running into.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This fixes the following compile error:
src\glsl\ir_constant_expression.cpp(1405) : error C2666: 'copysign' : 3
overloads have similar conversions
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add for now some simple/basic query support (ie. things not actually
requiring the GPU). Might change around a bit when I actually add
GPU queries, but for now this enables some useful performance info
in the GALLIUM_HUD. For example:
GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
The driver specific specific queries are:
+ draw-calls
+ batches - number of batches per second, sum of batches-sysmem
plus batches-gmem
+ batches-gmem - render a set of tiles in GMEM, for each tile
(optionally) system mem -> gmem (restore), plus N draws,
plus gmem -> system mem (resolve) per second
+ batches-sysmem - N draws to system memory (GMEM bypass) per
second
+ restores - number of GMEM batches that required restore per
second
Ideally for GMEM rendering, you want batches-gmem to equal fps. If
the app is doing something that triggers multiple passes (ie. requires
extra round trip gmem <-> system memory) then the # of batches per
second will go up relative to fps.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since we now have the cmdstream patch mechanism needed for hw binning,
might as well also use it for RB_RENDER_CONTROL updates. This avoids
the need to use RMW (and associated WFI) to update RB_RENDER_CONTROL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The binning pass sorts vertices into which bins/tiles they apply to.
The visibility information generated during the binning pass can be
used to speed up the rendering pass by filtering out vertices which
do not apply to the current tile. See:
https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
This brings a significant fps boost. A rough assortment of tests
(supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems
to yield a ~35-45% fps improvement.
For now, to be conservative, the binning pass is not enabled yet by
default. To enable it use:
FD_MESA_DEBUG=binning
So far I haven't found anything that breaks with binning enabled,
but I'd like a bit more testing before I enable it as default.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The hardware is broken with nonzero texel offsets and unnormalized
coordinates; instead of doing correct offsetting, we get garbage.
This just extends the existing workaround for ir_txf and
ir_tg4+gsampler2DRect to also consider ir_tex+gsampler2DRect.
Fixes broken rendering in 'tesseract' when 'mesa_texrectoffset_bug' is
not enabled; also fixes the new piglit test
'tests/spec/glsl-1.30/execution/fs-textureOffset-Rect'.
Has been broken ~forever; suggesting including this in only 10.0 because
the lowering pass doesn't exist in 9.2 or earlier so would require quite
a different patch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Lee Salzman <lsalzman@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Also rename "shaderType" param of is_varying_var() to "stage".
Reviewed-by: Brian Paul <brianp@vmware.com>
This reduces confusion since gl_shader::Type is sometimes
GL_SHADER_PROGRAM_MESA but is more frequently
GL_SHADER_{VERTEX,GEOMETRY,FRAGMENT}. It also has the advantage that
when switching on gl_shader::Stage, the compiler will alert if one of
the possible enum types is unhandled. Finally, many functions in
src/glsl (especially those dealing with linking) already use
gl_shader_stage to represent pipeline stages; using gl_shader::Stage
in those functions avoids the need for a conversion.
Note: in the process I changed _mesa_write_shader_to_file() so that if
it encounters an unexpected shader stage, it will use a file suffix of
"????" rather than "geom".
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also move the related #define MESA_SHADER_STAGES. This will allow
gl_shader_stage to be used in struct gl_shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we had an enum called gl_shader_type which represented
pipeline stages in the order they occur in the pipeline
(i.e. MESA_SHADER_VERTEX=0, MESA_SHADER_GEOMETRY=1, etc), and several
inconsistently named functions for converting between it and other
representations:
- _mesa_shader_type_to_string: gl_shader_type -> string
- _mesa_shader_type_to_index: GLenum (GL_*_SHADER) -> gl_shader_type
- _mesa_program_target_to_index: GLenum (GL_*_PROGRAM) -> gl_shader_type
- _mesa_shader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string
This patch tries to clean things up so that we use more consistent
terminology: the enum is now called gl_shader_stage (to emphasize that
it is in the order of pipeline stages), and the conversion functions are:
- _mesa_shader_stage_to_string: gl_shader_stage -> string
- _mesa_shader_enum_to_shader_stage: GLenum (GL_*_SHADER) -> gl_shader_stage
- _mesa_program_enum_to_shader_stage: GLenum (GL_*_PROGRAM) -> gl_shader_stage
- _mesa_progshader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string
In addition, MESA_SHADER_TYPES has been renamed to MESA_SHADER_STAGES,
for consistency with the new name for the enum.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Also rename the "target" field of _mesa_glsl_parse_state and the
"target" parameter of _mesa_shader_stage_to_string to "stage".
Reviewed-by: Brian Paul <brianp@vmware.com>
The adjustment needs to be applied to the y coordinates and not the x
coordinates, just like the equivalent code for lines and triangles in
lp_setup_line.c and lp_setup_tri.c.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This was inadvertently forgotten when replacing gl_rasterization_rules
with lower_left_origin and half_pixel_center (commit
2737abb44e).
This makes a difference when lower_left_origin != half_pixel_center, e.g,
D3D10.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
When the depth buffer is to be read, perform a Depth Buffer Resolve if it has
been rendered. When the depth buffer is to be rendered, perform a HiZ Buffer
Resolve when the depth buffer is modified externally.
Add blitter functions to perform Depth Buffer Clear, Depth Buffer Resolve, and
Hierarchical Depth Buffer Resolve. Those functions set ilo_blitter up and
pass it to the pipelines to emit the commands.
Allow HiZ op to be specified in 3DSTATE_WM. Pass depth format directly in
gen7_emit_3DSTATE_SF. Use tex->hiz.bo to determine if HiZ exists. Fix
3DSTATE_SF for the case when there is no ilo_rasterizer_state. Fix
3DSTATE_PS for the case when there is no ilo_shader_state.
GEN6 has several requirements regarding the LOD/Depth/Width/Height of the
render targets and the depth buffer. We used to offset to the layers in
question unconditionally to meet the requirements. With this commit,
offseting is done only when the requirements are not met.
On Haswell, POW takes 24 cycles, while EXP2 only takes 14. Plus, using
POW requires putting 2.0 in a register, while EXP2 doesn't.
I believe that EXP2 will be faster than POW on basically all GPUs, so
it makes sense to optimize it.
Looking at the savage2 subset of shader-db:
total instructions in shared programs: 113225 -> 113179 (-0.04%)
instructions in affected programs: 2139 -> 2093 (-2.15%)
instances of 'math pow': 795 -> 749 (-6.14%)
instances of 'math exp': 389 -> 435 (11.8%)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch creates a new generic is_value() method, which checks if an
ir_constant has a particular value. (For vectors, it must have the
single value repeated across all components.)
It then rewrites the is_zero/is_one/is_negative_one methods to use this
generic helper. All three were basically identical except for the value
they checked for. The other difference is that is_negative_one rejects
boolean types. The new is_value function maintains this behavior, only
allowing boolean types when checking for 0 or 1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using the get_local_param_pointer helper ensures that the LocalParams
arrays have actually been allocated before attempting to use them.
glProgramLocalParameters4fvEXT needs to do a bit of extra checking,
but it can be simplified since the helper has already validated the
target.
Fixes crashes in programs that use Cg (for example, Awesomenauts,
Rocketbirds: Hardboiled Chicken, and Tiny and Big: Grandpa's Leftovers)
since commit e5885c119d
(mesa: Dynamically allocate the storage for program local parameters.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73136
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
We don't support MSAA (ie, number of samples is always one) therefore
sample_mask boils down to a synonym of the rasterizer_discard flag.
Also, this change makes setup actually use the value received in
lp_setup_set_rasterizer_discard instead of reaching out to llvmpipe
upper layers to re-fetch it.
Based on Si Chen's draft.
With this patch `wgf11multisample Coverage passes 100%` on the UMD
D3D10 state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
The initial value of cso_context::sample_mask_saved is irrelevant as it
will be overwritten with cso_context::sample_mask in
cso_save_sample_mask. Therefore it is cso_context::sample_mask that
needs to be properly initialized.
This fixes regressions in blits and mipmap generation after adding
support for sample_mask to llvmpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Implement Alpha to Coverage by discarding a fragment alpha component is
less than 0.5. This is a joint work of Jose and Si.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 9119269ca1 moved the texel
buffer allocation to _swrast_texture_span(), however, when compiled
with OpenMP support this code already runs multi-threaded so a
critical section is required to prevent multiple allocations and
rendering errors.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Evidently, there's some other definition of "min" and "max" that
causes MSVC to choke on these function names. Renaming to min2()
and max2() fixes things.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both brw_defines.h and intel_reg.h defined PIPE_CONTROL fields, which
had similar names, but couldn't be used in the same way. (One had
built-in shifts, and the other didn't...)
Delete the unused set to preserve sanity.
(Eric wrote an almost identical patch back in August, so I believe he
approves.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this build error with gcc <= 4.5 and clang <= 3.1.
CC clientattrib.lo
In file included from ../../include/GL/glx.h:333:0,
from glxclient.h:45,
from clientattrib.c:32:
../../include/GL/glxext.h:275:13: error: redefinition of typedef 'GLXContextID'
../../include/GL/glx.h:171:13: note: previous declaration of 'GLXContextID' was here
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70591
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* The Haiku renderers need to link to libGL to function properly
in all usage contexts. As mesa drivers build before gallium
targets, we couldn't properly link the mesa swrast driver to
the gallium libGL target for Haiku.
* This is likely better as it mimics how glx is laid out ensuring
the Haiku libGL is better understood.
* All renderers properly link in libGL now.
Acked-by: Brian Paul <brianp@vmware.com>
These are just software flag values (not hardware specific values), and
aren't used anywhere. Delete them to avoid confusion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Add extra null check in auto generated indirect_init.c via
src/mapi/glapi/gen/glX_proto_send.py
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously we left the size of this vgrf as 1, which caused register
allocation to be subtly broken. If we were lucky we would explode in
the post-alloc instruction scheduler; if we were unlucky we'd just stomp
on someone else and get broken rendering.
Fixes crash when running `tesseract` with the following settings:
msaa 4
glineardepth 0
Also fixes the piglit test:
arb_sample_shading-builtin-gl-sample-id
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72859
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The preprocessor currently accepts multiple else/elif-groups
per if-section. The GLSL-preprocessor is defined by the C++
specification, which defines the following parse-rule:
if-section:
if-group elif-groups(opt) else-group(opt) endif-line
This clearly only allows a single else-group, that has to come
after any elif-groups.
So let's modify the code to follow the specification. Add test
to prevent regressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
The preprocessor has always replaced multi-line comments with a single space
character, (as required by the specification), but as of commit
bd55ba568b the lexer also emitted a NEWLINE
token for each newline within the comment, (in order to preserve line
numbers).
The emitting of NEWLINE tokens within the comment broke the rule of "replace a
multi-line comment with a single space" as could be exposed by code like the
following:
#define FOO a/*
*/b
FOO
Prior to commit bd55ba568b, this code defined
the macro FOO as "a b" as desired. Since that commit, this code instead
defines FOO as "a" and leaves a stray "b" in the output.
In this commit, we fix this by not emitting the NEWLINE tokens while lexing
the comment, but instead merely counting them in the commented_newlines
variable. Then, when the lexer next encounters a non-commented newline it
switches to a NEWLINE_CATCHUP state to emit as many NEWLINE tokens as
necessary (so that subsequent parsing stages still generate correct line
numbers).
Of course, it would have been more clear if we could have written a loop to
emit all the newlines, but flex conventions prevent that, (we must use
"return" for each token we emit).
It similarly would have been clear to have a new rule restricted to the
<NEWLINE_CATCHUP> state with an action much like the body of this if
condition. The problem with that is that this rule must not consume any
characters. It might be possible to write a rule that matches a single
lookahead of any character, but then we would also need an additional rule to
ensure for the <EOF> case where there are no additional characters available
for the lookahead to match.
Given those considerations, and given that the SKIP-state manipulation already
involves a code block at the top of the lexer function, before any rules, it
seems best to me to go with the implementation here which adds a similar
pre-rule code block for the NEWLINE_CATCHUP.
Finally, this commit also changes the expected output of a few, existing glcpp
tests. The change here is that the space character resulting from the
multi-line comment is now emitted before the newlines corresponding to that
comment. (Previously, the newlines were emitted first, and the space character
afterward.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Two things make this code confusing:
1. The uncharacteristic manipulation of lexer start state outside of
flex rules.
2. The confusing semantics of the skip_stack (including the
"lexing_if" override and the SKIP_NO_SKIP state).
This new comment is intended to bring a bit more clarity for any readers.
There is no intended beahvioral change to the code here. The actual code
changes include better indentation to avoid an excessively-long line, and
using the more descriptive INITIAL rather than 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Support all levels of a supported texture format.
Using 1024x1024, RGBA 8888 source, mipmap
internal-format Before (MB/sec) mipmap (MB/sec)
GL_RGBA 627.15 615.90
GL_RGB 456.35 611.53
512x512
GL_RGBA 597.00 619.95
GL_RGB 440.62 611.28
256x256
GL_RGBA 487.80 587.42
GL_RGB 376.63 585.00
Benchmark has been sent to mesa-dev list: teximage_enh
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
MESA_FORMAT_XRGB8888 is equivalent to MESA_FORMAT_ARGB8888 in terms
of storage on the device, so okay to use this optimized copy routine.
This series builds on work from Frank Henigman to optimize the
process of uploading a texture to the GPU. This series adds support for
MESA_XRGB_8888 and full miptrees where were found to be common activities
in the Smokin' Guns game. The issue was found while profiling the app
but that part is not benchmarked. Smokin-Guns uses mipmap textures with
an internal format of GL_RGB (MESA_XRGB_8888 in the driver).
These changes need a performance tool to run against to show how they
improve execution performance for specific texture formats. Using this
benchmark I've measured the following improvement on my Ivybridge
Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz.
1024x1024 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 628.15 627.15
GL_RGB 265.95 456.35
512x512 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 600.23 597.00
GL_RGB 255.50 440.62
256x256 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 489.08 487.80
GL_RGB 229.03 376.63
Benchmark has been sent to mesa-dev list: teximage
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Only a Mesa bug could cause this function to be called with an
out-of-range index, so raise an assertion if that ever happens.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch replaces the following pattern:
foo bar[MESA_SHADER_TYPES] = {
...
};
With:
foo bar[] = {
...
};
STATIC_ASSERT(Elements(bar) == MESA_SHADER_TYPES);
This way, when a new shader type is added in a future version of Mesa,
we will get a compile error to remind us that the array needs to be
updated.
Reviewed-by: Brian Paul <brianp@vmware.com>
This argument was carrying the name of the shader target (as a
string). We can get this just as easily by calling
_mesa_shader_enum_to_string().
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously, _mesa_glsl_shader_target_name() had an overload for GLenum
and an overload for the gl_shader_type enum, each of which behaved
differently. However, since GLenum is a synonym for unsigned int, and
unsigned ints are often used in place of gl_shader_type (e.g. in loop
indices), there was a big risk of calling the wrong overload by
mistake. This patch gives the two overloads different names so that
it's always clear which one we mean to call.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 136a12ac98.
According to belak51 on IRC, this commit broke Allegro, which would no
longer compile. Applications apparently expect the GLXContextID typedef
to exist in glx.h; removing it breaks them. A bit of searching around
the internet revealed other complaints since upgrading to Mesa 10.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
According to git blame, this hasn't been used in over two years:
commit d2235b0f46
Author: Eric Anholt <eric@anholt.net>
Date: Thu Nov 17 17:01:58 2011 -0800
i965: Always handle GL_DEPTH_TEXTURE_MODE through the shader.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
instead of ignoring the argument and always dumping to
standard output.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using RMW on banked context registers is not safe. The value read
could be the wrong one. So if there has been a DRAW_IDX launched,
the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part
of RMW sees the correct value.
To avoid unnecessary WFI's, keep track if there is a need for WFI,
and only emit one if needed. Furthermore, keep track if we even
need to update the register in the first place.
And to cut down on the amount of RMW to avoid excessive WFI's, at the
tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the
state at beginning of draw/clear cmds (which we IB to) is always
undefined. In the draw/clear commands, we always still use RMW (with
WFI if needed), but only if the register value actually changes. (At
points where the current value cannot be known, the saved value is
reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there
never is chance for confusion.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Actually assign VSC_PIPE's properly, which will be needed for tiling.
And introduce fd_tile for per-tile state (including the assignment of
tile to VSC_PIPE). This gives us the proper pipe setup that we'll
need for hw binning pass, and also cleans things up a bit by not having
to pass so many parameters around. And will also make it easier to
introduce different tiling patterns (since we may no longer render
tiles in a simple left-to-right top-to-bottom pattern).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Previously we were creating a new LLVMContext every time that we called
radeon_llvm_parse_bitcode, which caused us to leak the context every time
that we compiled a CL program.
Sadly, we can't dispose of the LLVMContext at the point that it was being
created because evergreen_launch_grid (and possibly the SI equivalent) was
assuming that the context used to compile the kernels was still available.
Now, we'll create a new LLVMContext when creating EG/SI compute state, store
it there, and pass it to all of the places that need it.
The LLVM Context gets destroyed when we delete the EG/SI compute state.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: "10.0" <mesa-stable@lists.freedesktop.org>
This fixes a crash where old_view->context was already freed in the
pipe_sampler_view_reference function contained in
src/gallium/auxiliary/utils/u_inlines.h. As a result, the
sampler_view_destroy function pointer contained 0xfeeefeee indicating
freed heap memory.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Useful in its own right, but also needed for adaptive vsync.
No regressions in the piglit glx-oml-sync-control-getmscrate test.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
It is the maximum number of back buffers, but the name is confusing and is
easily read as the maximum back buffer index. Chage to DRI3_NUM_BACK to make
the intended usage a bit clearer.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just copying code from the dri2 path to set up the fast color clear state.
This also removes a couple of bogus intel_region_reference calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The buffer-object is the persistent thing passed through the loader, so when
updating an image buffer, check to see if it is already bound to the provided
bo. The region, on the other hand, is allocated separately for the miptree,
and so will never be the same as that passed back from the loader.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Move the depth field up with width and height.
Remove unused previous_time and frames fields.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
libxshmfence v1.0 foolishly used 'int32_t *' for the fence type, which
works when the fence is a linux futex. However, version 1.1
changes the exported datatype to 'struct xshmfence *'
Require libxshmfence version 1.1 and switch the API around.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While looking through the documentation, I found this in the Sandybridge
PRM (Volume 4, Part 1, Page 140):
"Use of sample_c with SURFTYPE_CUBE surfaces is undefined with the
following surface formats: I24X8_UNORM, L24X8_UNORM, A24X8_UNORM,
I32_FLOAT, L32_FLOAT, A32_FLOAT."
I haven't observed this to be true, but it suggests that we may want to
use other formats.
We already perform DEPTH_TEXTURE_MODE swizzling in the shaders, and
don't rely on the surface format to splat things appropriately. So
using RED should work just as well as INTENSITY.
A few notes about the formats:
- R24_UNORM_X8_TYPELESS has the exact same properties as I24X8_UNORM.
- R16_UNORM and R32_FLOAT are additionally supported as a render target,
while the old I16_UNORM/I32_FLOAT formats are not.
- R32_FLOAT_X8X24_TYPELESS is not supported as a render target, while
the old format (R32G32_FLOAT) was. However, it shares the same
properties as the formats we use for Z24, so it should suffice.
This makes translate_tex_format and brw_blorp_surface_info::set
a bit more similar.
No Piglit changes on Sandybridge or Ivybridge. No oglconform changes on
Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Emitting flushes before depth and hiz resolves at the top of blorp's
state emission fixes the hang. Marchesin and I found the fix
experimentally, as opposed to adhering to a documented hardware
workaround. A more minimal fix likely exists, but this gets the job
done.
Fixes HiZ hangs in the new WebGL Google maps on Sandybridge Chrome OS.
Tested by zooming in and out continuously for 2 hours.
This patch is based on
8bc07bb701
CC: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70740
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Broadwell allows us to specify an arbitrary value for QPitch, rather
than baking a specific formula into the hardware and requiring software
to lay things out to match. The only restriction is that the software
provided QPitch needs to be large enough so successive array slices do
not overlap.
In order to support this flexibility, software needs to specify QPitch
in a bunch of packets. Storing QPitch makes that easy, and allows us to
adjust it in a single place should we wish to change it in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Broadwell introduces support for Q, UQ, and HF types. It also extends
DF support to allow immediate values.
Irritatingly, although HF and DF both support immediates, they're
represented by a different value depending on the register file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge, Baytrail, and Haswell support double float register types,
but do not support them as immediate values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
On released hardware, values 4-6 are overloaded. For normal registers,
they mean UB/B/DF. But for immediates, they mean UV/VF/V.
Previously, we just created #defines for each name, reusing the same
value. This meant we could directly splat the brw_reg::type field into
the assembly encoding, which was fairly nice, and worked well.
Unfortunately, Broadwell makes this infeasible: the HF and DF types are
represented as different numeric values depending on whether the
source register is an immediate or not.
To preserve sanity, I decided to simply convert BRW_REGISTER_TYPE_* to
an abstract enum that has a unique value for each register type, and
write translation functions. One nice benefit is that we can add
assertions about register files and generations.
I've chosen not to convert brw_reg::type to the enum, since converting
it caused a lot of trouble due to C++ enum rules (even though it's
defined in an extern "C" block...).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Three-source instructions use a different encoding for register types
(and have a much more limited set to choose from).
Previously, we translated those into BRW_REGISTER_TYPE_* values, then
reused the existing reg_encoding mapping.
Doing it directly is more straightforward and actually less code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
UB types have never been supported as immediates. On Gen4-5, register
encoding 4 is "Reserved." On Gen6+, it means UV.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Calling the local variables flat_enable and point_sprite_enable is
clearer than dw16 and such. It also matches the names used in
calculate_attr_overrides, which computes them.
v2: Add /* dw16 */ and /* dw10 */ comments, requested by Jordan.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
calculate_attr_overrides is responsible for computing the point sprite
and flat-shading enable bitfields. It does so by OR'ing in a bunch of
bits. However, it relied on the caller to set the initial value to
zero. This is pretty fragile - if the caller neglects to zero out those
variables, then the enable bitfields end up full of garbage, which shows
up as random things being flat-shaded.
This patch moves the zero-initialization into calculate_attr_overrides,
so that the computation is completely in one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git blame ascribes this to the initial commit of the driver.
No released hardware has ever supported half float, according to the
documentation for SrcType in the ISA reference.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If no function signature is found for a function name, report that the
function is not found instead of printing an empty list of candidates.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch changes the error reporting behavior for incorrect function
invocation (triggered by match_function_by_name() unable to find a
matching function call) from using the line number information
associated to the function name term to using the line number
information of the entire function expression. Fixes bug #72264.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72264
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
For GLSL 1.50 we can get frag shaders with primitive id as an
input, add support to the translator for this.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In load_texunit_bumpmap tc_array is asserted so lets assert
rot_mat_0 and rot_mat_1 also which are coming from same path.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Check for malloc() returning null to fix Klocwork warnings.
Minor clean-ups by BrianP.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RGB32F, GL_RGB32UI and GL_RGB32I texture buffer formats are
only supposed to be allowed if the GL_ARB_texture_buffer_object_rgb32
extension is supported. Note that the texture buffer extensions
require a core profile. This patch adds those checks.
Fixes the soon-to-be-added
arb_clear_buffer_object-negative-bad-internalformat piglit test.
Add function to test if the buffer is already mapped and if so,
if the mapped range overlaps the given range.
Modify the _mesa_InvalidateBufferSubData function to use
the new function.
Enable buffer_object_subdata_range_good() to use bufferobj_range_mapped
Reviewed-by: Brian Paul <brianp@vmware.com>
alpha, lumincance and intensity formats are illegal in a core context.
Add a check to return MESA_FORMAT_NONE if one of those is requested within
a core context.
Reviewed-by: Brian Paul <brianp@vmware.com>
- change storage class from static to extern
- rename validate_texbuffer_format to _mesa_validate_texbuffer_format
Reviewed-by: Brian Paul <brianp@vmware.com>
- add xml file for extension
- add reference in gl_API.xml
- add pointer to device driver function table (dd.h)
- update dispatch_sanity.cpp
Reviewed-by: Brian Paul <brianp@vmware.com>
Specs say it's legal for implementations to use internal copies, and
the write synchronization seems to work. Fixes clCreateBuffer
(together with previous patches) and buffer-flags piglits.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Francisco Jerez <currojerez@riseup.net>
v2: Use fewer if statements and functional tricks instead of single-use method,
suggested by Francisco Jerez.
Squash two small patches into one.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When LLVM is build with Clang, "llvm-config --cxxflags" contains the
-fcolor-diagnostics flag. It is not recognized by gcc and the build
fails. Fix by removing the flag.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
The dri2 state tracker is checking for driver support before enabling
dri2ImageExtension version 7. This commit adds a check that also the
kernel driver supports fd sharing through prime.
Note that this adds a libdrm dependency on dri2.c.
v2: Removed unnecessary clamping of bool expression
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
This will avoid compiler warnings in the patch that follows. There
should be no user-visible effect because the change only affects the
behaviour when an invalid enum is passed to
_mesa_shader_type_to_index(), and that can only happen if there is a
bug elsewhere in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
All this cruft was ported from r600g and isn't needed on SI and later
according to hw docs. If we implemented HiS, we would set it to 0.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Replicate some of the gallium pipe transfer functionality.
Also bump minor to signal availability of this feature.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
* These make up the base of what C++ GL Haiku applications
use for 3D rendering.
* Not placed in includes/GL to prevent Haiku headers from
getting installed on non-Haiku systems.
Acked-by: Brian Paul <brianp@vmware.com>
This fixes MSAA resolving for 32-bit integer colorbuffers, which isn't
implemented by the hardware.
It also fixes VM protection faults when resolving MSAA 2D array textures.
This may be a CB bug, because shader-based resolving works fine.
It may also be faster for upside-down and scaled blits.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For scaled resolve. The filter is only good for magnification.
If somebody has an idea how to implement a good filter for minification,
I'm all ears. I'd have to use derivatives probably.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need this for integer formats and upside-down blits, which Radeons don't
support for MSAA resolving.
It can be used by calling util_blitter_blit.
Reviewed-by: Brian Paul <brianp@vmware.com>
The argument is a i8 pointer not a i32 pointer (even though the value actually
stored/loaded IS i32). Older llvm versions didn't care but 3.2 and newer do
leading to crashes.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Didn't really work as well as hoped (in particular it was not generally
more accurate), will solve this differently.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This code was always problematic, and with 64bit rasterization we no longer
need it at all.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The target parameter to _mesa_get_tex_image() is a target enum, not an index.
When we're setting up faces for a cubemap, it should be
CUBE_MAP_POSITIVE_X .. CUBE_MAP_NEGATIVE_Z; for all other targets it
should be the same as the texobj's target.
Fixes broken cubemaps [had only +X face but claimed to have all] produced by
glTextureView, which then caused various crashes in the driver when we
tried to use them.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: - add assert so we don't run into trouble on Gen6.
- adjust for Tapani's rearrangement of ir_variable
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GL_ARB_shadow spec says the shadow compare mode should have no
effect when sampling a color texture. As it was, it was up to
drivers to check for that (softpipe, llvmpipe, svga and probably
the rest don't do that). Note: it looks like DX10 allows shadow
compare with some non-depth formats, so this case really should be
handled in the state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Depending on the depth texture format, we may or may not have to
emit explicit fs code to do the shadow comparison. Before, we
were emitting it more often than needed.
v2: check the actual texture format rather than the screen->depth.z16
field. The screen->depth.z16, x8z24, s8z24 fields may not all be set
to a consistent set of depth formats.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Call TextureView helper function to set TextureView state
appropriately for the TexStorage calls.
Misc updates from review feedback.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add helper function to set texture_view state from TexStorage calls.
Include review feedback.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add Mesa TextureView logic.
Incorporate feedback on ARB_texture_view:
- Add S3TC VIEW_CLASSes to compatibility table
- Use existing _mesa_get_tex_image
- Clean up error strings
- Use bool instead of GLboolean for internal functions
- Split compound level & layer test into individual tests
- eliminate helper macro for VIEW_CLASS table
- do not call driver if ptr null.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Refactor to make next_mipmap_level_size defined in mipmap.c a
_mesa_ helper function that can then be used by texture_view
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add support for ARB_texture_view get parameters:
GL_TEXTURE_VIEW_MIN_LEVEL
GL_TEXTURE_VIEW_NUM_LEVELS
GL_TEXTURE_VIEW_MIN_LAYER
GL_TEXTURE_VIEW_NUM_LAYERS
Incorporate feedback regarding when to allow query of
GL_TEXTURE_IMMUTABLE_LEVELS.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add state needed by glTextureView to the gl_texture_object.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Stub in glTextureView API call to go with the
glTextureView API xml definition.
Includes dispatch test for glTextureView
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch changes the error condition to satisfy below statement
from OpenGL 4.3 core specification:
"An INVALID_OPERATION error is generated if id is the name of a query
object with a target other SAMPLES_PASSED, ANY_SAMPLES_PASSED, or
ANY_SAMPLES_PASSED_CONSERVATIVE, or if id is the name of a query
currently in progress."
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm not sure why this change is necessary. When I've built previous tar files
(such as 9.2.4) with the "make tarballs" target, they include the
bin/test-driver file. But at my first attempt to build the tar files for the
10.0.1 release this file was not being included and the build failed.
(cherry picked from commit d573899b93)
[The cherry pick is because I original applied this on the 10.0 branch while
working on the 10.0.1 release. But if we don't have this on master as well,
this issue will trip us up again the next time we make a new major-release
branch off of master.]
The driverPrivate pointer is opaque to the driver and we can't assume
it's a struct gl_context in dri_util.c. Instead provide a helper function
to set the struct gl_context flags from the incoming DRI context flags.
v2 (idr): Modify the other classic drivers to also use
driContextSetFlags. I ran all the piglit GLX_ARB_create_context tests
with i965 and classic swrast without regressions.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> [v1 on Gallium nouveau]
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patches add MESA_copy_sub_buffer support to the dri sw loader and
then to gallium state tracker, llvmpipe, softpipe and other bits.
It reuses the dri1 driver extension interface, and it updates the swrast
loader interface for a new putimage which can take a stride.
I've tested this with gnome-shell with a cogl hacked to reenable sub copies
for llvmpipe and the one piglit test.
I could probably split this patch up as well.
v2: pass a pipe_box, to reduce the entrypoints, as per Jose's review,
add to p_screen doc comments.
v3: finish off winsys interfaces, add swrast classic support as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
swrast: add support for copy_sub_buffer
Required for glClearBuffer, which only clears one colorbuffer attachment.
Example:
If the first colorbuffer is float and the second one is int:
pipe->clear(pipe, PIPE_CLEAR_COLOR0, float_clear_color, ...);
pipe->clear(pipe, PIPE_CLEAR_COLOR1, int_clear_color, ...);
This doesn't need any driver changes yet, because all drivers just use:
if (flags & PIPE_CLEAR_COLOR) ..
The drivers which support GL 3.0 will have to implement it properly though.
This adds 2 optimizations for radeonsi:
- handling of DISCARD_RANGE
- mapping an uninitialized buffer range is automatically UNSYNCHRONIZED
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
v2: Renamed r600_buffer.c to r600_buffer_common.c. The stupid build system
doesn't allow 2 files of the same name in different directories.
If we assume that all buffers allocated by the DDX are scanout, a new flag
that says "this is not scanout" has to be added to support the non-scanout
buffers and maintain backward compatibility.
This fixes bad rendering on Wayland.
The flag is defined as:
#define RADEON_TILING_R600_NO_SCANOUT RADEON_TILING_SWAP_16BIT
AFAIK, RADEON_TILING_SWAP_16BIT is not used on SI.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Patch copies the whole data structure at once instead of
assigning individual variables.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields and variables to the data
structure:
explicit_location, explicit_index, explicit_binding, has_initializer,
is_unmatched_generic_inout, location_frac, from_named_ifc_block_nonarray,
from_named_ifc_block_array, depth_layout, location, index, binding,
max_array_access, atomic
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields in to the data structure:
used, assigned, how_declared, mode, interpolation,
origin_upper_left, pixel_center_integer
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Data section helps serialization and cloning of a ir_variable. This
patch includes the helper bits used for read only ir_variables.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Newer virtual HW versions support smooth/stipple/wide lines.
Use that instead of 'draw' fallbacks when possible.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
With this patch llvmpipe will adhere to the ARB_depth_clamp enabled state when
clamping the fragment's zw value. To support this, the variant key now includes
the depth_clamp state. key->depth_clamp is derived from pipe_rasterizer_state's
(depth_clip == 0), thus depth clamp is only enabled when depth clip is disabled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
On evergreen we have to reserve 1 stack element in some additional cases
besides the ones mentioned in the docs, but stack size computation was
recently reimplemented exactly as described in the docs by the patch that
added workarounds for stack issues on EG/CM, resulting in regressions
with some apps (Serious Sam 3).
This patch fixes it by restoring previous behavior.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Andre Heider <a.heider@gmail.com>
Caching in the vbuf module meant that once a vertex has been
emitted it was cached, but it's possible for a vertex at the
same location to be emitted again, but this time with a different
front-face semantic. Caching was causing the first version of the
vertex to be emitted, which resulted in the renderer getting
incorrect front-face attributes. By reseting the vertex_id (which
is used for caching) we make sure that once a front-face info
has been injected the vertex will endup getting emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fact that we flush denorms to zero breaks our half-float
conversion and blending. This patches enables denorms for
blending. It's a little tricky due to the llvm bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
v2: Fix up queryImage return for ATTRIB_FD
Use driver_descriptor.configuration to determine whether the driver
supports DMA-BUF import/export.
v3: Really, truly, fix up queryImage return for ATTRIB_FD
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
First off, nv50_program only has 16 in/out varyings. However reporting
16 makes 'm' become 68 in nv50_fp_linkage_validate with the
varying-packing-simple piglit test. (Subverting the assert makes it
compile but fail.) With this patch, varying-packing-simple passes.
See: https://bugs.freedesktop.org/show_bug.cgi?id=69155
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
To help the transition period when DRI loaders are being updated
to support the newer __driDriverExtensions_foo mechanism,
we populate __driDriverExtensions with the extensions returned
by __driDriverExtensions_foo during a library contructor
function.
We find the driver foo's name by using the dladdr function
which gives the path of the dynamic library's name that
was being loaded.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
The _EGLSurface struct which is embedded into dri2_egl_surface also contains a
swap interval member so the other member is redundant. Nothing was using it as
far as I can tell.
On Gen4+, OUT_RELOC_FENCED is equivalent to OUT_RELOC; libdrm silently
ignores the fenced flag:
/* We never use HW fences for rendering on 965+ */
if (bufmgr_gem->gen >= 4)
need_fence = false;
Thanks to Eric for noticing this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that loop_controls no longer creates normatively bound loops,
there is no need for ir_loop::normative_bound or the
lower_bounded_loops pass.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when loop_controls analyzed a loop and found that it had a
fixed bound (known at compile time), it would remove all of the loop
terminators and instead set the loop's normative_bound field to force
the loop to execute the correct number of times.
This made loop unrolling easy, but it had a serious disadvantage.
Since most GPU's don't have a native mechanism for executing a loop a
fixed number of times, in order to implement the normative bound, the
back-ends would have to synthesize a new loop induction variable. As
a result, many loops wound up having two induction variables instead
of one. This caused extra register pressure and unnecessary
instructions.
This patch modifies loop_controls so that it doesn't set the loop's
normative_bound anymore. Instead it leaves one of the terminators in
the loop (the limiting terminator), so the back-end doesn't have to go
to any extra work to ensure the loop terminates at the right time.
This complicates loop unrolling slightly: when deciding whether a loop
can be unrolled, we have to account for the presence of the limiting
terminator. And when we do unroll the loop, we have to remove the
limiting terminator first.
For an example of how this results in more efficient back end code,
consider the loop:
for (int i = 0; i < 100; i++) {
total += i;
}
Previous to this patch, on i965, this loop would compile down to this
(vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) if(8)
break(8)
endif(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
(notice that both g8 and g4 are loop induction variables; one is used
to terminate the loop, and the other is used to accumulate the total).
After this patch, the same loop compiles to:
mov(8) g4<1>.xD 0D
loop:
cmp.ge.f0(8) null g4<4;4,1>.xD 100D
(+f0) if(8)
break(8)
endif(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This value is now redundant with
loop_variable_state::limiting_terminator->iterations and
ir_loop::normative_bound.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The old logic of loop_unroll_visitor::visit_leave(ir_loop *) was:
heuristics to skip unrolling in various circumstances;
if (loop contains more than one jump)
return;
else if (loop contains one jump) {
if (the jump is an unconditional "break" at the end of the loop) {
remove the break and set iteration count to 1;
fall through to simple loop unrolling code;
} else {
for (each "if" statement in the loop body)
see if the jump is a "break" at the end of one of its forks;
if (the "break" wasn't found)
return;
splice the remainder of the loop into the other fork of the "if";
remove the "break";
complex loop unrolling code;
return;
}
}
simple loop unrolling code;
return;
These tasks have been moved to their own functions:
- splice the remainder of the loop into the other fork of the "if"
- simple loop unrolling code
- complex loop unrolling code
And the logic has been flattened to:
heuristics to skip unrolling in various circumstances;
if (loop contains more than one jump)
return;
if (loop contains no jumps) {
simple loop unroll;
return;
}
if (the jump is an unconditional "break" at the end of the loop) {
remove the break;
simple loop unroll with iteration count of 1;
return;
}
for (each "if" statement in the loop body) {
if (the jump is a "break" at the end of one of its forks) {
splice the remainder of the loop into the other fork of the "if";
remove the "break";
complex loop unroll;
return;
}
}
This will make it easier to modify the loop unrolling algorithm in a
future patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the sole responsibility of loop_analysis was to find all
the variables referenced in the loop that are either loop constant or
induction variables, and find all of the simple if statements that
might terminate the loop. The remainder of the analysis necessary to
determine how many times a loop executed was performed by
loop_controls.
This patch makes loop_analysis also responsible for determining the
number of iterations after which each loop terminator will terminate
the loop, and for figuring out which terminator will terminate the
loop first (I'm calling this the "limiting terminator").
This will allow loop unrolling to make use of information that was
previously only visible from loop_controls, namely the identity of the
limiting terminator.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patches to follow will introduce code into the loop_terminator
constructor. Allocating loop_terminator using new(mem_ctx) syntax
will ensure that the constructor runs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When loop_control_visitor::visit_leave(ir_loop *) is analyzing a loop
terminator that acts on a certain ir_variable, it doesn't need to walk
the list of induction variables to find the loop_variable entry
corresponding to the variable. It can just look it up in the
loop_variable_state hashtable and verify that the loop_variable entry
represents an induction variable.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These fields were part of some planned optimizations that never
materialized. Remove them for now to simplify things; if we ever get
round to adding the optimizations that would require them, we can
always re-introduce them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch replaces the ir_loop fields "from", "to", "increment",
"counter", and "cmp" with a single integer ("normative_bound") that
serves the same purpose.
I've used the name "normative_bound" to emphasize the fact that the
back-end is required to emit code to prevent the loop from running
more than normative_bound times. (By contrast, an "informative" bound
would be a bound that is informational only).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, all of the back-ends (ir_to_mesa, st_glsl_to_tgsi, and the
i965 fs and vec4 visitors) had nearly identical logic for handling
bounded loops. This replaces the duplicate logic with an equivalent
lowering pass that is used by all the back-ends.
Note: on i965, there is a slight increase in instruction count. For
example, a loop like this:
for (int i = 0; i < 100; i++) {
total += i;
}
would previously compile down to this (vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) break(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
After this patch, the "(+f0) break(8)" turns into:
(+f0) if(8)
break(8)
endif(8)
because the back-end isn't smart enough to recognize that "if
(condition) break;" can be done using a conditional break instruction.
However, it should be relatively easy for a future peephole
optimization to properly optimize this.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, loop analysis would set
this->conditional_or_nested_assignment based on the most recently
visited assignment to the variable. As a result, if a vaiable was
assigned to more than once in a loop, the flag might be set
incorrectly. For example, in a loop like this:
int x;
for (int i = 0; i < 3; i++) {
if (i == 0)
x = 10;
...
x = 20;
...
}
loop analysis would have incorrectly concluded that all assignments to
x were unconditional.
In practice this was a benign bug, because
conditional_or_nested_assignment is only used to disqualify variables
from being considered as loop induction variables or loop constant
variables, and having multiple assignments also disqualifies a
variable from being considered as either of those things.
Still, we should get the analysis correct to avoid future confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when visiting an ir_call, loop analysis would only mark
the innermost enclosing loop as containing a call. As a result, when
encountering a loop like this:
for (i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
foo();
}
}
it would incorrectly conclude that the outer loop ran three times.
(This is not certain; if foo() modifies i, then the outer loop might
run more or fewer times).
Fixes piglit test "vs-call-in-nested-loop.shader_test".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when visiting a variable dereference, loop analysis would
only consider its effect on the innermost enclosing loop. As a
result, when encountering a loop like this:
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
...
i = 2;
}
}
it would incorrectly conclude that the outer loop ran three times.
Fixes piglit test "vs-inner-loop-modifies-outer-loop-var.shader_test".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fast color clears of MSAA buffers work just like fast color clears
with non-MSAA buffers, except that the alignment and scaledown
requirements are different.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will make it easier to add fast color clear support to MSAA
buffers, since they have different alignment and scaling requirements.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we didn't do multisample blorp clears because we couldn't
figure out how to get them to work. The reason for this was because
we weren't setting the brw_blorp_params num_samples field consistently
with dst.num_samples. Now that those two fields have been collapsed
down into one, we can do multisample blorp clears.
However, we need to do a few other pieces of bookkeeping to make them
work correctly in all circumstances:
- Since blorp clears may now operate on multisampled window system
framebuffers, they need to call
intel_renderbuffer_set_needs_downsample() to ensure that a
downsample happens before buffer swap (or glReadPixels()).
- When clearing a layered multisample buffer attachment using UMS or
CMS layout, we need to advance layer by multiples of num_samples
(since each logical layer is associated with num_samples physical
layers).
Note: we still don't do multisample fast color clears; more work needs
to be done to enable those.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_blorp_params contained two fields for determining
sample count: num_samples (which determined the multisample
configuration of the rendering pipeline) and dst.num_samples (which
determined the multisample configuration of the render target
surface). This was redundant, since both fields had to be set to the
same value to avoid rendering errors.
This patch eliminates num_samples to avoid future confusion.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch renames the enum that's used to keep track of fast clear
state from "mcs_state" to "fast_clear_state", and it removes the enum
value INTEL_MCS_STATE_MSAA (which previously meant, "this is an MSAA
buffer, so we're not keeping track of fast clear state"). The only
real purpose that enum value was serving was to prevent us from trying
to do fast clear resolves on MSAA buffers, and it's just as easy to
prevent that by checking the buffer's msaa_layout.
This paves the way for implementing fast clears of MSAA buffers.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The "layer" parameters used in blorp, and the
intel_renderbuffer::mt_layer field, represent a physical layer rather
than a logical layer. This is important for 2D multisample arrays on
Gen7+ because the UMS and CMS multisample layouts use N physical
layers to represent each logical layer, where N is the number of
samples.
Also add an assertion to blorp to help catch bugs if we fail to follow
these conventions.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Clarify the fact that we only optimize full buffer clears using fast
color clear, and why.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create the ref_bo without any storage type flags set for now. The issue
probably arises from our use of the additional buffer space at the end
of the ref_bo. It should probably be split up in the future.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Martin Peres <martin.peres@labri.fr>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
With this patch, generate_fs_loop will clamp any fragment shader depth writes
to the viewport's min and max depth values. Viewport selection is determined
by the geometry shader output for the viewport array index. If no index is
specified, then the default viewport index is zero. Semantics for this path
can be found in draw_clamp_viewport_idx and lp_clamp_viewport_idx.
lp_jit_viewport was created to store viewport information visible to JIT code,
and is validated when the LP_NEW_VIEWPORT dirty flag is set.
lp_rast_shader_inputs is responsible for passing the viewport_index through
the rasterizer stage to fragment stage (via lp_jit_thread_data).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The Wayland EGL platform now respects the eglSwapInterval value. The value is
clamped to either 0 or 1 because it is difficult (and probably not useful) to
sync to more than 1 redraw.
The main change is that if the swap interval is 0 then Mesa won't install a
frame callback so that eglSwapBuffers can be executed as often as necessary.
Instead it will do a sync request after the swap buffers. It will block for
sync complete event in get_back_bo instead of the frame callback. The
compositor is likely to send a release event while processing the new buffer
attach and this makes sure we will receive that before deciding whether to
allocate a new buffer.
If there are no buffers available then instead of returning with an error,
get_back_bo will now poll the compositor by repeatedly sending sync requests
every 10ms. This is a last resort and in theory this shouldn't happen because
there should be no reason for the compositor to hold on to more than three
buffers. That means whenever we attach the fourth buffer we should always get
an immediate release event which should come in with the notification for the
first sync request that we are throttled to.
When the compositor is directly scanning out from the application's buffer it
may end up holding on to three buffers. These are the one that is is currently
scanning out from, one that has been given to DRM as the next buffer to flip
to, and one that has been attached and will be given to DRM as soon as the
previous flip completes. When we attach a fourth buffer to the compositor it
should replace that third buffer so we should get a release event immediately
after that. This patch therefore also changes the number of buffer slots to 4
so that we can accomodate that situation.
If DRM eventually gets a way to cancel a pending page flip then the compositors
can be changed to only need to hold on to two buffers and this value can be
put back to 3.
This also moves the vblank configuration defines from platform_x11.c to the
common egl_dri2.h header so they can be shared by both platforms.
Consider a typical game-style main loop which might be like this:
while (1) {
draw_something();
eglSwapBuffers();
}
In this case the game is relying on eglSwapBuffers to throttle to a sensible
frame rate. Previously this game would end up using three buffers even though
it should only need two. This is because Mesa decides whether to allocate a
new buffer in get_back_bo which would be before it has tried to read any
events from the compositor so it wouldn't have seen any buffer release events
yet.
This patch just moves the block for the frame callback to get_back_bo.
Typically the compositor will send a release event immediately after one of
the attaches so if we block for the frame callback here then we can be sure to
have completed at least one roundtrip and received that release event after
attaching the previous buffer before deciding whether to allocate a new one.
dri2_swap_buffers always calls get_back_bo so even if the client doesn't
render anything we will still be sure to block to the frame callback. The code
to create the new frame callback has been moved to after this call so that we
can be sure to have cleared the previous frame callback before requesting a
new one.
This patch fixes this MinGW build error.
CC glapi_gentable.lo
glapi_gentable.c:47:19: fatal error: dlfcn.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Drivers will need to look at this to decide if they need to do
per-sample fragment shader dispatch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
My understanding is that Broadwell retains the same SCS mechanism
that Haswell has, so even if the underlying issue with this format
is not fixed, the w/a will be applied in SCS rather than needing
shader code.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the pieces are in place, this should provide
a nice performance boost for apps using multisample textures.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to emit extra shader code in this case to sample the
MCS surface first; we can't just blindly do this all the time
since IVB will sometimes try to access the MCS surface even if
disabled.
V3: Use actual MSAA layout from the texture's mt, rather
then computing what would have been used based on the format.
This is simpler and less fragile - there's at least one case where
we might want to have a texture's MSAA layout change based on what
the app does (CMS SINT falling back to UMS if the app ever attempts
to render to it with a channel disabled.)
This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain
an implementation detail of the miptree code.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This gives us correct behavior for both renderbuffers (which previously
worked) and multisample textures (which would never get an MCS surface
allocated, even if CMS layout was selected)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The bspec says:
"SW must program the sample mask value in this field so that it matches
with 3DSTATE_SAMPLE_MASK"
I haven't observed this to actually fix anything, but stumbled across it
while adding the rest of the support for CMS layout for multisample
textures.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For some reason this was left out when the version was changed...
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Before this patch, the following code would not be optimized even though
the first two instructions were common to the then and else blocks:
(+f0) IF
MOV dst0 ...
MOV dst1 ...
MOV dst2 ...
ELSE
MOV dst0 ...
MOV dst1 ...
MOV dst3 ...
ENDIF
This commit extends the peephole to handle this case.
No shader-db changes.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
fs_visitor::try_replace_with_sel optimizes only if statements whose
"then" and "else" bodies contain a single MOV instruction. It also
could not handle constant arguments, since they cause an extra MOV
immediate to be generated (since we haven't run constant propagation,
there are more than the single MOV).
This peephole fixes both of these and operates as a normal optimization
pass.
fs_visitor::try_replace_with_sel is still arguably necessary, since it
runs before pull constant loads are lowered.
total instructions in shared programs: 1559129 -> 1545833 (-0.85%)
instructions in affected programs: 167120 -> 153824 (-7.96%)
GAINED: 13
LOST: 6
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is the last thing that register_coalesce() still handled.
total instructions in shared programs: 1561060 -> 1560908 (-0.01%)
instructions in affected programs: 15758 -> 15606 (-0.96%)
Reviewed-by: Eric Anholt <eric@anholt.net>
parent_mem_ctx was unused since db47074a, so remove the two wrappers
around create() and make create() the constructor.
Reviewed-by: Eric Anholt <eric@anholt.net>
make_list is just a one-line wrapper and was confusingly called by
NULL objects. E.g., cur_if == NULL; cur_if->make_list(mem_ctx).
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously we made the basic block following an ENDIF instruction a
successor of the basic blocks ending with IF and ELSE. The PRM says that
IF and ELSE instructions jump *to* the ENDIF, rather than over it.
This should be immaterial to dataflow analysis, except for if, break,
endif sequences:
START B1 <-B0 <-B9
0x00000100: cmp.g.f0(8) null g15<8,8,1>F g4<0,1,0>F
0x00000110: (+f0) if(8) 0 0 null 0x00000000UD
END B1 ->B2 ->B4
START B2 <-B1
break
0x00000120: break(8) 0 0 null 0D
END B2 ->B10
START B3
0x00000130: endif(8) 2 null 0x00000002UD
END B3 ->B4
The ENDIF block would have no parents, so dataflow analysis would
generate incorrect results, preventing copy propagation from eliminating
some instructions.
This patch changes the CFG to make ENDIF start rather than end basic
blocks, so that it can be the jump target of the IF and ELSE
instructions.
It helps three programs (including two fs8/fs16 pairs).
total instructions in shared programs: 1561126 -> 1561060 (-0.00%)
instructions in affected programs: 837 -> 771 (-7.89%)
More importantly, it allows copy propagation to handle more cases.
Disabling the register_coalesce() pass before this patch hurts 58
programs, while afterward it only hurts 11 programs.
Reviewed-by: Eric Anholt <eric@anholt.net>
This extension enabled the use of texture array with fixed-function and
assembly fragment shaders. No applications are known to use this
extension.
NOTE: This patch regresses GL_TEXTURE_1D_ARRAY and GL_TEXTURE_2D_ARRAY
cases of the copyteximage piglit test. The test is incorrectly using
texture arrays with fixed function while only requiring the
GL_EXT_texture_array extension. A fix for the test has been posted to
the piglit mailing list.
http://lists.freedesktop.org/archives/piglit/2013-November/008639.html
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver that enables one also enables the other. The difference
between the two is MESA adds support for fixed-function and assembly
fragment shaders, but EXT only adds support for GLSL. The MESA
extension was created back when Mesa did not support GLSL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
main/texobj.c: In function 'count_tex_size':
main/texobj.c:886:23: warning: unused parameter 'key' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
main/texobj.c: In function '_mesa_test_texobj_completeness':
main/texobj.c:553:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
main/texobj.c:553:193: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
main/texobj.c:553:254: warning: signed and unsigned type in conditional expression [-Wsign-compare]
main/texobj.c:553:148: warning: signed and unsigned type in conditional expression [-Wsign-compare]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
That enum requires GL_ARB_texture_cube_map_array, and it is only
available on desktop GL. It looks like this has been an un-noticed
issue since GL_ARB_texture_cube_map_array support was added in commit
e0e7e295.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds an extension called EGL_WL_create_wayland_buffer_from_image
which adds the following single function:
struct wl_buffer *
eglCreateWaylandBufferFromImageWL(EGLDisplay dpy, EGLImageKHR image);
The function creates a wl_buffer which shares its contents with the given
EGLImage. The expected use case for this is in a nested Wayland compositor
which is using subsurfaces to present buffers from its clients. Using this
extension it can attach the client buffers directly to the subsurface without
having to blit the contents into an intermediate buffer. The compositing can
then be done in the parent compositor.
The extension is only implemented in the Wayland EGL platform because of
course it wouldn't make sense anywhere else.
If we're not using EGL_EXT_swap_buffers_with_damage, we have to
damage the full extent. EGL operates on buffer coordinates, but
wl_surface.damage takes surface coordinates. EGL doesn't know the
buffer transformation (rotated or scaled) and can't post accurate
damage in surface coordinates. The damage event however is clipped to
the surface extents so we can just damage the maximum rectangle.
In case of EGL_EXT_swap_buffers_with_damage, the application knows
the buffer transform and is expected to pass in rectangles in
surface space.
https://bugs.freedesktop.org/show_bug.cgi?id=70250
Cc: "10.0" mesa-stable@lists.freedesktop.org
flush_with_flags, when available, allows the driver to throttle.
Using this suppress input lag issues that can be observed in heavy
rendering situations on non-intel cards.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.0" mesa-stable@lists.freedesktop.org
This typically won't make a difference, since we only send the requests at
wl_display_flush() time. There might be a small race
with another thread calling wl_display_flush() after our commit request,
but before we flush the DRI driver. Moving the commit below the DRI
driver flush call looks more natural and eliminates the small race.
Cc: "10.0" mesa-stable@lists.freedesktop.org
We would like the compositor to receive the commited buffer
as soon as possible, so it has the time to treat it, and
release old ones. We shouldn't rely on the client
to flush the queue for us.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.0" mesa-stable@lists.freedesktop.org
Display lists allocate memory in chunks of 256 tokens (1KB) at a time.
If an app creates many short display lists or uses glXUseXFont() this
can waste quite a bit of memory.
This patch uses realloc() to trim short lists and reduce the memory
used.
Also, null/zero-out some list construction fields in _mesa_EndList().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now, sizeof(gl_dlist_node)==4 even on 64-bit systems. This can
halve the memory used by some display lists on 64-bit systems.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a first step in reducing memory used by display lists on
64-bit systems. On 64-bit systems, the gl_dlist_node union type
is 8 bytes because of the 'data' and 'next' fields. This causes
every display list node/token to occupy 8 bytes instead of 4 as
originally designed. This basically doubles the memory used by
some display lists on 64-bit systems.
The fix is to remove the 64-bit 'data' and 'next' pointer fields
from the union and instead store them as a pair of 32-bit values.
Easily done with a few helper functions.
The next patch will take care of the 'next' field.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes a memory leak in some situations. Also avoids emitting an
extra fence if the kick handler does the call to nouveau_fence_next
itself.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
So that it acts like ordinary free(). This lets us remove a bunch of
if statements where the function is called.
v2:
- Avoiding compile error on MSVC and possible warnings on other compilers.
- Added comment regards passing NULL pointer being safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
I missed this in the boolean -> enum conversion. C cheerfully casts
false -> 0 -> UNKNOWN_RING. On Gen4-5, this causes the render ring
prelude hook to get called in the middle of the batch, which is crazy.
BRW_BATCH_STRUCT is not used on Gen6+.
Fixes regressions since 395a32717d
("i965: Introduce an UNKNOWN_RING state.").
Fixes "fips -v glxgears" on Ironlake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit a4bf7f6b6e.
It breaks occlusion queries on Gen4-5. Doing this right will likely
require larger changes, which should be done at a future date.
Some Piglit tests still passed due to other bugs; fixing those revealed
this problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I guarded half of the callers to start/stop_oa_counters with generation
checks, but missed the other half (which were added later). OACONTROL
doesn't exist on Ironlake, so we better not write it. Also, there's no
need---Ironlake's performance counters are always running.
This patch moves the generation checks into start/stop_oa_counters,
rather than requiring the caller to do them.
Fixes assertion failures in Piglit's AMD_performance_monitor/measure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
One can select if they want to fallback to softpipe.
Current approach makes this not possible, whereas other
targets (dri-swrast) handle this approapriately.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The BSpec states that the aligment for the non-msrt clear rectangle must
be doubled; the BSpec does not restricit the workaround to specific
hardware.
Commit 9a1a67b applied the workaround to Haswell GT3. Commit 8b659ce
expanded the workaround to all Haswell variants. This commit expands it
to all hardware.
No Piglit regressions on Ivybridge 0x0166. No fixes either.
I know no Ivybridge nor Baytrail bug related to this workaround.
However, the BSpec says the extra alignment is required, so let's do it.
v2: Apply to all hardware, not just gen7.
CC: "9.2, 10.0" <mesa-stable@lists.freedesktop.org>
CC: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
All bound layers (from first_layer to last_layer) should be cleared.
This uses a vertex shader which outputs gl_Layer = gl_InstanceID, so each
instance goes to a different layer. By rendering a quad and setting
the instance count to the number of layers, it will trivially clear all
layers.
This requires AMD_vertex_shader_layer (or PIPE_CAP_TGSI_VS_LAYER), which only
radeonsi supports at the moment. r600 could do this too. Standard DX11
hardware will have to use a geometry shader though, which has higher overhead.
This is a subset of geometry shaders. It's all about setting first_layer and
last_layer correctly.
Also some code between st_render_texture and update_framebuffer_state is
consolidated. It doesn't use rtt_level and derives the level from dimensions
instead as the code in st_atom_framebuffer.c did.
Commit a594cec broke EGL X11 backend by adding dependency between
X11 and DRM backends requiring HAVE_EGL_PLATFORM_DRM defined for X11.
This patch fixes the issue by adding additional define for libdrm
detection independent of which backend is being compiled. Tested by
compiling Mesa with '--with-egl-platforms=x11' and running es2gears_x11
+ glbenchmark2.7 successfully.
v2: return true for dri2_auth if running without libdrm (Samuel)
v3: check libdrm when building EGL drm platform + AM_CFLAGS fix (Emil)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72062
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: mesa-stable@lists.freedesktop.org
MI_STORE_REGISTER_MEM has to take a 48-bit address, so the existing code
doesn't work. But supposedly Broadwell has a register whitelist and
just works out of the box anyway, so there's no need to check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The Gen7 sampler state code still works. Increasing the alignment to
64 bytes makes bit 5 zero, which is good because it's now reserved.
Since we don't use the new filter bits, we can leave those as zero too,
which means we don't need to update the code to update the pointer.
(We probably should anyway, for clarity, but alas, another day.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation is really hard to follow, but apparently a 32-bit x
32-bit multiply just works without the MACH macro. The macro apparently
is only necessary to get the full 64-bit value.
Fixes Piglit tests [vf]s-op-mult-int-int.shader_test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Like Haswell, we do this in SURFACE_STATE rather than shader
workarounds.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell doesn't support a surface vertical alignment of 2. It only
supports VALIGN_4, VALIGN_8, or VALIGN_16. I chose 4 since it's the
least wasteful.
v2: Replace my comment with a better one from Eric. Move Broadwell
checks earlier so it's more obvious that "return 2" won't be hit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to SEND from a GRF, and we can only obtain those prior to
register allocation.
This allows us to do pull constant loads without the MRF hack.
v2: Reword comments (suggested by Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
SEND can't deal with swizzles, source modifiers, and so on. This should
avoid problems with VS pull constant loads on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Pre-patch, the workaround was applied to only HSW GT3. However, the
workaround also fixes render corruption on the HSW GT1 Chromebook,
codenamed Falco.
Also, update the BSpec quote that discusses the workaround to reflect
the latest BSpec.
The BSpec states that the workaround is required for Ivybridge and
Baytrail as well as Haswell. But, we apply the workaround to only
Haswell because (a) we suspect that is the only hardware where it is
actually required and (b) we haven't yet validated the workaround for
the other hardware.
CC: "9.2, 10.0" <mesa-stable@lists.freedesktop.org>
CC: Anuj Phogat <anuj.phogat@gmail.com>
OTC-Tracker: CHRMOS-812
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Previously, we stored an array of up to 16 additional shaders to link,
as well as a count of how many each shader actually needed.
Since the built-in functions rewrite, all the built-ins are stored in a
single shader. So all we need is a boolean indicating whether a shader
needs to link against built-ins or not.
During linking, we can avoid creating the temporary array if none of the
shaders being linked need built-ins. Otherwise, it's simply a copy of
the array that has one additional element. This is much simpler.
This patch saves approximately 128 bytes of memory per gl_shader object.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since the built-in functions rewrite, num_builtins_to_link is always either
0 or 1, so we don't need tho crazy loop starting at -1 with a special
case.
All we need to do is print the prototypes from the current shader, and
the single built-in function shader (if present).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when we hit a "no matching function" error, it looked like:
0:0(0): error: no matching function for call to `cos()'
0:0(0): error: candidates are: float cos(float)
0:0(0): error: vec2 cos(vec2)
0:0(0): error: vec3 cos(vec3)
0:0(0): error: vec4 cos(vec4)
Now it looks like:
0:0(0): error: no matching function for call to `cos()'; candidates are:
0:0(0): error: float cos(float)
0:0(0): error: vec2 cos(vec2)
0:0(0): error: vec3 cos(vec3)
0:0(0): error: vec4 cos(vec4)
This is not really any worse and removes the need for the prefix variable.
It will also help with the next commit's refactoring.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no need to loop through the "parameters" list and remove every
element; move_nodes_to(¶meters) already throws away all elements of
the destination list.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
make[5]: Entering directory `/jhbuild/build/mesa/mesa/src/mapi/glapi/tests'
CXX check_table.o
/jhbuild/checkout/mesa/mesa/src/mapi/glapi/tests/check_table.cpp:29:30: fatal error: glapi/glapitable.h: No such file or directory
We should look for the generated file glapi/glapitable.h in builddir, not srcdir
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
On gen6, multisamble resolve blits use the SAMPLE message to blend
together the 4 samples for each texel. For some reason, SAMPLE
doesn't blend together the proper samples when the source format is
L32_FLOAT or I32_FLOAT, resulting in blocky artifacts.
To work around this problem, sample from the source surface using
R32_FLOAT. This shouldn't affect rendering correctness, because when
doing these resolve blits, the destination format is R32_FLOAT, so the
channel replication done by L32_FLOAT and I32_FLOAT is unnecessary.
Fixes piglit tests on Sandy Bridge:
- spec/ARB_texture_float/multisample-formats 2 GL_ARB_texture_float
- spec/ARB_texture_float/multisample-formats 4 GL_ARB_texture_float
No piglit regressions on Sandy Bridge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70601
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The compiler back-ends (i965's fs_visitor and brw_visitor,
ir_to_mesa_visitor, and glsl_to_tgsi_visitor) have been assuming this
for some time. Thanks to the preceding patch, the compiler front-end
no longer breaks this assumption.
This patch adds code to validate the assumption so that if we have
future bugs, we'll be able to catch them earlier.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The compiler back-ends (i965's fs_visitor and brw_visitor,
ir_to_mesa_visitor, and glsl_to_tgsi_visitor) assume that when
ir_loop::counter is non-null, it points to a fresh ir_variable that
should be used as the loop counter (as opposed to an ir_variable that
exists elsewhere in the instruction stream).
However, previous to this patch:
(1) loop_control_visitor did not create a new variable for
ir_loop::counter; instead it re-used the existing ir_variable.
This caused the loop counter to be double-incremented (once
explicitly by the body of the loop, and once implicitly by
ir_loop::increment).
(2) ir_clone did not clone ir_loop::counter properly, resulting in the
cloned ir_loop pointing to the source ir_loop's counter.
(3) ir_hierarchical_visitor did not visit ir_loop::counter, resulting
in the ir_variable being missed by reparenting.
Additionally, most optimization passes (e.g. loop unrolling) assume
that the variable mentioned by ir_loop::counter is not accessed in the
body of the loop (an assumption which (1) violates).
The combination of these factors caused a perfect storm in which the
code worked properly nearly all of the time: for loops that got
unrolled, (1) would introduce a double-increment, but loop unrolling
would fail to notice it (since it assumes that ir_loop::counter is not
accessed in the body of the loop), so it would unroll the loop the
correct number of times. For loops that didn't get unrolled, (1)
would introduce a double-increment, but then later when the IR was
cloned for linking, (2) would prevent the loop counter from being
cloned properly, so it would look to further analysis stages like an
independent variable (and hence the double-increment would stop
occurring). At the end of linking, (3) would prevent the loop counter
from being reparented, so it would still belong to the shader object
rather than the linked program object. Provided that the client
program didn't delete the shader object, the memory would never get
reclaimed, and so the shader would function properly.
However, for loops that didn't get unrolled, if the client program did
delete the shader object, and the memory belonging to the loop counter
got re-used, this could cause a use-after-free bug, leading to a
crash.
This patch fixes loop_control_visitor, ir_clone, and
ir_hierarchical_visitor to treat ir_loop::counter the same way the
back-ends treat it: as a freshly allocated ir_variable that needs to
be visited and cloned independently of other ir_variables.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72026
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If an ir_loop has a non-null "counter" field, the variable referred to
by this field is implicitly read and written by the loop. We need to
account for this in ir_variable_refcount, otherwise there is a danger
we will try to dead-code-eliminate the loop counter variable.
Note: at the moment the dead code elimination bug doesn't occur due to
a bug in ir_hierarchical_visitor: it doesn't visit the "counter"
field, so dead code elimination doesn't treat it as a candidate for
elimination. But the patch to follow will fix that bug, so we need to
fix ir_variable_refcount first in order to avoid breaking dead code
elimination.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The flags value is a bitfield so use the union's 'bf' field, not 'e'
(enum) field. There's no actual change in behavior here since both
fields of the union are the same size.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Trying to compile any of these functions into a display list
now just generates a GL_INVALID_OPERATION error.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that it is possible to query drivers for the max sampler view it should
be safe to increase this without crashing.
Not entirely convinced this really works correctly though if state trackers
using non-linked sampler / sampler_views use this.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Ever since introducing separate sampler and sampler view max this was really
missing.
Every driver but llvmpipe reports the same number as number of samplers for
now, so nothing should break.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds support for this to more drivers, in particular for all the "special"
ones useful for debugging.
HW drivers are left alone, some should be able to support it if they want but
they may not be interested at this point.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Only allow __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS in brwCreateContext if
intelInitScreen2 also enabled __DRI2_ROBUSTNESS (thereby enabling
GLX_ARB_create_context).
This fixes a regression in the piglit test
"glx/GLX_ARB_create_context/invalid flag"
v2: Remove commented debug code. Noticed by Jordan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reported-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patch fixes this build error with Oracle Solaris Studio.
libtool: link: /opt/solarisstudio12.3/bin/cc -g -o glcpp/glcpp glcpp.o prog_hash_table.o ./.libs/libglcpp.a
Undefined first referenced
symbol in file
sqrt prog_hash_table.o
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In brw_update_renderbuffer_surfaces(), if there are no color draw
buffers, we always set up a null render target at surface index 0 so we
have something to use with the FB write marking the end of thread.
However, when we recently began computing surface indexes dynamically,
we failed to reserve space for it. This meant that the first texture
would be assigned surface index 0, and our closing FB write would
clobber the texture.
Fixes Piglit's EXT_packed_depth_stencil/fbo-blit-d24s8 test on Gen4-5,
which regressed as of commit 4e5306453d
("i965/fs: Dynamically set up the WM binding table offsets.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70605
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: lu hua <huax.lu@intel.com>
Cc: "10.0" mesa-stable@lists.freedesktop.org
Now that we have dynamic binding tables there's no good reason anymore
to expose so few atomic counter buffers. Increase it to 16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If reparent_ir() is called on invalid IR, then there's a danger that
it will fail to reparent all of the necessary nodes. For example, if
the IR contains an ir_dereference_variable which refers to an
ir_variable that's not in the tree, that ir_variable won't get
reparented, resulting in subtle use-after-free bugs once the
non-reparented nodes are freed. (This is exactly what happened in the
bug fixed by the previous commit).
This patch makes this kind of bug far easier to track down, by
transforming it from a use-after-free bug into an explicit IR
validation error.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In commit 065da16 (glsl: Convert lower_clip_distance_visitor to be an
ir_rvalue_visitor), we failed to notice that since
lower_clip_distance_visitor overrides visit_leave(ir_assignment *),
ir_rvalue_visitor::visit_leave(ir_assignment *) wasn't getting called.
As a result, clip distance dereferences appearing directly on the
right hand side of an assignment (not in a subexpression) weren't
getting properly lowered. This caused an ir_dereference_variable node
to be left in the IR that referred to the old gl_ClipDistance
variable. However, since the lowering pass replaces gl_ClipDistance
with gl_ClipDistanceMESA, this turned into a dangling pointer when the
IR got reparented.
Prior to the introduction of geometry shaders, this bug was unlikely
to arise, because (a) reading from gl_ClipDistance[i] in the fragment
shader was rare, and (b) when it happened, it was likely that it would
either appear in a subexpression, or be hoisted into a subexpression
by tree grafting.
However, in a geometry shader, we're likely to see a statement like
this, which would trigger the bug:
gl_ClipDistance[i] = gl_in[j].gl_ClipDistance[i];
This patch causes
lower_clip_distance_visitor::visit_leave(ir_assignment *) to call the
base class visitor, so that the right hand side of the assignment is
properly lowered.
Fixes piglit test:
- spec/glsl-1.50/execution/geometry/clip-distance-itemized-copy
Cc: Ian Romanick <idr@freedesktop.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The previous commit fixes a bug wherein we would incorrectly refer to
stale geometry shader prog_data when no geometry shader was active.
This patch reduces the likelihood of that sort of bug occurring in the
future by setting prog_data to NULL whenever there is no GS program.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, in brw_gs_upload_binding_table(), we checked whether
brw->gs.prog_data was NULL in order to determine whether a geometry
shader was active. This didn't work: brw->gs.prog_data starts off as
NULL, but it is set to non-NULL when a geometry shader program is
built, and then never set to NULL again. As a result, if we called
brw_gs_upload_binding_table() while there was no geometry shader
active, but a geometry shader had previously been active, it would
refer to a stale (and possibly freed) prog_data structure.
This patch fixes the problem by modifying
brw_gs_upload_binding_table() to use the proper technique to determine
whether a geometry shader is active: by checking whether
brw->geometry_program is NULL.
This fixes the crash reported in comment 2 of bug 71870 (the incorrect
rendering remains, however).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than always advertising the extension but failing to create a
context with reset notifiction, just don't advertise it. I don't know
why it didn't occur to me to do it this way in the first place.
NOTE: Kristian requested that I provide a follow-up for master that
dynamically generates the list of DRI extensions instead of selected
between two hardcoded lists.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Replace all occurences of the macro with its expansion.
It seems that the macro intended to provide cross-platform static mutex
intialization. However, it had the same definition in all pre-processor
paths:
#define _EGL_DECLARE_MUTEX(m) _EGLMutex m = _EGL_MUTEX_INITIALIZER
Therefore this abstraction obscured rather than helped.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Insert two fields into _egl_global to hold the client extensions and
statically initialize them:
ClientExtensions // a struct of bools
ClientExtensionString
Post-patch, Mesa supports exactly one client extension,
EGL_EXT_client_extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The fast tiled texture upload code does not compile with GCC 4.8's -Og
optimization flag.
memcpy() has the always_inline attribute set. This poses a problem,
since {x,y}tile_copy_faster calls it indirectly via {x,y}tile_copy,
and {x,y}tile_copy normally aren't inlined at -Og.
Using __attribute__((flatten)) tells GCC to inline every function call
inside the function, which I believe was the author's intent.
Fix suggested by Alexander Monakov.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: mesa-stable@lists.freedesktop.org
8 bit precision is required by d3d10 but unfortunately
requires 64 bit rasterizer. This commit implements
64 bit rasterization with full support for 8bit subpixel
precision. It's a combination of all individual commits
from the llvmpipe-rast-64 branch.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
.. and mark them off on the extensions list as done.
V2: Enable only if pipelined register writes work.
V3: Also update relnotes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just prior to emitting the 3DPRIMITIVE command, we load each of the
indirect registers. The values loaded are either from offsets into the
current indirect BO, or constant zero if the parameter is not used for
this draw.
Enabling use of the indirect registers is done by turning on a bit in
the first dword of the 3DPRIMITIVE command itself.
V3: - Deduplicate the common part of both indexed and nonindexed indirect
setup.
- Just refer to the indirect bo out of the context directly.
V4: - Fix bo reference to specify the range we care about.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
- MMIO registers for draw parameters
- New bit in 3DPRIMITIVE command to enable indirection
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
V3: - Disallow primcount==0 for DrawMulti*Indirect. The spec is unclear
on this, but it's silly. We might go back on this later if it
turns out to be a problem.
- Make it clear that the caller has dealt with stride==0
V4: - Allow primcount==0 again.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We will reuse the same extension flag for ARB_multi_draw_indirect since
it can always be supported by looping.
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Split from patch implementing ARB_draw_indirect.
v2: Const-qualify the struct gl_buffer_object *indirect argument.
v3: Fix up some more draw calls for new argument.
v4: Fix up rebase conflicts in i965.
v5: Undo const-qualification
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
V2: drop description of `fall` and `wm`, which have been removed by the
previous patch; describe `stats`.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
DEBUG_IOCTL comes from i965, and is about to be removed. Both defines
have the same value (4).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Limit the max glsl version level to what the state tracker supports.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
V2: Return after error to avoid cascading error messages and
removed redundant "to" from error message
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I recently made us try two different things that tried to reduce register
pressure so that we would be more likely to allocate successfully. But
now that we have the logic for trying two, we can make the first thing we
try be the normal, not-prioritizing-register-pressure heuristic.
This means one less scheduling pass in the common case of that heuristic
not producing spills, plus the best schedule we know how to produce, if
that one happens to succeed. This is important, because our register
allocation produces a lot of possibly avoidable dependencies for the
post-register-allocation schedule, despite ra_set_allocate_round_robin().
GLB2.7: 1.04127% +/- 0.732461% fps improvement (n=31)
nexuiz: No difference (n=5)
lightsmark: 0.838512% +/- 0.300147% fps improvement (n=86)
minecraft apitrace: No difference (n=15)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The canary is basically just to give a better debugging message when you
ralloc_free() something that wasn't rallocated. Reduces maximum memory
usage of apitrace replay of the dota2 demo by 60MB on my 64-bit system (so
half that on a real 32-bit dota2 environment).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think I was thinking of the batch command packet cache when I pasted
this in, but this counter is only used for dumping out streamed state for
INTEL_DEBUG=batch and for putting annotations in our aub files.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 2f89662 added the driconf option 'clamp_max_samples'. In that
commit, the option did not alter the context version. The neglect to
alter the context version is a fatal issue for some apps.
For example, consider running Chromium with clamp_max_samples=0.
Pre-patch, Mesa creates a GL 3.0 context but clamps GL_MAX_SAMPLES to
0. This violates the GL 3.0 spec, which requires GL_MAX_SAMPLES >= 4.
The spec violation causes WebGL context creation to fail in many
scenarios because Chromium correctly assumes that a GL 3.0 context
supports at least 4 samples.
Since the driconf option was introduced largely for Chromium, the issue
really needs fixing.
This patch fixes calculation of the context version to respect the
post-clamped value of GL_MAX_SAMPLES. This in turn fixes WebGL on
Chromium when clamp_max_samples=0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
clamp_max_samples() and intel_quantize_num_samples() each maintained
their own list of which MSAA modes the hardware supports. This patch
removes the duplication by making intel_quantize_num_samples() use the
same list as clamp_max_samples(), the list maintained in
brw_supported_msaa_modes().
By removing the duplication, we prevent the scenario where someone
updates one list but forgets to update the other.
Move function `brw_context.c:static brw_supported_msaa_modes()` to
`intel_screen.c:(non-static) intel_supported_msaa_modes()` and patch
intel_quantize_num_samples() to use the list returned by that function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This simplifies the loop logic in a subsqequent patch that refactors
intel_quantize_num_samples() to use brw_supported_msaa_modes().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
These degenerate instructions can often be emitted by state trackers
when the semantics of instructions don't match precisely.
Reviewed-by: Brian Paul <brianp@vmware.com>
Several of links the were contributed by Keith Whitwell and Roland Scheidegger.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
From section 6.1.18 (Renderbuffer Object Queries) of the GL 3.2 spec,
under the heading "If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE
is TEXTURE, then":
If pname is FRAMEBUFFER_ATTACHMENT_LAYERED, then params will
contain TRUE if an entire level of a three-dimesional texture,
cube map texture, or one-or two-dimensional array texture is
attached. Otherwise, params will contain FALSE.
Fixes piglit tests:
- spec/!OpenGL 3.2/layered-rendering/framebuffer-layered-attachments
- spec/!OpenGL 3.2/layered-rendering/framebuffertexture-defaults
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Don't include "EXT" in the error message, since this query only
makes sensen in context versions that have adopted
glGetFramebufferAttachmentParameteriv().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we were using the code path for validating
glFramebufferTextureLayer(). But glFramebufferTexture() allows
additional texture types.
Fixes piglit tests:
- spec/!OpenGL 3.2/layered-rendering/gl-layer-cube-map
- spec/!OpenGL 3.2/layered-rendering/framebuffertexture
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Clarify comment above framebuffer_texture().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes the fast depth clear path.
Fixes piglit test "spec/!OpenGL 3.2/layered-rendering/clear-depth".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes the blorp clear path for color buffers.
Fixes piglit test "spec/!OpenGL 3.2/layered-rendering/clear-color".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes meta clears to properly clear all layers of a layered
framebuffer attachment. We accomplish this by adding a geometry
shader to the meta clear program which sets gl_Layer to a uniform
value. When clearing a layered framebuffer, we execute in a loop,
setting the uniform to point to each layer in turn.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In order to properly clear layered framebuffers, we need to know how
many layers they have. The easiest way to do this is to record it in
the gl_framebuffer struct when we check framebuffer completeness.
This patch replaces the gl_framebuffer::Layered boolean with a
gl_framebuffer::NumLayers integer, which is 0 if the framebuffer is
not layered, and equal to the number of layers otherwise.
v2: Remove gl_framebuffer::Layered and make gl_framebuffer::NumLayers
always have a defined value. Fix factor of 6 error in the number of
layers in a cube map array.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prevents a GPU page fault if somehow the uniform bo gets evicted
before the screen_create pushbuf has been submitted.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Previously, we checked for interstage uniform interface block link
errors in validate_interstage_interface_blocks(), which is only called
on pairs of adjacent shader stages. Therefore, we failed to detect
uniform interface block mismatches between non-adjacent shader stages.
Before the introduction of geometry shaders, this wasn't a problem,
because the only supported shader stages were vertex and fragment
shaders, therefore they were always adjacent. However, now that we
allow a program to contain vertex, geometry, and fragment shaders,
that is no longer the case.
Fixes piglit test "skip-stage-uniform-block-array-size-mismatch".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: Rename validate_interstage_interface_blocks() to
validate_interstage_inout_blocks() to reflect the fact that it no
longer validates uniform blocks.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v3: Make validate_interstage_inout_blocks() skip uniform blocks.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when attempting to link a vertex shader and a geometry
shader that use different GLSL versions, we would sometimes generate a
link error due to the implicit declaration of gl_PerVertex being
different between the two GLSL versions.
This patch fixes that problem by only requiring interface block
definitions to match when they are explicitly declared.
Fixes piglit test "shaders/version-mixing vs-gs".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: In the interface_block_definition constructor, move the assignment
to explicitly_declared after the existing if block.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From section 7.1 (Built-In Language Variables) of the GLSL 4.10
spec:
Also, if a built-in interface block is redeclared, no member of
the built-in declaration can be redeclared outside the block
redeclaration.
We have been regarding this text as a clarification to the behaviour
established for gl_PerVertex by GLSL 1.50, so we apply it regardless
of GLSL version.
This patch enforces the rule by adding an enum to ir_variable to track
how the variable was declared: implicitly, normally, or in an
interface block.
Fixes piglit tests:
- gs-redeclares-pervertex-out-after-global-redeclaration.geom
- vs-redeclares-pervertex-out-after-global-redeclaration.vert
- gs-redeclares-pervertex-out-after-other-global-redeclaration.geom
- vs-redeclares-pervertex-out-after-other-global-redeclaration.vert
- gs-redeclares-pervertex-out-before-global-redeclaration
- vs-redeclares-pervertex-out-before-global-redeclaration
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: Don't set "how_declared" redundantly in builtin_variables.cpp.
Properly clone "how_declared".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Unfortunately, our hardware only has one set of aggregating performance
counters shared between all 3D programs, and their values are not saved
or restored by hardware contexts. Also, at least on Sandybridge and
Ivybridge, the counters lose their values if the GPU goes to sleep.
To work around both of these problems, we have to snapshot the
performance counters at the beginning and end of each batch, similar to
how we handle query objects on platforms that don't support hardware
contexts. I call these "bookend" snapshots.
Since there can be multiple performance monitors active at a time, we
store the bookend snapshots in a global BO, shared by all monitors.
For monitors that span multiple batches, acquiring results involves
adding up three segments:
BeginPerfMonitor --> End of Batch 1 ("head")
Start of Batch 2 --> End of Batch 2
... ("middle")
Start of Batch N-1 --> End of Batch N-1
Start of Batch N --> EndPerfMonitor ("tail")
Monitors that refer to bookend BO snapshots are considered "unresolved".
We delay resolving them (and adding up deltas to obtain the results) as
long as possible to avoid blocking on mapping monitor->oa_bo.
We can also run out of space in the bookend BO, at which point we have
to resolve all unresolved monitors. Then we can throw away the
snapshots and begin writing at the beginning of the buffer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In order to use the Observability Architecture effectively, we'll need
to take snapshots of the OA counters via MI_REPORT_PERF_COUNT at the
start and end of each batch.
Experimentation reveals that we need to flush before and after each
MI_REPORT_PERF_COUNT to get working values. For simplicitly, I chose to
use intel_batchbuffer_emit_mi_flush(), which unfortunately expands to
triple pipe controls on Sandybridge.
We may want to start computing per-generation reserved batch space to
avoid the insanity of Sandybridge's PIPE_CONTROL cost. That said, much
of this cost existed before I rewrote the query object support to use
hardware contexts, so it's at least not entirely new.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, this only considers the monitor start and end snapshots.
This is woefully insufficient, but allows me to add a bunch of the
infrastructure now and flesh it out later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to start OA at the beginning of each batch where monitors are
active. OACONTROL isn't part of the hardware context, so to avoid
leaving counters enabled for other applications, we turn them off at the
end of the batch too.
We also need to start them at BeginPerfMonitor time (unless they've
already been started). We stop them when the monitor last ends as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We'll need to write this register to start/stop performance counters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
MI_REPORT_PERF_COUNT writes a snapshot of the Observability Architecture
counters to a buffer. Exactly how it works varies between generations:
Ironlake requires two packets, Sandybridge has to use GGTT, and Ivybridge
and later use PPGTT.
v2: Assert that we didn't use more space than we reserved (suggested
by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Using the OA counters requires some per-batch work. When starting and
ending a batch, it's useful to know whether any monitors are actually
interested in OA data.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In addition to listing the counter names, we include several "remap"
tables. Confusingly, counters are documented with names like "A23",
are written to some buffer offset other than 23, and exposed by core
Mesa under a counter ID that is different still.
The first is inevitable; MI_REPORT_PERF_COUNT writes certain counters to
fixed locations in the buffer. The latter could be avoided, but core
Mesa uses the "Counters" array index as the ID for a counter. We could
do remapping there, but it would just complicate the core Mesa code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is fairly simple:
- At BeginPerfMonitor time, take an opening snapshot.
- At EndPerfMonitor time, take a closing snapshot.
- The first time the application asks for results, subtract the two and
store that value. Then free the BO containing the snapshots.
- On subsequent requests for the results, just return the saved value.
- On reset, throw away the results.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For now, we only support these on Gen6+, since that's what currently
uses hardware contexts. When we add Ironlake hardware context support,
we can add pipeline statistics register support for that as well.
In theory, we could support pipeline statistics counters even without
hardware contexts, but it would be annoyingly painful.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we don't support any counters, there are zero groups.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The Observability Architecture counters are 32-bit unsigned values, and
the Pipeline Statistics Register counters are 64-bit unsigned values.
These convenience macros make it easy to create those types of counters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It's useful to see the state of all outstanding monitors; the start
of a new batch seems like a reasonable time to print them out.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These stub functions will be filled out in later patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will enable debugging printfs for the AMD_performance_monitor code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without hardware contexts, the pipeline statistics registers are
free-running and include data from every 3D application running.
In order to find out the contributions of one particular context, we
need to take a snapshot at the start and end of each batch.
Previously, we emitted the PIPE_CONTROL necessary to capture
PS_DEPTH_COUNT when drawing primitives. Special tracking ensured it
happened only on the first draw of the batch, rather than on every draw.
Moving this to brw_new_batch increases symmetry, since the final
snapshot has always been in brw_finish_batch, which is just a few lines
below. It should be basically equivalent.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The new intel_batchbuffer_emit_render_ring_prelude() hook will be called
when switching from BLT or UNKNOWN_RING to RENDER_RING. This provides a
place to emit state that should go at the start of each render ring
batch, with minimal overhead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When we first create a batch buffer, it's empty. We don't actually
know what ring it will be targeted at until the first BEGIN_BATCH or
BEGIN_BATCH_BLT macro.
Previously, one could determine the state of the batch by checking
brw->batch.ring (blit vs. render) and brw->batch.used != 0 (known vs.
unknown).
This should be functionally equivalent, but the tri-state enum is a bit
clearer.
v2: Catch three explicit require_space callers (thanks to Carl and Eric).
v3: Split the boolean -> enum change from the UNKNOWN_RING change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Passing BLT_RING or RENDER_RING to batchbuffer functions is a lot more
obvious than passing true or false.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some rounding errors could crop up when calculating a0. Use a more accurate
method (barycentric interpolation essentially) to fix this, though to fix
the REAL problem (which is that our interpolation will give very bad results
with small triangles far away from the origin when they have steep gradients)
this does absolutely nothing (actually makes it worse). (To fix the real
problem, either would need to use a vertex corner (or some other point inside
the tri) as starting point value instead of fb origin and pass that down to
interpolation, or mimic what hw does, use barycentric interpolation (using
the coordinates extracted from the rasterizer edge functions) - maybe another
time.)
Some (silly) tests though really want a high accuracy at fb origin and don't
care much about anything else (Just. Don't. Ask.).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
D3D9 Shader Model 2 restricted the fog register to one component,
http://msdn.microsoft.com/en-us/library/windows/desktop/bb172945.aspx ,
but that restriction no longer exists in Shader Model 3, and several
WHCK tests enforce that.
So this change:
- lifts the single-component restriction TGSI_SEMANTIC_FOG
from Gallium interface
- updates the Mesa state tracker to enforce output fog has (f, 0, 0, 1)
- draw module was updated to leave TGSI_SEMANTIC_FOG output registers
alone
Several gallium drivers that are going out of their way to clear
TGSI_SEMANTIC_FOG components could be simplified in the future.
Thanks to Si Chen and Michal Krol for identifying the problem.
Testing done: piglit fogcoord-*.vpfp tests
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Same as Si Chen's commit e7a5905d8a for
tgsi_exec module.
Not actually tested, because softpipe is failing the test that caught
this bug due to unrelated issues.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Earlier comments suggest this was removed from GL core spec but it is
still there. Enabling makes 'texture_lod_bias_getter' Khronos
conformance tests pass, also removes some errors from Metro Last Light
game which is using this API.
v2: leave NOTE comment (Ian)
Cc: "9.0 9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
The alignment of a virtual memory area must always be at least 4096 bytes.
It only worked because size was aligned to 4096 outside of the function.
Signed-off-by: Christian König <christian.koenig@amd.com>
If a user set MESA_INFO and the OpenGL application uses a
3.0 or later context then the MESA_INFO debug output will have
an error when it queries for extensions using the deprecated
enum GL_EXTENSIONS. Passing context argument allows code
to return extension list directly regardless of profile.
Commit title updated as recommended by Kenneth Graunke.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
BLORP is essential. However, porting it to Gen8 is a huge amount of
work. Disabling it for now allows us to proceed with basic hardware
enablement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
HiZ is difficult to implement, and while it's essential for performance,
we don't need it right away for purposes of hardware enabling.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As always, the chipset limits here are placeholders, rather than the
actual values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
i965 passed piglit, but swrast and gallium both segfaulted without this.
i965 happened to work because it never ran _mesa_load_state_parameters()
on the new program before the test called glProgramLocalParameter(), which
was allocating a LocalParams array for the fallback path.
v2: Since v1 threw away old localparams data, leaked old LocalParams
memory, only fixed fragment programs, and I was dubious of my previous
invariants already (nothing but program_parse.y will generate
LocalParams, and only that one path of program_parse.y will), just
late-allocate localparams at the other point of dereferencing them.
This adds overhead to _mesa_load_state_parameter, which is
uncomfortable, but I'm pretty sure that giant switch statement is
super slow already.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71734
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions.
total instructions in shared programs: 1360393 -> 1360387 (-0.00%)
instructions in affected programs: 157 -> 151 (-3.82%)
(no change in vertex shaders)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For commit 4df56177 Paul discovered that the hardware restriction that
Align16 instructions cannot be compressed was lifted on Haswell. This
has prevented us from emitting compressed three-source instructions.
For added confirmation, the bspec lists a work around called
WaBreakSimd16TernaryInstructionsIntoSimd8 that hasn't been applicable
since very early Haswell silicon.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, register_coalesce() would modify
mov vgrf1:f vgrf2:f
cmp null vgrf3:d vgrf1:d
to be
cmp null vgrf3:d vgrf2:f
and incorrectly use vgrf2's type in the instruction that the mov was
coalesced into.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not necessary to scale down cubemap texture coords when generating
mipmaps: we are doing a 2x minification therefore it's guaranteed that
the texture coords will always be at least 1 texel away of the edges.
Scaling down can actually be harmful, as it may cause artefacts when
generating mipmaps with nearest filtering. Sample points will lie
exactly in the middle each 2x2 texels, so the scaling factor was causing
different texels to be take on each quadrant of the cube face. This is
apparent with a 1x1 checkerboard pattern in the base mipmap level:
instead of next mipmap level receiving a constant color throughout the
face, it will have different colors for each quadrant of the face.
The behaviour for blits is left untouched for now, but the cubemap
texture coord scaling hack should be reconsidered eventually.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to check the drawbuffer's orientation before inverting Y
coordinates. Fixes piglit feedback tests when running with the
-fbo option.
Cc: "9.2" "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The exec_mask must be taken in consideration, just like emit_kill above.
The tgsi_exec module has the same bug and should be fixed in a future
change.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Gen7 does not allow render targets to have a vertical alignment of 2.
So, when creating a surface, if its format is renderable, and its
vertical alignment is 2, force it to use X tiling.
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen6+ allows for color buffers to use a vertical alignment of either 4
or 2. Previously we defaulted to 2. This may have caused problems on
Gen7 because Y-tiled render targets are not allowed to use a vertical
alignment of 2.
This patch changes the vertical alignment to 4 on Gen7, except for the
few formats where a vertical alignment of 2 is required.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 70953b5 (i965: Initialize all member variables of
vec4_instruction on construction) inadvertently added a line to the
vec4_instruction constructor setting this->ir to NULL, wiping out the
previously set value. As a result, ever since then, the output of
INTEL_DEBUG=vs and INTEL_DEBUG=gs has been missing IR annotations.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is basically a a respin of f1dfcf4bce35e6796f873d9a00103b280da81e4c
per Jose's suggestion.
Just set the SVGA3dSurfaceFormatCaps flags for 3D and cube textures
when checking the texture format capabilities. This will filter out
unsupported combinations like 3D+DXT.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Refine 8c533022. Provide a stub __glXGetCurrentContext() function when
$(DEFINES) are such that it is not a macro.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
We were always passing PIPE_TEXTURE_2D, but not all formats are
supported for all types of textures. In particular, the driver may
not supported texture compression for all types of textures.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
v2: Don't go to extra work to avoid extraneous flushes. (Previous
experiments in the kernel have suggested that flushing the pipeline
when it is already empty is extremely cheap).
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add new OSMesaPostprocess() function to allow using the gallium
postprocessing filters. This only works for OSMesa with gallium
drivers, not the legacy swrast OSMesa.
Bump OSMESA_MAJOR/MINOR_VERSION numbers to 10.0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Move private data structures and function prototypes out of the
public postprocess.h header file.
Create a pp_private.h for the shared, private data structures, functions.
Remove pp_program.h header.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
- Indent items under a GL version to allow context diffs to do their work.
- Move complete drivers into the GL version line - this should make the
stuff a little bit easier to read.
v2: keep the fd.o link (Emil Velikov)
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Joerg Mayer <jmayer@loplof.de>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The above two variables are unused as of commit
commit 024fe6852a
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Apr 2 10:42:50 2013 -0700
radeon/llvm: Use LLVM C API for compiling LLVM IR to ISA v2
which removed the only cpp file from drivers/radeon, but missed to
remove the CXXFLAGS. The sequential commit reintroduced and empty
LLVM_CPP_FILES.
Lets cleanup and remove both.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If a performance monitor has never ended, then no result can be
available. Core Mesa can easily handle this, saving drivers a tiny bit
of complexity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If a monitor has ended, it means a result should eventually become
available, pending some flushing.
This is distinct from !m->Active; if a monitor has not been started,
then m->Active == false and m->Ended == false.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The i965 implementation uses calloc, so I missed this. It's best to
simply initialize it to avoid requiring a zeroing allocator, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that branch 10.0 is created, bump the minor version in
master.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll need this for Broadwell code as well.
Normally, when we make things public, we add the "brw" prefix. I'm not
crazy about that in this case, since it deals with prog_instruction.h's
SWIZZLE_XYZW values, rather than the BRW_SWIZZLE_XYZW enums. However,
I can't think of a better name, and at least the comments and code make
it clear.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Broadwell code should not include brw_eu.h (since it is for Gen4-7
assembly encoding), but needs the URB write flags enum.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Only Gen4 color write setup uses the force_sechalf flag, and it only
sets it on a single instruction. It also already has to get a pointer
to the instruction and manually set the saturate flag, so we may as well
just set force_sechalf the same way and avoid the complexity of a stack.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Previous assumption was that the same set of flags can be reused
for both classic and gallium drivers. With megadriver work done
the classic drivers ended up using their own (single) instance of
the flags.
Move these into Automake.inc and rename to indicate that those
are gallium specific. Additionally silence an automake/autoconf
warning "XXX is not a standard libtool library name", due to
the parsing issues of the module tag.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Greatly reduce duplication and provide a sane minimum of
CFLAGS for all DRI targets.
Note: This commit adds VISIBILITY_CFLAGS to the following:
* freedreno
* i915
* ilo
* nouveau
* vmwgfx
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In order to use the trace driver, one needs to define
GALLIUM_TRACE. Neither one of the two targets was
defining it, thus we're safe to remove libtrace.la.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Automake.inc already has GALLIUM_VIDEO_CFLAGS, which
provide the essential compiler flags needed.
Note: this commit adds VISIBILITY_CFLAGS to nouveau.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Cleanup the duplicating flags and consolidate into a sigle variable.
Note: this patch adds VISIBILITY_CFLAGS to the following targets
* freedreno/drm
* i915/{drm,sw}
* nouveau/drm
* sw/fbdev
* sw/null
* sw/wayland
* sw/wrapper
* sw/xlib
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In order for one to use trace, noop, rbug and/or galahad, they must
set the corresponding GALLIUM_* CFLAG.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Store the compiler flags into a variable, in order to minimise
flags duplication (amongst vdpau and xvmc).
Note: this commit add VISIBILITY_CFLAGS to the nouveau target
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
* minimise flags duplication
* distingush between VISIBILITY C and CXX flags
* set only required flags - C and/or CXX
v2: add LLVM_CFLAGS back to AM_CFLAGS (add missing backslash)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
* Allow the lists to be shared among build systems.
* Update automake and Android build systems.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nearly everything within the three Makefile.am's is identical.
Let's simplify things a little.
v2: Rebase and rewrite the commit message (Emil Velikov)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The array was 64kb per struct gl_program, plus we statically stored a copy
of one on disk for _mesa_DummyProgram. Given that most struct gl_programs
we generate are for GLSL shaders that don't have local parameters, this
was a waste.
Since you can store and fetch parameters beyond what the program actually
uses, we do have to do a late allocation if necessary at
GetProgramLocalParameter time.
Reduces peak memory usage in the dota2 trace I made by 76MB (4.5%)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This has been replaced with referring to env parameters using
PROGRAM_STATE_VAR and _mesa_load_state_parameters.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This has been replaced with referring to local parameters using
PROGRAM_STATE_VAR and _mesa_load_state_parameters.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Notably, ENV and LOCAL aren't used any more (replaced by STATE_VAR), but
apparently CONSTANT is.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The comment was stale, because the lowering in question wasn't happening
in lower_instructions.cpp. Presumably if the lowering ever moves there,
we can plumb the lowering mask through to opt_algebraic.
total instructions in shared programs: 1618696 -> 1616810 (-0.12%)
instructions in affected programs: 243018 -> 241132 (-0.78%)
GAINED: 0
LOST: 0
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, brw_new_batch was called just after execbuf, but before
intel_batchbuffer_reset. Essentially, it prepared for the creation of a
new batch, that wasn't yet available, and which it didn't create. This
was a bit awkward.
This patch makes brw_new_batch call intel_batchbuffer_reset as the very
first operation. This means that brw_new_batch actually creates a new
batchbuffer, and thus has it available. It brings the creation of the
new batchbuffer and BRW_NEW_BATCH flagging together into one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
More rebase fail. This code was written long before i915 and i965 were
split, so most of the code in i9[16]5/intel_screen.c only needed to
exist in one place. It looks like I fixed n-1 of those places after
rebasing on the split.
I only found this from the defined-but-not-used warning for
intelRendererQueryExtension. I noticed this while fixing the other,
related warnings.
(Note: During review, we decided to *not* pick this back to 10.0.)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
radeon_llvm_compile allocates memory for binary.code, binary.config,
or neither depending on what's being done.
We need to make sure to free that memory after it's no longer needed.
v2: Don't bother checking for null before FREE()
CC: "10.0" <mesa-stable@lists.freedesktop.org>
use memset to initialize to 0's... otherwise code_size and config_size
could be uninitialized when read later in this method.
It's also hard to do NULL checks on uninitialized pointers.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
v2: Fix indentation
CC: "10.0" <mesa-stable@lists.freedesktop.org>
For DX9-level shaders, there's only limited support for indirect
indexing of registers (with the loop counter register, not the
general address register.)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
After we blit/copy to a dest texture image we need to mark it as
being defined. This fixes broken mipmap generation for quite a
few texture formats. Mipgen involves making texture views and
svga_texture_view_surface() skips texture images that are undefined.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The index translation code expects the number of indexes to be
consistent with the primitive type (ex: a multiple of 3 for
PIPE_PRIM_TRIANGLES). If it's not, we can write out of bounds
in the destination buffer.
Fixes failed assertions in the pipebuffer debug code found with
Piglit primitive-restart-draw-mode test.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, when doing intrastage and interstage interface block
linking, we only checked the interface type; this prevented us from
catching some link errors.
We now check the following additional constraints:
- For intrastage linking, the presence/absence of interface names must
match.
- For shader ins/outs, the interface names themselves must match when
doing intrastage linking (note: it's not clear from the spec whether
this is necessary, but Mesa's implementation currently relies on
it).
- Array vs. nonarray must be consistent, taking into account the
special rules for vertex-geometry linkage.
- Array sizes must be consistent (exception: during intrastage
linking, an unsized array matches a sized array).
Note: validate_interstage_interface_blocks currently handles both
uniforms and in/out variables. As a result, if all three shader types
are present (VS, GS, and FS), and a uniform interface block is
mentioned in the VS and FS but not the GS, it won't be validated. I
plan to address this in later patches.
Fixes the following piglit tests in spec/glsl-1.50/linker:
- interface-blocks-vs-fs-array-size-mismatch
- interface-vs-array-to-fs-unnamed
- interface-vs-unnamed-to-fs-array
- intrastage-interface-unnamed-array
v2: Simplify logic in intrastage_match() for handling array sizes.
Make extra_array_level const. Use an unnamed temporary
interface_block_definition in validate_interstage_interface_blocks()'s
first call to definitions->store().
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From the Sandy Bridge PRM, Vol 1 Part 1 7.18.3.4 (Alignment Unit
Size):
j [vertical alignment] = 4 for any render target surface is
multisampled (4x)
From the Ivy Bridge PRM, Vol 4 Part 1 2.12.2.1 (SURFACE_STATE for most
messages), under the "Surface Vertical Alignment" heading:
This field is intended to be set to VALIGN_4 if the surface was
rendered as a depth buffer, for a multisampled (4x) render target,
or for a multisampled (8x) render target, since these surfaces
support only alignment of 4.
Back in 2012 when we added multisampling support to the i965 driver,
we forgot to update the logic for computing the vertical alignment, so
we were often using a vertical alignment of 2 for multisampled
buffers, leading to subtle rendering errors.
Note that the specs also require a vertical alignment of 4 for all
Y-tiled render target surfaces; I plan to address that in a separate
patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53077
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For both vertex and fragment shaders we default MaxUniformComponents
to 4 * MAX_UNIFORMS. It makes sense to do this for geometry shaders
too; if back-ends have different limits they can override them as
necessary.
Fixes piglit test:
spec/glsl-1.50/built-in constants/gl_MaxGeometryUniformComponents
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* Inherit gl_context so we always have access to it
* Thanks curro for the idea.
* Last Haiku cannidate for 10.0.0
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The debug printfs wouldn't actually compile when enabled, so kill them off
and insert some new one in another place, and make sure it keeps compiling
by enclosing it in a if-0 clause.
In particular get rid of home-grown vector helpers which didn't add much.
And while here fix formatting a bit. No functional change.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that but mostly seems to work. In particular I suspect the
very common float->unorm8 path only really passes because it relies on sse2
pack intrinsics which just happen to work by luck for NaNs (float->int
conversion in hw gives integer indeterminate value, which just happens to be
-0x80000000 hence gets converted to zero in the end after pack intrinsics).
However, float->srgb didn't get so lucky, because we need to clamp before
blending and clamping resulted in NaN behavior being undefined (and actually
got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp
with defined nan behavior as we can handle the NaN for free this way.
I suspect there's more bugs lurking in this area (e.g. converting floats to
snorm) as we don't really use defined NaN behavior everywhere but this seems
to be good enough.
While here respecify nan behavior modes a bit, in particular the return_second
mode didn't really do what we wanted. From the caller's perspective, we really
wanted to say we need the non-nan result, but we already know the second arg
isn't a NaN. So we use this now instead, which means that cpu architectures
which actually implement min/max by always returning non-nan (that is adhering
to ieee754-2008 rules) don't need to bend over backwards for nothing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Systems with little physical memory installed will report less than
2GiB, and some systems may (hypothetically?) have a larger address space
for the GPU. My IVB still reports 1534.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patch implements accelerated path for glDrawPixels from a PBO in
i965. The code follows what intel_pixel_read, intel_pixel_copy,
intel_pixel_bitmap and intel_tex_image are doing. Piglit quick.tests
show no regressions. In my testing on IVB, performance improvement is
huge (about 30x, didn't measure exactly) since generic path goes via
_mesa_unpack_color_span_float, memcpy, extract_float_rgba.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
createContextAttribs is a superset of what createNewContext provides.
Also remove the function typedef, since createNewContext is deprecated
and no longer used in multiple interfaces.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This lets us allocate color buffers as __DRIimages and pass them into
the driver instead of having to create a __DRIbuffer with the flink
that requires.
With this patch, we can now run gbm on render-nodes. A render-node is a
drm device that doesn't support modesetting and all the legacy DRI ioctls.
flink is also not supported, but now that gbm doesn't need flink, we can
run piglit on head-less gbm or head-less GPGPU.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Since LIFO fails on some shaders in one particular way, and non-LIFO
systematically fails in another way on different kinds of shaders, try
them both, and pick whichever one successfully register allocates first.
Slightly prefer non-LIFO in case we produce extra dependencies in register
allocation, since it should start out with fewer stalls than LIFO.
This is madness, but I haven't come up with another way to get unigine
tropics to not spill while keeping other programs from not spilling and
retaining the non-unigine performance wins from texture-grf.
total instructions in shared programs: 1626728 -> 1626288 (-0.03%)
instructions in affected programs: 1015 -> 575 (-43.35%)
GAINED: 50
LOST: 0
Improves Unigine Tropics performance by 14.5257% +/- 0.241838% (n=38)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Long ago, the HW_REG usage in assign_curb/urb_setup() were scheduling
barriers, so we had to run scheduler before them in order for it to be
able to do basically anything. Now that that's fixed, we can delay the
scheduling until we go to allocate (which will make the next change less
scary).
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We care about depth-until-program-end, as a proxy for "make sure I
schedule those early instructions that open up the other things that can
make progress while keeping register pressure low", not actual latency
(since we're relying on the post-register-alloc scheduling to actually
schedule for the hardware).
total instructions in shared programs: 1609931 -> 1609931 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 55
LOST: 43
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In the SIMD16 spilling changes, I replaced a "1" in the spill path with
"mlen", but obviously it wasn't mlen before because spills have the g0
header along with the payload. The interface I was trying to use was
asking for how many physical regs we're writing, so we're looking for "1"
or "2".
I'm guessing this actually passed piglit because the high 8 bits of the
execution mask in SIMD8 mode are all 0s.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, the best thing we had was to schedule the things unblocked by
the last chosen instruction, on the hope that it would be consuming two
values at the end of their live intervals while only producing one new
value. But that's just a guess, and we can do counting of usage of
registers to know when an instruction would (almost surely) reduce
register pressure.
The only failure mode I know of in this new dominant heuristic is that
inside of a loop when scheduling the iterator (for example), choosing the
last use of the iterator doesn't actually reduce the live interval of the
iterator. But it doesn't seem to matter in shader-db:
total instructions in shared programs: 1618700 -> 1618700 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 13
LOST: 0
Note: The new functions are made virtual because I expect we'll soon lift
the pre-regalloc scheduling heuristic over to the vec4 backend.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes infinite loop in find_grid_optimal_factor() in cases where the
user specifies a grid size with less dimensions than the device
supports.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Since we explicitly require a integer input we should avoid using exp2 math
(even if we were using optimized versions), which turns the exp2 into a int
sub (plus some casts).
v2: fix bogus uint (needs to be int) math spotted by Matthew, fix comments
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Otherwise, the function would enable generic vertex attributes 0
and 1 of the array object it does not own. This was causing crashes
in Euro Truck Simulator 2, since the incorrectly enabled generic
attribute 0 in the foreign context got precedence before vertex
position attribute at later time, leading to NULL pointer dereference.
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Petr Sebor <petr@scssoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Adding a vl_mpeg-based helper didn't seem to work, as it produced data
that the card couldn't handle. (And I didn't investigate further.) This
makes the decoding functionality only accessible via XvMC and avoids
crashes when attempting to use VDPAU.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This fixes a crash in glamor when mesa links against static LLVM.
v2:
- Inline LINKER_SCRIPT variable
v3: Kai Wasserbäch
- Fix out out-of-tree-builds
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.or>
This makes it possible to use clover with statically linked LLVM.
v2:
- Inline LINKER_SCRIPT variable
v3: Kai Wasserbäch
- Fix out out-of-tree-builds
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.or>
X_f, Y_f, Xp_f, Yp_f variables are used just inside
translate_dst_to_src().So, they can be defined just
as local variables.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When this function was added, the returned value was signed in some
places, unsigned in others.
v2: also add unsigned in the unit test, per Ian.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we would bogusly replace the entire statement containing the
ir_texture node with an ir_dereference_variable.
Correct this to just replace the ir_texture node itself as intended.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b16b3c87 began performing CSE on CMP instructions with null
destinations. I relaxed the restrictions a bit too much, thereby
allowing CSE to be performed on instructions with, for instance, an
explicit accumulator destination.
This broke the arb_gpu_shader5/fs-imulExtended shader tests because
they emit MUL instructions with the accumulator as the destination. CSE
would instead cause the MUL to write to a GRF, which is lower precision
than the accumulator.
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
<li><ahref="http://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><ahref="http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602">Rasterization on Larrabee</a> (<ahref="http://devmaster.net/posts/2887/rasterization-on-larrabee">DevMaster copy</a>)</li>
<li><ahref="http://devmaster.net/posts/6133/rasterization-using-half-space-functions">Rasterization using half-space functions</a></li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=64323">Bug 64323</a> - Severe misrendering in Left 4 Dead 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68838">Bug 68838</a> - GLSL: struct declarations produce a "empty declaration warning" in 9.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=58660">Bug 58660</a> - CAYMAN broken with HyperZ on</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=64471">Bug 64471</a> - Radeon HD6570 lockup in Brütal Legend with HyperZ</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66352">Bug 66352</a> - GPU lockup in L4D2 on TURKS with HyperZ</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68799">Bug 68799</a> - [APITRACE] Hyper-Z lockup with Falcon BMS 4.32u6 on CAYMAN</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71547">Bug 71547</a> - compilation failure :#error "SSE4.1 instruction set not enabled"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=72685">Bug 72685</a> - [radeonsi hyperz] Artifacts in Unigine Sanctuary</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73088">Bug 73088</a> - [HyperZ] Juniper (6770): Gone Home / Unigine Heaven 4.0 lock up system after several minutes of use</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74428">Bug 74428</a> - hyperz causes gpu hang in Counter-strike: Source</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74803">Bug 74803</a> - [r600g] HyperZ broken on RV630 (Cogs shadows are broken)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74863">Bug 74863</a> - [r600g] HyperZ broken on RV770 and CYPRESS (Left 4 Dead 2 trees corruption) bisected!</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74892">Bug 74892</a> - HyperZ GPU lockup with radeonsi 7970M PITCAIRN and Distance Alpha game</li>
@@ -55,16 +57,89 @@ Note: some of the new features are only available with certain drivers.
<li>GL_ARB_vertex_attrib_binding</li>
<li>GL_ARB_vertex_type_10f_11f_11f_rev on i965 and r600g</li>
<li>GL_KHR_debug</li>
<li>GLX_MESA_query_renderer</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<p>Attempts have been made to <b>not</b> include bugs fixed in previous 9.2
releases or bugs that were regressions during 10.0 development. This list is
likely incomplete.</p>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=47755">Bug 47755</a> - [glsl-compiler] no error checking when Interpolation qualifier for built-in variable is different in vertex and fragment shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=52171">Bug 52171</a> - [gallium/r600/clover] Simple benchmarks failed to run</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=53077">Bug 53077</a> - [IVB] Output error with msaa when both of framebuffer and source color's alpha are not 1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=54867">Bug 54867</a> - bug in r300 compiler</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=60929">Bug 60929</a> - [r600-llvm] mono games with opengl are blocking on start</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=62142">Bug 62142</a> - Mesa/demo mipmap_limits upside down with running by SOFTWARE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66213">Bug 66213</a> - Certain Mesa Demos Rendering Inverted (vertically)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66806">Bug 66806</a> - [softpipe] glxgears floating point exception</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=67921">Bug 67921</a> - [bisected commit 883987] crosscompiling fails with util/u_cpu_detect.c:247:4: error: 'asm' undeclared (first use in this function)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68162">Bug 68162</a> - [radeonsi] texture rendering is broken in Source-Engine games</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68451">Bug 68451</a> - Texture flicker in native Dota2 in mesa 9.2.0rc1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68503">Bug 68503</a> - Graphical glitches in Serious Sam 3 when SB is enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68792">Bug 68792</a> - Problems during playback of h264 files using UVD and VLC on AMD E-350 CPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70042">Bug 70042</a> - Major texture flickering in Dota 2 (r600g on HD 6950)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70088">Bug 70088</a> - Glamor on r600g crashes Xserver</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70123">Bug 70123</a> - Freeze caused by 'winsys/radeon: remove cs_queue_empty' commit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70327">Bug 70327</a> - Casting floating point variable to integer not working properly while constant gets converted properly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70891">Bug 70891</a> - CL_INVALID_BUILD_OPTIONS results in CL_INVALID_DEVICE when asking for build log</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71022">Bug 71022</a> - configure: error: Expat required for DRI.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71110">Bug 71110</a> - xorg_driver.c:1030:2: error: too many arguments to function ‘DamageUnregister’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78225">Bug 78225</a> - Compile error due to undefined reference to `gbm_dri_backend', fix attached</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78537">Bug 78537</a> - no anisotropic filtering in a native Half-Life 2</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix double-freeing of dispatch tables inside glBegin/End.</li>
</ul>
<p>Carl Worth (3):</p>
<ul>
<li>docs: Add MD5 sums for 10.1.3</li>
<li>cherry-ignore: Roland and Michel agreed to drop these patches.</li>
<li>VERSION: Update to 10.1.4</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>configure: error out if building GBM without dri</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>i965/vs: Use samplers for UBOs in the VS like we do for non-UBO pulls.</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nv50/ir: make sure to reverse cond codes on all the OP_SET variants</li>
<li>nv50: fix setting of texture ms info to be per-stage</li>
<li>nv50/ir: fix integer mul lowering for u32 x u32 -> high u32</li>
</ul>
<p>Michel Dänzer (1):</p>
<ul>
<li>radeonsi: Fix anisotropic filtering state setup</li>
</ul>
<p>Tom Stellard (2):</p>
<ul>
<li>configure.ac: Add LLVM_VERSION_PATCH to DEFINES</li>
<li>radeonsi: Enable geometry shaders with LLVM 3.4.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77865">Bug 77865</a> - [BDW] Many Ogles3conform framebuffer_blit cases fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78581">Bug 78581</a> - OpenCL: clBuildProgram prints error messages directly rather than storing them</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79029">Bug 79029</a> - INTEL_DEBUG=shader_time is full of lies</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79729">Bug 79729</a> - [i965] glClear on a multisample texture doesn't work</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82709">Bug 82709</a> - OpenCL not working on radeon hainan</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82814">Bug 82814</a> - glDrawBuffers(0, NULL) segfaults in _mesa_drawbuffers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83079">Bug 83079</a> - [NVC0] Dota 2 (Linux native and Wine) crash with Nouveau Drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83355">Bug 83355</a> - FTBFS: src/mesa/program/program_lexer.l:122:64: error: unknown type name 'YYSTYPE'</li>
</ul>
<h2>Changes</h2>
<p>Adam Jackson (1):</p>
<ul>
<li>radeonsi: Don't use anonymous struct trick in atom tracking</li>
</ul>
<p>Alex Deucher (2):</p>
<ul>
<li>radeonsi: add new CIK pci ids</li>
<li>radeonsi: add new SI pci ids</li>
</ul>
<p>Andreas Boll (1):</p>
<ul>
<li>winsys/radeon: fix nop packet padding for hawaii</li>
</ul>
<p>Anuj Phogat (1):</p>
<ul>
<li>i965: Bail on vec4 copy propagation for scratch writes with source modifiers</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix NULL pointer deref bug in _mesa_drawbuffers()</li>
</ul>
<p>Carl Worth (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.2.6 release</li>
<li>Makefile: Switch from md5sums to sha256sums</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>i965: add missing parens in vec4 visitor</li>
</ul>
<p>Emil Velikov (17):</p>
<ul>
<li>configure.ac: bail out if building gallium_gbm without gallium_egl</li>
<li>android: gallium/nouveau: fix include folders, link against libstlport</li>
<li>android: egl/main: fixup the nouveau build</li>
<li>automake: gallium/freedreno: drop spurious include dirs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77493">Bug 77493</a> - lp_test_arit fails with llvm >= llvm-3.5svn r206094</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82539">Bug 82539</a> - vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84140">Bug 84140</a> - mplayer crashes playing some files using vdpau output</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84662">Bug 84662</a> - Long pauses with Unreal demo Elemental on R9270X since : Always flush the HDP cache before submitting a CS to the GPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85267">Bug 85267</a> - vlc crashes with vdpau (Radeon 3850HD) [r600]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85529">Bug 85529</a> - Surfaces not drawn in Unvanquished</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87619">Bug 87619</a> - Changes to state such as render targets change fragment shader without marking it dirty.</li>
</ul>
<h2>Changes</h2>
<p>Chad Versace (2):</p>
<ul>
<li>i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()</li>
<li>i965: Use safer pointer arithmetic in gather_oa_results()</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.3.6 release</li>
<li>Update version to 10.3.7</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nv50,nvc0: set vertex id base to index_bias</li>
<li>nv50/ir: fix texture offsets in release builds</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.</li>
<li>i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>glsl_to_tgsi: fix a bug in copy propagation</li>
<li>vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74863">Bug 74863</a> - [r600g] HyperZ broken on RV770 and CYPRESS (Left 4 Dead 2 trees corruption) bisected!</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77865">Bug 77865</a> - [BDW] Many Ogles3conform framebuffer_blit cases fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78225">Bug 78225</a> - Compile error due to undefined reference to `gbm_dri_backend', fix attached</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78258">Bug 78258</a> - make check link_varyings.gl_ClipDistance failure</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78403">Bug 78403</a> - query_renderer_implementation_unittest.cpp:144:4: error: expected primary-expression before ‘.’ token</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78468">Bug 78468</a> - Compiling of shader gets stuck in infinite loop</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78537">Bug 78537</a> - no anisotropic filtering in a native Half-Life 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78581">Bug 78581</a> - OpenCL: clBuildProgram prints error messages directly rather than storing them</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78648">Bug 78648</a> - Texture artifacts in Kerbal Space Program</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78665">Bug 78665</a> - macros in builtin_functions.cpp make invalid assumptions about M_PI definitions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78692">Bug 78692</a> - Football Manager 2014, gameplay rendered black & white</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78716">Bug 78716</a> - Fix Mesa bugs for running Unreal Engine 4.1 Cave effects demo compiled for Linux</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78803">Bug 78803</a> - gallivm/lp_bld_debug.cpp:42:28: fatal error: llvm/IR/Module.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78888">Bug 78888</a> - test_eu_compact.c:54:3: error: implicit declaration of function ‘brw_disasm’ [-Werror=implicit-function-declaration]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79029">Bug 79029</a> - INTEL_DEBUG=shader_time is full of lies</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79095">Bug 79095</a> - x86/common_x86.c:348:14: error: use of undeclared identifier 'bit_SSE4_1'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79263">Bug 79263</a> - Linking error in egl_gallium.la when compiling 32 bit on multiarch</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79294">Bug 79294</a> - Xlib-based build broken on non x86/x86-64 architectures</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79373">Bug 79373</a> - Non-const initializers for matrix and vector constructors</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79382">Bug 79382</a> - build error: multiple definition of `loader_get_pci_id_for_fd'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79616">Bug 79616</a> - L4D2 crash on startup</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79724">Bug 79724</a> - switch statement type check</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79729">Bug 79729</a> - [i965] glClear on a multisample texture doesn't work</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79809">Bug 79809</a> - radeonsi: mouse cursor corruption using weston on AMD Kaveri</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79823">Bug 79823</a> - [NV30/gallium] Mozilla apps freeze on startup with nouveau-dri-10.2.1 libs on dual-screen</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79885">Bug 79885</a> - commit b52a530 (gallium/egl: st_profiles are build time decision, treat them as such) broke egl</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79903">Bug 79903</a> - [HSW Bisected]Some Piglit and Ogles2conform cases fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=81020">Bug 81020</a> - [radeonsi][regresssion] Wireframe of background rendered through objects in Half-Life 2: Episode 2 with MSAA enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82159">Bug 82159</a> - No rule to make target `../../../../src/mesa/libmesa.la', needed by `collision'.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82255">Bug 82255</a> - [VP2] Chroma planes are vertically stretched during VDPAU playback</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82268">Bug 82268</a> - Add support for the OpenRISC architecture (or1k)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82428">Bug 82428</a> - [radeonsi,R9 270X] System lockup when using mplayer/mpv with VDPAU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82483">Bug 82483</a> - format_srgb.h:145: undefined reference to `util_format_srgb_to_linear_8unorm_table'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82517">Bug 82517</a> - [RADEONSI,VDPAU] SIGSEGV in map_msg_fb_buf called from ruvd_destroy, when closing a Tab with accelerated video player</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82534">Bug 82534</a> - src\egl\main\eglapi.h : fatal error LNK1107: invalid or corrupt file: cannot read at 0x2E02</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82536">Bug 82536</a> - u_current.h:72: undefined reference to `__imp__glapi_Dispatch'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82538">Bug 82538</a> - Super Maryo Chronicles fails with st/mesa assertion failure</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82539">Bug 82539</a> - vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83355">Bug 83355</a> - FTBFS: src/mesa/program/program_lexer.l:122:64: error: unknown type name 'YYSTYPE'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85529">Bug 85529</a> - Surfaces not drawn in Unvanquished</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87619">Bug 87619</a> - Changes to state such as render targets change fragment shader without marking it dirty.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87658">Bug 87658</a> - [llvmpipe] SEGV in sse2_has_daz on ancient Pentium4-M</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87913">Bug 87913</a> - CPU cacheline size of 0 can be returned by CPUID leaf 0x80000006 in some virtual machines</li>
</ul>
<h2>Changes</h2>
<p>Chad Versace (2):</p>
<ul>
<li>i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()</li>
<li>i965: Use safer pointer arithmetic in gather_oa_results()</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>Revert "r600g/sb: fix issues cause by GLSL switching to loops for switch"</li>
<li>r600g: fix regression since UCMP change</li>
<li>r600g/sb: implement r600 gpr index workaround. (v3.1)</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.1 release</li>
<li>Update version to 10.4.2</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nv50,nvc0: set vertex id base to index_bias</li>
<li>nv50/ir: fix texture offsets in release builds</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.</li>
<li>i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.</li>
</ul>
<p>Leonid Shatz (1):</p>
<ul>
<li>gallium/util: make sure cache line size is not zero</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>glsl_to_tgsi: fix a bug in copy propagation</li>
<li>vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88662">Bug 88662</a> - unaligned access to gl_dlist_node</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88930">Bug 88930</a> - [osmesa] osbuffer->textures should be indexed by attachment type</li>
</ul>
<h2>Changes</h2>
<p>Brian Paul (1):</p>
<ul>
<li>mesa: fix display list 8-byte alignment issue</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.3 release</li>
<li>Update version to 10.4.4</li>
</ul>
<p>José Fonseca (1):</p>
<ul>
<li>egl: Pass the correct X visual depth to xcb_put_image().</li>
</ul>
<p>Mario Kleiner (1):</p>
<ul>
<li>glx/dri3: Request non-vsynced Present for swapinterval zero. (v3)</li>
</ul>
<p>Matt Turner (1):</p>
<ul>
<li>gallium/util: Don't use __builtin_clrsb in util_last_bit().</li>
</ul>
<p>Niels Ole Salscheider (1):</p>
<ul>
<li>configure: Link against all LLVM targets when building clover</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88658">Bug 88658</a> - (bisected) Slow video playback on Kabini</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89069">Bug 89069</a> - Lack of grass in The Talos Principle on radeonsi (native\wine\nine)</li>
</ul>
<h2>Changes</h2>
<p>Carl Worth (1):</p>
<ul>
<li>Revert use of Mesa IR optimizer for ARB_fragment_programs</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.4 release</li>
<li>get-pick-list.sh: Require explicit "10.4" for nominating stable patches</li>
<li>Update version to 10.4.5</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nvc0: bail out of 2d blits with non-A8_UNORM alpha formats</li>
<li>st/mesa: treat resource-less xfb buffers as if they weren't there</li>
<li>nvc0: allow holes in xfb target lists</li>
</ul>
<p>Jeremy Huddleston Sequoia (2):</p>
<ul>
<li>darwin: build fix</li>
<li>darwin: build fix</li>
</ul>
<p>Kenneth Graunke (4):</p>
<ul>
<li>i965: Override swizzles for integer luminance formats.</li>
<li>i965: Use a gl_color_union for sampler border color.</li>
<li>i965: Fix integer border color on Haswell.</li>
<li>glsl: Reduce memory consumption of copy propagation passes.</li>
</ul>
<p>Laura Ekstrand (1):</p>
<ul>
<li>main: Fixed _mesa_GetCompressedTexImage_sw to copy slices correctly.</li>
</ul>
<p>Marek Olšák (5):</p>
<ul>
<li>r600g,radeonsi: don't append to streamout buffers that haven't been used yet</li>
<li>radeonsi: fix instanced arrays with non-zero start instance</li>
<li>radeonsi: small fix in SPI state</li>
<li>mesa: fix AtomicBuffer typo in _mesa_DeleteBuffers</li>
<li>radeonsi: fix a crash if a stencil ref state is set before a DSA state</li>
</ul>
<p>Michel Dänzer (2):</p>
<ul>
<li>st/mesa: Don't use PIPE_USAGE_STREAM for GL_PIXEL_UNPACK_BUFFER_ARB</li>
<li>Revert "radeon/llvm: enable unsafe math for graphics shaders"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88885">Bug 88885</a> - Transform feedback uses incorrect interleaving if a previous draw did not write gl_Position</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89180">Bug 89180</a> - [IVB regression] Rendering issues in Mass Effect through VMware Workstation</li>
</ul>
<h2>Changes</h2>
<p>Abdiel Janulgue (2):</p>
<ul>
<li>glsl: Don't optimize min/max into saturate when EmitNoSat is set</li>
<li>st/mesa: For vertex shaders, don't emit saturate when SM 3.0 is unsupported</li>
</ul>
<p>Andreas Boll (1):</p>
<ul>
<li>glx: Fix returned values of GLX_RENDERER_PREFERRED_PROFILE_MESA</li>
</ul>
<p>Brian Paul (2):</p>
<ul>
<li>swrast: fix multiple color buffer writing</li>
<li>st/mesa: fix sampler view reference counting bug in glDraw/CopyPixels</li>
</ul>
<p>Chris Forbes (1):</p>
<ul>
<li>i965/gs: Check newly-generated GS-out VUE map against correct stage</li>
</ul>
<p>Eduardo Lima Mitev (1):</p>
<ul>
<li>mesa: Fix error validating args for TexSubImage3D</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.5 release</li>
<li>install-lib-links: remove the .install-lib-links file</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79202">Bug 79202</a> - valgrind errors in glsl-fs-uniform-array-loop-unroll.shader_test; random code generation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89156">Bug 89156</a> - r300g: GL_COMPRESSED_RED_RGTC1 / ATI1N support broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89224">Bug 89224</a> - Incorrect rendering of Unigine Valley running in VM on VMware Workstation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89530">Bug 89530</a> - FTBFS in loader: missing fstat</li>
</ul>
<h2>Changes</h2>
<p>Andrey Sudnik (1):</p>
<ul>
<li>i965/vec4: Don't lose the saturate modifier in copy propagation.</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>egl: Take alpha bits into account when selecting GBM formats</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.6 release</li>
<li>cherry-ignore: add not applicable/rejected commits</li>
<li>mesa: rename format_info.c to format_info.h</li>
<li>loader: include <sys/stat.h> for non-sysfs builds</li>
<li>auxiliary/os: fix the android build - s/drm_munmap/os_munmap/</li>
<li>Update version to 10.4.7</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>i965: Fix out-of-bounds accesses into pull_constant_loc array</li>
</ul>
<p>Ilia Mirkin (4):</p>
<ul>
<li>freedreno: move fb state copy after checking for size change</li>
<li>freedreno/ir3: fix array count returned by TXQ</li>
<li>freedreno/ir3: get the # of miplevels from getinfo</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70410">Bug 70410</a> - egl-static/Makefile: linking fails with llvm >= 3.4</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=72685">Bug 72685</a> - [radeonsi hyperz] Artifacts in Unigine Sanctuary</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=72819">Bug 72819</a> - [855GM] Incorrect drop shadow color on windows and strange white rectangle when showing/hiding GLX-dock...</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74563">Bug 74563</a> - Surfaceless contexts are not properly released by DRI drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74863">Bug 74863</a> - [r600g] HyperZ broken on RV770 and CYPRESS (Left 4 Dead 2 trees corruption) bisected!</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=75011">Bug 75011</a> - [hyperz] Performance drop since git-01e6371 (disable hyperz by default) with radeonsi</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=75112">Bug 75112</a> - Meta Bug for HyperZ issues on r600g and radeonsi</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=76252">Bug 76252</a> - Dynamic loading/unloading of opengl32.dll results in a deadlock</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=76861">Bug 76861</a> - mid3 generates slow code for constant arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77957">Bug 77957</a> - Variably-indexed constant arrays result in terrible shader code</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=78468">Bug 78468</a> - Compiling of shader gets stuck in infinite loop</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79155">Bug 79155</a> - [Tesseract Game] Global Illumination: Medium Causes Color Distortion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79462">Bug 79462</a> - [NVC0/Codegen] Shader compilation falis in spill logic</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80050">Bug 80050</a> - [855GM] Incorrect drop shadow color under windows in Cinnamon persists with MESA 10.1.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80247">Bug 80247</a> - Khronos conformance test ES3-CTS.gtf.GL3Tests.transform_feedback.transform_feedback_vertex_id fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80561">Bug 80561</a> - Incorrect implementation of some VDPAU APIs.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82538">Bug 82538</a> - Super Maryo Chronicles fails with st/mesa assertion failure</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82539">Bug 82539</a> - vmw_screen_dri.lo In file included from vmw_screen_dri.c:41: vmwgfx_drm.h:32:17: error: drm.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82796">Bug 82796</a> - [IVB/BYT-M/HSW/BDW Bisected]Synmark2_v6.0_OglTerrainFlyInst/OglTerrainPanInst cannot run as image validation failed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83148">Bug 83148</a> - Unity invisible under Ubuntu 14.04 and 14.10</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83355">Bug 83355</a> - FTBFS: src/mesa/program/program_lexer.l:122:64: error: unknown type name 'YYSTYPE'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83380">Bug 83380</a> - Linking fails when not writing gl_Position.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83418">Bug 83418</a> - EU IV is incorrectly rendered after git1409011930.d571f2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84178">Bug 84178</a> - Big glamor regression in Xorg server 1.6.99.1 GIT: x11perf 1.5 Test: PutImage XY 500x500 Square</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84355">Bug 84355</a> - texture2DProjLod and textureCubeLod are not supported when using GLES.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84529">Bug 84529</a> - [IVB bisected] glean fragProg1 CMP test failed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84538">Bug 84538</a> - lp_test_format.c:226:4: error: too few arguments to function ‘gallivm_create’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84557">Bug 84557</a> - [HSW] "Emit ELSE/ENDIF JIP with type D on Gen 7" causes Atomic Afterlife and GPU hangs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84651">Bug 84651</a> - Distorted graphics or black window when running Battle.net app on Intel hardware via wine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84662">Bug 84662</a> - Long pauses with Unreal demo Elemental on R9270X since : Always flush the HDP cache before submitting a CS to the GPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84807">Bug 84807</a> - Build issue starting between bf4aecfb2acc8d0dc815105d2f36eccbc97c284b and a3e9582f09249ad27716ba82c7dfcee685b65d51</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85189">Bug 85189</a> - llvm/invocation.cpp: In function 'void {anonymous}::optimize(llvm::Module*, unsigned int, const std::vector<llvm::Function*>&)': llvm/invocation.cpp:324:18: error: expected type-specifier</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85267">Bug 85267</a> - vlc crashes with vdpau (Radeon 3850HD) [r600]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85377">Bug 85377</a> - lp_test_format failure with llvm-3.6</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85425">Bug 85425</a> - [bisected] Compiler error in clip control operations in meta</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85429">Bug 85429</a> - indirect.c:296: multiple definition of `__indirect_glNewList'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85454">Bug 85454</a> - Unigine Sanctuary with Wine crashes on Mesa Git</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85647">Bug 85647</a> - Random radeonsi crashes with mesa 10.3.x</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85691">Bug 85691</a> - 'glsl: Drop constant 0.0 components from dot products.' broke piglit shaders/glsl-gnome-shell-dim-window and a few others with Gallium</li>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>GL_ARB_framebuffer_sRGB on freedreno</li>
<li>GL_ARB_texture_rg on freedreno</li>
<li>GL_EXT_packed_float on freedreno</li>
<li>GL_EXT_polygon_offset_clamp on i965, nv50, nvc0, r600, radeonsi, llvmpipe</li>
<li>GL_EXT_texture_shared_exponent on freedreno</li>
<li>GL_EXT_texture_snorm on freedreno</li>
</ul>
<h2>Bug fixes</h2>
<p>This list is likely incomplete.</p>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=10370">Bug 10370</a> - Incorrect pixels read back if draw bitmap texture through Display list</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=60879">Bug 60879</a> - [radeonsi] X11 can't start with acceleration enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=67672">Bug 67672</a> - [llvmpipe] lp_test_arit fails on old CPUs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=77544">Bug 77544</a> - i965: Try to use LINE instructions to perform MAD with immediate arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83510">Bug 83510</a> - Graphical glitches in Unreal Engine 4</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=83908">Bug 83908</a> - [i965] Incorrect icon colors in Steam Big Picture</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84212">Bug 84212</a> - [BSW]ES3-CTS.shaders.loops.do_while_dynamic_iterations.vector_counter_vertex fails and causes GPU hang</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84651">Bug 84651</a> - Distorted graphics or black window when running Battle.net app on Intel hardware via wine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86837">Bug 86837</a> - kodi segfault since auxiliary/vl: rework the build of the VL code</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86939">Bug 86939</a> - test_vf_float_conversions.cpp:63:12: error: expected primary-expression before ‘union’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86944">Bug 86944</a> - glsl_parser_extras.cpp", line 1455: Error: Badly formed expression. (Oracle Studio)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86958">Bug 86958</a> - lp_bld_misc.cpp:503:40: error: no matching function for call to ‘llvm::EngineBuilder::setMCJITMemoryManager(ShaderMemoryManager*&)’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86969">Bug 86969</a> - _drm_intel_gem_bo_references() function takes half the CPU with Witcher2 game</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87076">Bug 87076</a> - Dead Island needs allow_glsl_extension_directive_midshader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87619">Bug 87619</a> - Changes to state such as render targets change fragment shader without marking it dirty.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87658">Bug 87658</a> - [llvmpipe] SEGV in sse2_has_daz on ancient Pentium4-M</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87694">Bug 87694</a> - [SNB] Crash in brw_begin_transform_feedback</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87886">Bug 87886</a> - constant fps drops with Intel and Radeon</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=87913">Bug 87913</a> - CPU cacheline size of 0 can be returned by CPUID leaf 0x80000006 in some virtual machines</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88079">Bug 88079</a> - dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0 tests fail due to enabling of GL_RGB and GL_RGBA</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88219">Bug 88219</a> - include/c11/threads_posix.h:197: undefined reference to `pthread_mutex_lock'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88227">Bug 88227</a> - Radeonsi: High GTT usage in Prison Architect large map</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88248">Bug 88248</a> - Calling glClear while there is an occlusion query in progress messes up the results</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88467">Bug 88467</a> - nir.c:140: error: ‘nir_src’ has no member named ‘ssa’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88478">Bug 88478</a> - #error "<malloc.h> has been replaced by <stdlib.h>"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88519">Bug 88519</a> - sha1.c:210:22: error: 'grcy_md_hd_t' undeclared (first use in this function)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88523">Bug 88523</a> - sha1.c:37: error: 'SHA1_CTX' undeclared (first use in this function)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88561">Bug 88561</a> - [radeonsi][regression,bisected] Depth test/buffer issues in Portal</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88658">Bug 88658</a> - (bisected) Slow video playback on Kabini</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88662">Bug 88662</a> - unaligned access to gl_dlist_node</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88783">Bug 88783</a> - FTBFS: Clover: src/gallium/state_trackers/clover/llvm/invocation.cpp:335:49: error: no matching function for call to 'llvm::TargetLibraryInfo::TargetLibraryInfo(llvm::Triple)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88806">Bug 88806</a> - nir/nir_constant_expressions.c:2754:15: error: controlling expression type 'unsigned int' not compatible with any generic association type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88930">Bug 88930</a> - [osmesa] osbuffer->textures should be indexed by attachment type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88962">Bug 88962</a> - [osmesa] Crash on postprocessing if z buffer is NULL</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89068">Bug 89068</a> - glTexImage2D regression by texstore_rgba switch to _mesa_format_convert</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89069">Bug 89069</a> - Lack of grass in The Talos Principle on radeonsi (native\wine\nine)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89180">Bug 89180</a> - [IVB regression] Rendering issues in Mass Effect through VMware Workstation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=79202">Bug 79202</a> - valgrind errors in glsl-fs-uniform-array-loop-unroll.shader_test; random code generation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86747">Bug 86747</a> - Noise in Football Manager 2014 textures</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86974">Bug 86974</a> - INTEL_DEBUG=shader_time always asserts in fs_generator::generate_code() when Mesa is built with --enable-debug (= with asserts)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88246">Bug 88246</a> - Commit 2881b12 causes 43 DrawElements test regressions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88883">Bug 88883</a> - ir-a2xx.c: variable changed in assert statement</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=88885">Bug 88885</a> - Transform feedback uses incorrect interleaving if a previous draw did not write gl_Position</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89156">Bug 89156</a> - r300g: GL_COMPRESSED_RED_RGTC1 / ATI1N support broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89224">Bug 89224</a> - Incorrect rendering of Unigine Valley running in VM on VMware Workstation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89292">Bug 89292</a> - [regression,bisected] incomplete screenshots in some cases</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=53077">Bug 53077</a> - [IVB] Output error with msaa when both of framebuffer and source color's alpha are not 1</li>
<li>Fix freedreno to compile with recent libdrm.</li>
</ul>
<h2>Changes</h2>
<p>The full set of changes can be viewed by using the following GIT command:</p>
<pre>
git log mesa-9.2.3..mesa-9.2.4
</pre>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: fix GL_FEEDBACK mode inverted Y coordinate bug</li>
</ul>
<p>Paul Berry (2):</p>
<ul>
<li>i965: Fix vertical alignment for multisampled buffers.</li>
<li>glsl: Fix lowering of direct assignment in lower_clip_distance.</li>
</ul>
<p>Rob Clark (17):</p>
<ul>
<li>freedreno/a3xx: fix color inversion on mem->gmem restore</li>
<li>freedreno/a3xx: fix viewport on gmem->mem resolve</li>
<li>freedreno: add debug option to disable scissor optimization</li>
<li>freedreno: update register headers</li>
<li>freedreno/a3xx: some texture fixes</li>
<li>freedreno/a3xx/compiler: fix CMP</li>
<li>freedreno/a3xx/compiler: handle saturate on dst</li>
<li>freedreno/a3xx/compiler: use max_reg rather than file_count</li>
<li>freedreno/a3xx/compiler: cat4 cannot use const reg as src</li>
<li>freedreno: fix segfault when no color buffer bound</li>
<li>freedreno/a3xx/compiler: make compiler errors more useful</li>
<li>freedreno/a3xx/compiler: bit of re-arrange/cleanup</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=62142">Bug 62142</a> - Mesa/demo mipmap_limits upside down with running by SOFTWARE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=64323">Bug 64323</a> - Severe misrendering in Left 4 Dead 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66213">Bug 66213</a> - Certain Mesa Demos Rendering Inverted (vertically)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68838">Bug 68838</a> - GLSL: struct declarations produce a "empty declaration warning" in 9.2</li>
/* Legal values will be defined in layered extensions. */
cl_uintallocation_type;
/* Host cache policy for this external memory allocation. */
cl_uinthost_cache_policy;
}cl_mem_ext_host_ptr;
/*********************************
* cl_qcom_ion_host_ptr extension
*********************************/
#define CL_MEM_ION_HOST_PTR_QCOM 0x40A8
typedefstruct_cl_mem_ion_host_ptr
{
/* Type of external memory allocation. */
/* Must be CL_MEM_ION_HOST_PTR_QCOM for ION allocations. */
cl_mem_ext_host_ptrext_host_ptr;
/* ION file descriptor */
intion_filedesc;
/* Host pointer to the ION allocated memory */
void*ion_hostptr;
}cl_mem_ion_host_ptr;
#endif /* CL_VERSION_1_1 */
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.