An upcoming patch is going to introduce some code here, and having this code
organized as the patch does makes it a bit easier to read later.
There should be no functional change here.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
As it turns out, we were over-thinking the cause of the hang on
Cherryview. It's simply errata for Cherryview.
commit 88fea85f09
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Fri Nov 21 10:47:41 2014 -0800
i965/vec4/gen8: Handle the MUL dest hazard exception
This explains why we never saw the hang on BDW.
NOTE: The problem the original patch was trying to fix does still exist. It will
have to be fixed at some point.
v2: Modify commit message, s/CHV/BDW
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've probably never seen this ridiculous pattern in the wild, so it
didn't matter.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This expects (0,0,0,0), though it can be changed to something else or allow
more than one set of values to be considered correct.
This is currently the radeonsi behavior.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Since alu does not support abs() modifier on source operands, spill
and apply the modifiers to a temp register when needed.
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
v2 (Ian Romanick)
- Move the check to the lexer before rallocing a copy of the large string.
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_vertex
dEQP-GLES3.functional.shaders.keywords.invalid_identifiers.max_length_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were incorrectly attributing VS time to FS8 on Gen8+, which now use
fs_visitor for vertex shaders.
We don't hit this for geometry shaders yet, but we may as well add
support now - the fix is obvious, and we'll just forget later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, we special cased FB writes and URB writes in the register
allocation code. What we really wanted was to handle any message with
EOT set.
This saves us from extending the list with new opcodes in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This helper function basically just checks inst->eot, but also asserts
that only opcodes we expect to terminate threads have EOT set. As far
as I'm aware, we've never had such a bug.
Removing it means that we don't have to extend the list for new opcodes.
Cherryview and Skylake introduce an optimization where sampler messages
can have EOT set; scalar GS/HS/DS will likely introduce new opcodes as
well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The latter currently implies CPU read access, so only PIPE_USAGE_STAGING
can be expected to be fast.
Mesa demos src/tests/streaming_rect on Kaveri (radeonsi):
Unpatched: 42 frames in 1.023 seconds = 41.056 FPS
Patched: 615 frames in 1.000 seconds = 615.000 FPS
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88658
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Use a dummy vertex buffer object when vs inputs have no corresponding
entries in the vertex declaration. This dummy buffer gives the
shader float4(0,0,0,0).
This fixes several artifacts on some games.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
The drm fd wasn't released, causing a crash
for wine tests on nouveau, which seems to have
a bug when a lot of device descriptors are open.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
We weren't releasing hal and ref, causing some issues (threads not released, etc)
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When on a render node the unique ioctl doesn't work.
This patch drops the code to detect the device, which relied
on an ioctl, and replaces it with the mesa loader function.
The mesa loader function is more complete and won't fail for render-nodes.
Alternatively we could also have used the pipe cap to
determine the vendor and device id from the driver.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This seems to be the behaviour on Win. Previous behaviour led
to different issues depending on the driver.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
If has_present_buffers was false at first,
but after a device reset, it turns true (for
example if we begin to render to a multisampled
back buffer), there was a crash due to present_buffers
being uninitialised.
This patch fixes it.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The previous code checked against 256 instead of the correct limit
for sm3 hardware, which is 224.
Fixes wine test test_pixel_shader_constant()
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Wine tests, and probably some apps, check for errors by checking for NULL
instead of error codes.
Fixes wine test test_surface_blocks()
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Add returncode E_FAIL.
Return E_FAIL for any vertexdeclaration element with type unused.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
ATOC is a hack for alpha to coverage
that is supported by NV and Intel.
You need to check the support for it
with CheckDeviceFormat.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This D3D hack is supposed to be supported
by all AMD SM2+ cards. Apps use it without
checking if they are on AMD.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This depth buffer format, like D3DFMT_INTZ, can be used to read
the depth buffer values when bound to a shader.
Some apps may use this format to get better performance when
they don't need the precision of INTZ (24 bits for depth, 8 for
stencil, whereas DF16 is just 16 bits for depth)
We don't add support for DF24 yet, because it implies support
for FETCH4, which we don't support for now.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Some drivers support PIPE_FORMAT_S8_UINT_Z24_UNORM,
some others PIPE_FORMAT_Z24_UNORM_S8_UINT, some both.
It doesn't matter which one we use, since the d3d formats
they map to aren't lockable (app can read it directly).
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Move the checks of whether the format is supported
into a common place.
The advantage is that this lets us handle the case where a d3d9
format can be mapped to several formats and cards don't
support all of them.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The order of the format is changed to have
an increasing ordering of the d3d9 format values.
Some missing formats are added and matched to PIPE_FORMAT_NONE
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Because the debug output of this function was cut in two parts,
sometimes the second part wasn't printed when we returned early,
even though we would like to get it.
The reason for the separation was that only at the end of the function
can we print what we map the d3d9 arguments to, but we can always retrieve
that info by hand.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This D3D hack allows resolving a multisampled
depth buffer into a single sampled one.
Note that the implementation is slightly incorrect.
When querying the content of D3DRS_POINTSIZE,
it should return the resz code if it has been set.
This behaviour will be implemented when state changes
are reworked. For now the current behaviour is ok,
since apps use the D3DCREATE_PUREDEVICE flag when creating
the device, which means they won't read states and in exchange
get better performance.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This fixes a wine test and some minor visual issues on some games.
The patch is not optimal; there is probably a more efficient way to
fix this issue, but the code there already has some inefficiencies.
There are plans to rewrite that part of the code to make it more
efficient.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
D3DSP_NOSWIZZLE already contains the shift.
Detected with Clang.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This removes an unneeded hack for Anno 1404.
This app does not check the number of supported
constants, and relies on the shader compilation failing
if it uses too many constants.
This patch also checks for the correct number of constants for ps.
Note that we don't check the official limitations for old vs and ps
versions. The restrictions were fixed, unlike for the number of vertex
shader constants for later versions. Apps likely use the correct number,
and it's not a problem for us if an app wants to use more.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Instead of crashing on buggy shaders, we should return an error.
This patch introduces this behaviour in the case of invalid constant
access
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
r500 doesn't have enough float constants for vs to fill all needs.
Overlapping issues can happen with complex shaders.
The fix would be to recompile shaders to include the integer
and boolean constants, instead of reserving slots for them.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previously 276 constants were declared every time.
This patch makes shaders declare constants up to the maximum
constant needed and moves the moment we print the TGSI
shader after the moment we declare the constants.
This is needed for r500, since when indirect addressing is used,
it cannot reduce the amount of constants needed, and it is
restricted to 256 constant slots.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Count explicitly the slots for float, int and bool constants,
and deduce the constbuf size in nine_shader.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This patch raises nine requirements and disables nine for old
hw that doesn't match them.
Currently for these cards only games that don't have tight requirements
would work well with nine. However nine is missing several checks
regarding these limitations.
To make code and future patches less heavy, dropping support for these old
cards seems a good solution.
That makes r500 the only dx9 generation cards supported by nine. It seems to be the one
with the fewest limitations for nine. Still not everything is ok, and we'll have
for example to implement shader recompilation for these cards to include
integer and boolean constants in the shader.
Eventually when this is done, we can reintroduce support for older cards.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Resolving a multisampled depth texture into
a single sampled texture is supported on >= SM4.1
hw. It is possible some previous hw support it.
The ability was tested on radeonsi and nvc0.
Apparently it is also supported for radeon >= r700.
This patch adds the MULTISAMPLE_Z_RESOLVE cap and
adds it to the drivers. It is advertised for drivers
for which it is sure the ability is supported.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Khronos modified glext.h to get rid of GL_TEXTURE_BINDING, a special enum
added for ARB_direct_state_access. This enum was ruled unimplementable.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Laura Ekstrand <laura@jlekstrand.net>
Nothing special needs to be done.
Even though llvmpipe copies constant (ie uniform) buffers internally, the
application is supposed to flush and sync, so all should work.
All bufferstorage piglit tests pass.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
As ffsll doesn't exist in MSVC yet, and u_bit_scan64 is only used by
radeonsi which is never built with MSVC.
This is just a stop-gap fix to unbreak MSVC build until we refactor these
mathematical portability wrappers into src/util.
Trivial.
We need a slot for the stipple texture and the pixel shader already uses
32 textures (16 API slots + 16 FMASK slots).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The hardware obeys swizzles even if the resource is NULL.
This will be used by set_polygon_stipple.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The FILLED_SIZE counter is uninitialized at the beginning, so we can't use it.
Instead, use offset = 0, which is what we always do when not appending.
This unexpectedly fixes spec/ARB_texture_multisample/sample-position/*.
Yes, the test does use transform feedback.
Cc: 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For drivers that use higher slots not to crash in tgsi_shader_info.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When a rebase swizzle is provided and we call _mesa_swizzle_and_convert
after unpacking the source format we were always passing normalized=false.
We should pass true or false depending on the formats involved in the
conversion for the byte and float paths (the integer path cannot ever be
normalized).
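The difference between the two settings can be sketched as follows. This is
only an illustration for an unsigned byte source: ubyte_to_float() is a
hypothetical helper, not the signature of _mesa_swizzle_and_convert.

```python
def ubyte_to_float(value, normalized):
    """Convert one unsigned byte channel to float.

    Hypothetical helper for illustration only. With normalized=True the
    range [0, 255] maps to [0.0, 1.0]; with normalized=False the integer
    value is kept as-is.
    """
    if not 0 <= value <= 255:
        raise ValueError("value does not fit in an unsigned byte")
    return value / 255.0 if normalized else float(value)


if __name__ == "__main__":
    # Passing normalized=False for a normalized format leaves the channel
    # 255 times too large.
    print(ubyte_to_float(255, True), ubyte_to_float(255, False))
```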
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Originally, get_alu_src was supposed to handle resolving swizzles and
things like that. However, now that basically every instruction we have
only takes scalar sources, we don't really need it anymore. The only case
where it's still marginally useful is for the mov and vecN operations that
are left over from SSA form. We can handle those cases as a special case
easily enough. As a side-effect, we don't need the vec_to_movs pass
anymore.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we detect if we need an extra copy for swizzling. The
old code involved a pile of confusing switch fall-throughs; we now use a
loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we can scalarize with NIR, there's no need for all this code
anymore. Let's get rid of it and just do scalar operations.
v2: run copy prop before lowering phi nodes
v3: Get rid of the "emit(...)->saturate = foo" pattern
v4: Run alu_to_scalar as an optimization pass
total instructions in shared programs: 5998321 -> 5974070 (-0.40%)
instructions in affected programs: 732075 -> 707824 (-3.31%)
helped: 3137
HURT: 191
GAINED: 18
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add better comments
- Use nir_ssa_dest_init and nir_src_for_ssa more places
- Fix some void * casts
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Rework the way we determine whether or not to scalarize a phi node to
make the recursion non-bogus
- Treat load_const instructions as scalarizable
v4 Jason Ekstrand <jason.ekstrand@intel.com>:
- Allow uniform and input loads to be scalarizable
v5 Jason Ekstrand <jason.ekstrand@intel.com>:
- Also consider loads of inputs (varying, uniform, or ubo) to be
scalarizable. We were already doing this for load_var on uniforms and
inputs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All but 16 of the programs helped were ARB fp programs.
total instructions in shared programs: 5949286 -> 5945470 (-0.06%)
instructions in affected programs: 275162 -> 271346 (-1.39%)
helped: 1197
GAINED: 1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Section 2.3.1 (Errors) of the OpenGL 4.5 spec says:
"If a negative number is provided where an argument of type sizei or
sizeiptr is specified, an INVALID_VALUE error is generated."
This patch adds checks for negative buffer size values passed to different APIs.
It also moves up the check on other APIs that already had it, making it the first
error check performed in the function, for consistency.
While there may be other APIs throughout the code lacking this check (or at least
not at the beginning of the function), this patch focuses on the cases that break
the dEQP tests listed below. It could be a good exercise for the future to check
all other cases, and improve consistency in the order of the checks throughout the
whole Mesa code base.
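The ordering described above can be sketched as below. The function and its
arguments are hypothetical, and the error returned by the second check is a
placeholder; only the position of the negative-size check is the point.

```python
GL_NO_ERROR = 0
GL_INVALID_VALUE = 0x0501
GL_INVALID_OPERATION = 0x0502


def hypothetical_query(object_is_valid, buf_size):
    """Return the GL error a query-style entry point would record.

    Hypothetical sketch, not a real Mesa function. The negative size
    check comes first so it wins over any later error.
    """
    if buf_size < 0:
        return GL_INVALID_VALUE
    if not object_is_valid:
        # Placeholder for whatever other validation the entry point does.
        return GL_INVALID_OPERATION
    return GL_NO_ERROR


if __name__ == "__main__":
    print(hex(hypothetical_query(False, -1)))
```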
This fixes 5 dEQP tests:
* dEQP-GLES3.functional.negative_api.state.get_attached_shaders
* dEQP-GLES3.functional.negative_api.state.get_shader_source
* dEQP-GLES3.functional.negative_api.state.get_active_uniform
* dEQP-GLES3.functional.negative_api.state.get_active_attrib
* dEQP-GLES3.functional.negative_api.shader.program_binary
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Section 6.1.13 "Framebuffer Object Queries" of OpenGL ES 3.0 spec:
"If the default framebuffer is bound to target, then attachment must be
BACK, identifying the color buffer; DEPTH, identifying the depth buffer; or
STENCIL, identifying the stencil buffer."
OpenGL ES 3.0, section 2.5 (GL Errors):
"If a command that requires an enumerated value is passed a
symbolic constant that is not one of those specified as allowable
for that command, an INVALID_ENUM error is generated."
Then change the returned error to INVALID_ENUM.
Fixes:
dEQP-GLES3.functional.fbo.api.attachment_query_default_fbo
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
mod(x,y) as y * fract(x/y). This implementation has a down side though:
it introduces precision errors due to the fract() operation. Even worse,
since the result of fract() is multiplied by y, the larger y gets the
larger the precision error we produce, so for large enough numbers the
precision loss is significant. Some examples on i965:
Operation                          Precision error
-----------------------------------------------------
mod(-1.951171875, 1.9980468750)    0.0000000447
mod(121.57, 13.29)                 0.0000023842
mod(3769.12, 321.99)               0.0000762939
mod(3769.12, 1321.99)              0.0001220703
mod(-987654.125, 123456.984375)    0.0160663128
mod( 987654.125, 123456.984375)    0.0312500000
This patch replaces the current lowering pass with a different one
(MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
man pages:
mod(x,y) = x - y * floor(x/y)
This implementation eliminates the precision errors at the expense of
an additional add instruction on some systems. On systems that can do
negate with multiply-add in a single operation this new implementation
would come at no additional cost.
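The two formulas can be compared on the CPU with a rough sketch like the one
below. It assumes binary32 rounding after every operation (emulated with the
struct module); the helper names are made up for this example, and the
results are not expected to match what any particular GPU produces.

```python
import math
import struct


def f32(value):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def mod_to_fract(x, y):
    """Old lowering: mod(x, y) = y * fract(x / y), rounded at each step."""
    q = f32(x / y)
    fract = f32(q - math.floor(q))
    return f32(y * fract)


def mod_to_floor(x, y):
    """New lowering: mod(x, y) = x - y * floor(x / y), rounded at each step."""
    q = f32(x / y)
    return f32(x - f32(y * math.floor(q)))


if __name__ == "__main__":
    samples = [(121.57, 13.29), (3769.12, 321.99),
               (987654.125, 123456.984375)]
    for x, y in samples:
        # Inputs are rounded to binary32 first, as a shader would see them.
        x, y = f32(x), f32(y)
        print(x, y, mod_to_fract(x, y), mod_to_floor(x, y))
```

Both helpers agree on inputs where every intermediate value is exactly
representable, such as mod(5.5, 2.0); they can only differ once rounding
comes into play.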
v2 (Ian Romanick)
- Do not clone operands because when they are expressions we would be
duplicating them and that can lead to suboptimal code.
Fixes the following 16 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLES 3.0.0 spec introduces context state PRIMITIVE_RESTART_FIXED_INDEX
(2.8.1 Transferring Array Elements, page 26) which is not currently
possible to query using glGet*() funcs.
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getboolean
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger64
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For code such as:
uint tmp1 = uint(in0);
uint tmp2 = -tmp1;
float out0 = float(tmp2);
We produce code like:
mov(8) g5<1>.xF -g9<4,4,1>.xUD
which does not produce correct results. This code produces the
results we would expect if tmp1 and tmp2 were signed integers
instead.
It seems that a similar problem was detected and addressed when
using negations with unsigned integers as part of conditionals, but
it looks like the problem has a wider impact than that.
This patch fixes the problem by preventing copy-propagation of
negated UD registers in all scenarios, not only in conditionals.
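The size of the discrepancy can be illustrated numerically. The sketch below
models the expected result (negation wraps modulo 2**32 before the float
conversion) next to the "as if signed" result described above. The second
function is an assumption drawn from that description, not a model of the
hardware, and all names are made up.

```python
import struct


def as_u32(value):
    """Wrap an integer to 32 unsigned bits."""
    return value & 0xFFFFFFFF


def as_s32(bits):
    """Reinterpret 32 bits as a two's-complement signed integer."""
    return struct.unpack("<i", struct.pack("<I", as_u32(bits)))[0]


def negate_then_convert(tmp1):
    """Expected: uint negation wraps, then the uint is converted to float."""
    tmp2 = as_u32(-tmp1)
    return float(tmp2)


def convert_as_if_signed(tmp1):
    """Assumed model of the wrong result: operands treated as signed."""
    return float(-as_s32(tmp1))


if __name__ == "__main__":
    # Python floats are binary64, so the large value is printed exactly.
    print(negate_then_convert(1), convert_as_if_signed(1))
```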
Fixes the following 24 dEQP tests:
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_*
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Nothing enables the extension yet, but the values are now available.
The spec calls for it to only be exposed for GL 3.3+, which is core-only
in mesa. Instead we allow any driver to enable it, including in a compat
context for any GL version.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
If the ?: operator's condition is a constant value, and both branches
were pure expressions, we can just make the resulting value one or the
other.
Previously, we only did this if op[1] and op[2] were also constant
values - but there's no actual reason for that restriction.
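A minimal sketch of the idea, using a toy expression tree rather than the
real IR classes. Every node type here is hypothetical and assumed to be free
of side effects, which is what makes dropping the untaken branch safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Select:
    """Toy stand-in for cond ? then : otherwise."""
    cond: object
    then: object
    otherwise: object


def fold_select(expr):
    """Pick a branch when the condition is constant.

    The branches do not need to be constants themselves; they only need
    to be pure, which this toy tree guarantees by construction.
    """
    if isinstance(expr, Select) and isinstance(expr.cond, Const):
        return expr.then if expr.cond.value else expr.otherwise
    return expr


if __name__ == "__main__":
    print(fold_select(Select(Const(True), Var("a"), Var("b"))))
```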
No changes in shader-db, probably because we usually optimize this later
anyway. But it does make us generate less stupid code up front.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Paul originally had to reverse engineer these formulas based on the
description about how the sampler works. The description here is not
the easiest to follow - especially given that it's from the Sandybridge
era, when the hardware only did 4x multisampling.
Jordan and I recently found another part of the documentation where they
simply state that IMS dimensions must be adjusted by a set of formulas.
Quoting this section provides an easy to follow explanation for the
code, including 2x/4x/8x/16x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
In preparation for glBlitNamedFramebuffer, the DD table function
BlitFramebuffer needs to accept two arbitrary framebuffer objects rather
than assuming ctx->ReadBuffer and ctx->DrawBuffer.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Khronos Revision 29537 fixes ARB_direct_state_access function prototypes that
had GLsizei where they should have had GLsizeiptr. This mainly affects
functions related to buffer objects.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.
v2: Apply to all the .dir-locals.el files, not just the one in the
root directory.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
For GL_TEXTURE_1D_ARRAY targets we store the depth of the array
in the Height field and leave Depth=1 in the underlying texture
object. When we call intel_miptree_copy_teximage in the process
of re-creating a miptree (possibly because the number of miplevels
has changed) we didn't account for this, so we were only copying
texture images for the first slice.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I could have done this in the bit that generates the ANDs and ORs, but
it's probably generally useful. Sadly, I still need this even if I move
to NIR, because I can't yet express my read of the destination color in
NIR, which I would need to move my blend/logicop/colormask handling into
NIR.
total uniforms in shared programs: 13497 -> 13455 (-0.31%)
uniforms in affected programs: 101 -> 59 (-41.58%)
total instructions in shared programs: 40797 -> 40296 (-1.23%)
instructions in affected programs: 1639 -> 1138 (-30.57%)
The GL spec guarantees that glGetTexImage will never get a multisampled
texture, but this is not true for glReadPixels. If we get a multisampled
buffer, we have to do a multisample resolve on it before we can pull the
data down for the user. Since this isn't practical to handle in
tiled_memcpy, we just fall back to the other paths that can handle this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And remove the mocs argument of the emit_buffer_surface_state vtbl hook. Its
semantics vary greatly from one generation to another, so it kind of
encourages the caller to pass 0 which is the only valid setting across
generations. After this commit the hardware-specific code decides what the
best cacheability settings are for buffer surfaces, just like we do for
textures.
This together with some additional changes coming is expected to improve
performance of pull constants, buffer textures, atomic counters and image
objects on Gen7 and up.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes a bug on BDW when our meta-based stencil blit path assert-fails
due to an invalid internal format even though we do support the
ARB_stencil_texturing extension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The _mesa_dlist_alloc() function is only guaranteed to return a pointer
with 4-byte alignment. On 64-bit systems which don't support unaligned
loads (e.g. SPARC or MIPS) this could lead to a bus error in the VBO code.
The solution is to add a new _mesa_dlist_alloc_aligned() function which
will return a pointer to an 8-byte aligned address on 64-bit systems.
This is accomplished by inserting a 4-byte NOP instruction in the display
list when needed.
The only place this actually matters is the VBO code where we need to
allocate a 'struct vbo_save_vertex_list' which needs to be 8-byte
aligned (just as if it were malloc'd).
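The padding scheme can be sketched as below. This is a toy model, not the
real display list layout: it assumes the list is an array of 4-byte slots
whose base is 8-byte aligned, that each node is one opcode slot followed by
its payload, and all names are made up.

```python
NOP = "NOP"
WORD_BYTES = 4


class DisplayList:
    """Toy display list: one Python list entry per 4-byte slot."""

    def __init__(self):
        self.words = []

    def alloc(self, opcode, payload_words):
        """Append an opcode slot plus payload; return the payload index."""
        self.words.append(opcode)
        start = len(self.words)
        self.words.extend([0] * payload_words)
        return start

    def alloc_aligned(self, opcode, payload_words):
        """Like alloc(), but the payload starts on an 8-byte boundary."""
        # The payload begins one slot after the opcode. If that slot index
        # would be odd, emit a 4-byte NOP first to shift everything by one.
        if (len(self.words) + 1) % 2 != 0:
            self.words.append(NOP)
        return self.alloc(opcode, payload_words)


if __name__ == "__main__":
    dlist = DisplayList()
    index = dlist.alloc_aligned("SAVE_VERTEX_LIST", 3)
    print(index * WORD_BYTES, dlist.words)
```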
The gears demo and others hit this bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88662
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The intrinsics are universally available, whereas older Windows SDKs (e.g.
7.0.7600) don't have the non-intrinsic entrypoint.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
According to the SKL bspec the 3DSTATE_CONSTANT_* commands only take
effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_*
command. This patch just makes it set the BRW_NEW_SURFACES state when
uploading the push constants to ensure the binding tables will be
updated.
This fixes the fbo-blending-formats Piglit test and possibly others.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When color buffers alone are concerned the depth is not needed.
No regression on BDW where meta blit is used instead of blorp. I
also disabled blorp temporarily for fbo-blits on IVB and saw no
regressions there either.
I also compared several graphics benchmarks on BDW and saw neither
regressions nor improvements.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows you to match on an unknown value but only if it is of a given
type. 90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.
nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are some algebraic transformations that we want to do but only if
certain things are constants. For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.
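A small sketch of why the constant requirement matters. Expressions are
modelled as nested tuples with strings for variables and numbers for
constants; this representation and the helper names are made up and are not
how nir_algebraic.py or the C matcher express it.

```python
def is_const(expr):
    """Numbers are constants; strings are variables; tuples are operations."""
    return isinstance(expr, (int, float))


def distribute(expr):
    """Rewrite a * (b + c) as (a * b) + (a * c) if a and b-or-c are constant."""
    if isinstance(expr, tuple) and expr[0] == "mul":
        _, a, rhs = expr
        if isinstance(rhs, tuple) and rhs[0] == "add":
            _, b, c = rhs
            if is_const(a) and (is_const(b) or is_const(c)):
                return ("add", ("mul", a, b), ("mul", a, c))
    return expr


def fold(expr):
    """Constant-fold bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if is_const(lhs) and is_const(rhs):
        return lhs * rhs if op == "mul" else lhs + rhs
    return (op, lhs, rhs)


if __name__ == "__main__":
    # 2 * (x + 3) becomes (2 * x) + 6: one of the two new multiplies folds.
    print(fold(distribute(("mul", 2, ("add", "x", 3)))))
```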
nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since the address reg holds integer values, ARL/ARR do an implicit float-to-int
conversion, so clarify that. Thus it is also incorrect to say that FLR really
does the same as ARL.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not. vc4
results:
total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs: 4253 -> 3960 (-6.89%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.
v2: Add a missing space.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch adds needed support for accepting HALF_FLOAT_OES as valid type
for TexImage*D and TexSubImage*D when Texture Float extensions are supported.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This patch series adds support for following GLES2 Texture Float extensions:
1)GL_OES_texture_float,
2)GL_OES_texture_half_float,
3)GL_OES_texture_float_linear,
4)GL_OES_texture_half_float_linear.
This patch adds basic infrastructure and needed boolean flags to advertise
support for these extensions, by default the support is disabled. Next patch
in the series introduces support for HALF_FLOAT_OES token.
v4: take assert away and make valid_filter_for_float conditional (Tapani),
fix the alphabetical order (Emil)
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends. This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.
Fixes fs-swap-problem with my scalarizing patches.
v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
The idea is that after a remove_from_list(), you might want to be able to
do a remove_from_list() on it again or an is_empty_list(). This is
apparently relied on by r300g.
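
A minimal sketch of the idea, not the actual list code (which is macro-based):
the node type and the names make_empty, is_empty, insert_at_head and
remove_node are hypothetical. The point is only that removal resets the node
to the self-linked state, so a second removal or an emptiness check on the
node stays well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal node, used for illustration only. */
struct node {
   struct node *prev;
   struct node *next;
};

/* An empty list head (or a detached node) links to itself. */
static void make_empty(struct node *n)
{
   n->prev = n;
   n->next = n;
}

static bool is_empty(const struct node *n)
{
   return n->next == n;
}

static void insert_at_head(struct node *list, struct node *n)
{
   n->prev = list;
   n->next = list->next;
   list->next->prev = n;
   list->next = n;
}

/* Unlink n from its neighbours, then reset it to the self-linked state so
 * that removing it again, or asking whether it is empty, is harmless. */
static void remove_node(struct node *n)
{
   assert(n != NULL && n->prev != NULL && n->next != NULL);
   n->next->prev = n->prev;
   n->prev->next = n->next;
   make_empty(n);
}
```

Without the final make_empty() call, a second remove_node() would follow
stale pointers into a list the node no longer belongs to.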
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2:
- Only emit write SPI_TMPRING_SIZE once per packet.
- Use context global scratch buffer.
v3:
- Patch shaders using WRITE_DATA packet instead of map/unmap.
- Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
VS_PARTIAL_FLUSH when patching shaders.
v4:
- Code cleanups.
- Remove unnecessary multiplies.
v5:
- Patch shaders in system memory and re-upload to vram.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state(). This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The problem is that the fallbacks we have at the moment don't work in C++.
While we could theoretically fix the fallbacks it would also raise the
issue of correctly detecting the fpclassify function. So, for now, we'll
just disable it until we actually have a C++ user.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: EdB <edb+mesa@sigluy.net>
I haven't actually seen this bug in the wild, but it's possible that
someone could ask to do a S3TC PBO download or something. This protects us
from accidentally creating a render target with a compressed or otherwise
non-renderable format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds two error messages that point the user directly to a
misspelled or impossible swizzle field for a format.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
[ Francisco Jerez: As discussed on the mailing list, this is intended
to produce more useful debug output in cases where the compilation
terminates unexpectedly. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
[ Francisco Jerez: As we're at it make debug_options[] local to its
only user and remove temporary. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.
On r600 hw there is a special GS scenario for this, you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.
This is a first pass attempt at this, and passes the piglit
tests that test for this.
v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.
v2: use new helper name
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.
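
The rule can be sketched roughly as follows. The struct and function names are
hypothetical and simplified, and the sketch assumes that an input size of 0
marks a per-component source; it is not the helper's actual code.

```c
#include <assert.h>

/* Hypothetical, simplified opcode description.  An entry of 0 in
 * input_sizes[] is assumed to mean "this source is per-component". */
struct op_info {
   unsigned num_inputs;
   unsigned output_size;
   unsigned input_sizes[4];
};

/* Number of source channels an SSA ALU instruction reads from source src,
 * given the number of components in its SSA definition. */
static unsigned ssa_src_components(const struct op_info *info, unsigned src,
                                   unsigned dest_components)
{
   assert(src < info->num_inputs);
   if (info->input_sizes[src] > 0)
      return info->input_sizes[src];   /* fixed size, e.g. 3 for fdot3 */
   return dest_components;             /* per-component: follow the SSA def */
}
```

With this, an fdot3-like opcode reports 3 source components even though its
destination has a single component.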
v2: use new name nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Added intel_readpixels_tiled_memcpy and intel_gettexsubimage_tiled_memcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.
On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
amount of time spent in glReadPixels by more than half and reduces the time
of the entire test by 10%.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- Refactor to make the functions look more like the old
intel_tex_subimage_tiled_memcpy
- Don't export the readpixels_tiled_memcpy function
- Fix some pointer arithmetic bugs in partial image downloads (using
ReadPixels with a non-zero x or y offset)
- Fix a bug when ReadPixels is performed on an FBO wrapping a texture
miplevel other than zero.
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Better documentation for the *_tiled_memcpy functions
- Add target restrictions for renderbuffers wrapping textures
v4: Jason Ekstrand <jason.ekstrand@intel.com>
- Only check the return value of brw_bo_map for error and not bo->virtual
v5: Jason Ekstrand <jason.ekstrand@intel.com>
- Don't unnecessarily repeat a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This commit adds tiled copy functions for copying from tiled memory to
linear memory. These are very similar to the existing linear-to-tiled
paths.
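
As a rough illustration of what a tiled-to-linear copy does, here is a sketch
under simplifying assumptions: the tile shape (4 bytes by 2 rows) is made up
for readability, each tile is stored contiguously in row-major order, and no
address swizzling is applied. The function name is hypothetical and this is
not the code added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative tile shape only; real hardware tiles are larger and may
 * also apply address swizzling, which this sketch ignores. */
enum { TILE_W = 4, TILE_H = 2, TILE_SIZE = TILE_W * TILE_H };

/* Copy a whole surface from the tiled layout in src to a linear layout in
 * dst.  width_bytes and height must be multiples of the tile dimensions. */
static void tiled_to_linear_sketch(unsigned char *dst, const unsigned char *src,
                                   size_t width_bytes, size_t height)
{
   assert(width_bytes % TILE_W == 0 && height % TILE_H == 0);
   size_t tiles_per_row = width_bytes / TILE_W;

   for (size_t y = 0; y < height; y++) {
      for (size_t tx = 0; tx < tiles_per_row; tx++) {
         /* Locate the tile, then the row inside that tile. */
         size_t tile = (y / TILE_H) * tiles_per_row + tx;
         const unsigned char *row =
            src + tile * TILE_SIZE + (y % TILE_H) * TILE_W;

         memcpy(dst + y * width_bytes + tx * TILE_W, row, TILE_W);
      }
   }
}
```

The linear-to-tiled direction is the same loop with the source and
destination address computations exchanged.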
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- New commit message
- Various whitespace fixes
- Added ptrdiff_t casts as done in commit 225a09790
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Fixed a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own intel_tiled_memcpy files. Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively. The *_faster functions are similarly renamed.
There was also a bit of logic to select between the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.
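
A sketch of that selection logic, assuming 4-byte pixels. The names copy_fn,
swizzle_rgba8_copy and pick_copy_fn are hypothetical stand-ins and do not
reflect the real function signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same shape as memcpy so either function can be handed to the copy loop. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t bytes);

/* Copy 4-byte pixels while exchanging bytes 0 and 2 (RGBA <-> BGRA). */
static void *swizzle_rgba8_copy(void *dst, const void *src, size_t bytes)
{
   unsigned char *d = dst;
   const unsigned char *s = src;

   assert(bytes % 4 == 0);
   for (size_t i = 0; i < bytes; i += 4) {
      d[i + 0] = s[i + 2];
      d[i + 1] = s[i + 1];
      d[i + 2] = s[i + 0];
      d[i + 3] = s[i + 3];
   }
   return dst;
}

/* Hypothetical selector: plain memcpy when the layouts already match, the
 * swizzling copy when red and blue have to be exchanged. */
static copy_fn pick_copy_fn(bool needs_rb_swap)
{
   return needs_rb_swap ? swizzle_rgba8_copy : memcpy;
}
```

Choosing the function once, outside the per-tile loops, keeps the inner copy
free of format checks.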
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- Better commit message
- Fix up the copyright on the intel_tiled_memcpy files
- Various whitespace fixes
- Moved a bunch of stuff that did not need to be exposed from
intel_tiled_memcpy.h to intel_tiled_memcpy.c
- Added proper documentation for intel_get_memcpy
- Incorporated the ptrdiff_t tweaks from commit 225a09790
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Fixed a comment
- Move the tile size constants into the .c file
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
There's no reason why we should be doing this for 2D textures and not
rectangles. Just a matter of adding another hunk to the condition.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Previously, we called the abs() function in math.h. However, this involves
unnecessarily going through double. This commit changes it to use integers
directly with a ternary.
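
For illustration, the integer-only form looks roughly like this. The function
name is hypothetical, and the assert on INT32_MIN is an addition for the
sketch (negating it overflows in C), not something the commit mentions.

```c
#include <assert.h>
#include <stdint.h>

/* Absolute value computed on integers directly, with no conversion to a
 * floating-point type along the way. */
static int32_t iabs_sketch(int32_t x)
{
   assert(x != INT32_MIN);   /* -INT32_MIN is not representable */
   return x < 0 ? -x : x;
}
```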
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it to use an implicit
assignment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.
Previously, we deleted MOV.nz instructions when the instruction
generating the MOV's source also wrote the flag register (as the flag
register already contains the desired value). However, we wouldn't
delete CMP.nz instructions that served the same purpose.
We also didn't attempt true cmod propagation on MOV.nz instructions,
while we would for the equivalent CMP.nz form.
This patch fixes both limitations, treating both forms equally.
CMP.nz instructions will now be deleted (helping the NIR backend),
and MOV.nz instructions will have their .nz propagated.
No changes in shader-db without NIR. With NIR,
total instructions in shared programs: 6006153 -> 5969364 (-0.61%)
instructions in affected programs: 2087139 -> 2050350 (-1.76%)
helped: 10704
HURT: 0
GAINED: 2
LOST: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.
The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.
v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
- Use mako to generate one function per opcode instead of doing piles of
string splicing
v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
- More comments and better indentation in the mako
- Add a description of the constant expression language in nir_opcodes.py
- Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to keep information centralized as we
add new things like constant propagation that require per-opcode information.
v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3 Jason Ekstrand <jason.ekstrand@intel.com>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration
For some reason, we occasionally write the flag register with a MOV.NZ
instruction:
add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F
cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F
mov.nz.f0(8) null g26<8,8,1>D
A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:
add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F
total instructions in shared programs: 5955701 -> 5951657 (-0.07%)
instructions in affected programs: 302910 -> 298866 (-1.34%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:
add(8) temp a b
cmp.l.f0(8) null -temp 0.0
into
add.g.f0(8) temp a b
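
The flip itself can be sketched as a small mapping. The enum and function
names below are hypothetical, not the backend's own; the sketch only encodes
that comparing -x against zero is the same as comparing x against zero with
the direction reversed.

```c
#include <assert.h>

/* Hypothetical conditional-mod enum for illustration. */
enum cmod { CMOD_Z, CMOD_NZ, CMOD_L, CMOD_LE, CMOD_G, CMOD_GE };

/* (-x <cmod> 0) is equivalent to (x <flipped cmod> 0).  Equality tests
 * are unaffected by negation, so Z and NZ map to themselves. */
static enum cmod flip_for_negated_source(enum cmod c)
{
   switch (c) {
   case CMOD_L:  return CMOD_G;
   case CMOD_LE: return CMOD_GE;
   case CMOD_G:  return CMOD_L;
   case CMOD_GE: return CMOD_LE;
   case CMOD_Z:
   case CMOD_NZ: return c;
   }
   assert(!"unknown conditional mod");
   return c;
}
```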
total instructions in shared programs: 5958360 -> 5955701 (-0.04%)
instructions in affected programs: 466880 -> 464221 (-0.57%)
GAINED: 0
LOST: 1
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise we'll apply the conditional mod to only one of SIMD8
instructions and trigger an assertion.
NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
3-src instructions can only have GRF/MRF destinations. It's really
difficult to deal with that restriction in dead code elimination (that
wants to give instructions null destinations to show that their result
isn't used) while allowing 3-src instructions to have conditional mod,
so don't, and just give them a destination before register allocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
GAINED: 103
LOST: 43
Causes one program to spill (643 -> 1076 instructions).
I committed this patch last year (commit 42a26cb5) but reverted it
(commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program
(bug 78648). Tapani reapplied this patch and could not reproduce the bug
with current Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.
No piglit regressions on any i965 platform in Jenkins.
total instructions in shared programs: 4382707 -> 4382707 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 2089
LOST: 0
No other platforms affected in shader-db.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.
v2: Just pass a NULL state, now that it won't crash when you do.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.
v2: Use the nir_src_for_ssa() helper, and another instance of
nir_alu_src_copy().
v3: Drop the non-SSA support. All intended callers will have SSA-only ALU
ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
about weird input_sizes[].
Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
There aren't many users yet, but I wanted to do this from my scalarizing
pass.
v2: Constify the src arguments.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Most of these exist in the GLSL IR algebraic pass already. However,
SSA allows us to find more instances of the patterns.
total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
NIR instructions in affected programs: 124189 -> 120026 (-3.35%)
helped: 604
total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
i965 instructions in affected programs: 261295 -> 254507 (-2.60%)
helped: 1295
HURT: 3
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.
The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.
total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
NIR instructions in affected programs: 411143 -> 405922 (-1.27%)
helped: 2233
HURT: 214
A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x. We can always clean that up later.
total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
i965 instructions in affected programs: 784980 -> 775093 (-1.26%)
helped: 4508
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The GLSL IR optimization pass contained these; we may as well include
them too.
v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).
No change in the number of NIR instructions on a shader-db run.
total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
i965 instructions in affected programs: 542 -> 537 (-0.92%)
helped: 2 (in glamor)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that. While there,
I decided to add a bunch more of them.
v2: Delete bogus fand/for optimizations (caught by Jason).
total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
NIR instructions in affected programs: 149634 -> 146937 (-1.80%)
helped: 1032
total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
i965 instructions in affected programs: 537 -> 542 (0.93%)
HURT: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value. This
made ALU operations on those values appear distinct as well.
Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.
Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered. I didn't implement
support for variables for the time being.
v2: Assert that info->num_variables == 0 (requested by Jason).
total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
NIR instructions in affected programs: 2413496 -> 2001071 (-17.09%)
helped: 16872
total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
i965 instructions in affected programs: 640654 -> 620094 (-3.21%)
helped: 2071
HURT: 585
GAINED: 14
LOST: 25
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.
The next patch will introduce another case that requires SSA.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows us to count NIR instructions via shader-db.
Use "run" as normal. The results file will contain both NIR and
assembly.
Then, to generate a NIR report:
./report.py <(grep NIR results/foo) <(grep NIR results/bar)
Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is useful for debugging and looking for optimization opportunities.
It will need to be expanded when we add support for other scalar stages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.
total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
instructions in affected programs: 22406 -> 21984 (-1.88%)
helped: 47
HURT: 0
GAINED: 0
LOST: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Apparently $(top_srcdir) is not expanded in a source list when using
subdir-objects, so remove that. It's not clear to me why we were going
to such lengths to prefix each source file anyway.
Change max_wm_threads to match the spec on CHV. The max number of
threads in 3DSTATE_PS is always programmed to 64 and the hardware
internally scales that depending on the GT SKU. So this doesn't
change the max number of threads actually used, but it does affect
the scratch space calculation.
On CHV the old value was too small, so the amount of scratch space
allocated wasn't sufficient to satisfy the actual max number of
threads used.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
When emitting texturing from indirect texture units, we need to be able to
scratch around in the header message. Since we only do this for >= HSW,
this is ok since there are no MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj phogat <anuj.phogat@gmail.com>
Prior to this commit, the adjust_sampler_state_pointer function took an
extra register that it could use as scratch space. The usual candidate was
the destination of the sampler instruction. However, if that register ever
aliased anything important such as the sampler index, this would scratch
all over important data. Fortunately, the calculation is such that we can
just do it in place and we don't need the scratch space at all.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previous code semantic was:
. if ff ps will not run a ff stage, then do not output texture coords for this stage
for vs
. if XYZRHW is used (position_t), use only the mode where input coordinates are copied
to the outputs.
The problem is when apps don't give texture inputs. When apps specify PASSTHRU, it means
copy texture coord input to texture coord output if there is such input. The case
where there is no texture coord input wasn't handled correctly.
Drivers like r300 dislike when vs has inputs that are not fed.
Moreover if the app uses ff vs with a programmable ps, we shouldn't look at
what are the parameters of the ff ps to decide to output or not texture
coordinates.
The new code semantic is:
. if XYZRHW is used, restrict to PASSTHRU
. if PASSTHRU is used and no texture input is declared, then do not output
texture coords for this stage
The case where ff ps needs a texture coord input and ff vs doesn't output
it is not handled, and should probably be a runtime error.
This fixes 3Dmark05, which uses ff vs with programmable ps.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When the shader does indirect addressing on the constants,
we allocate a temporary constant buffer to which we copy
the constants from the app given user constants and
the constants filled in the shader.
This patch makes this buffer be allocated once.
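
A rough sketch of the merge with a buffer that is allocated on first use and
then reused. All type and function names here are hypothetical, and the
sketch assumes the constant count stays the same between calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct vec4 { float f[4]; };

/* Hypothetical shader-defined constant: register index plus value. */
struct local_const {
   unsigned index;
   struct vec4 value;
};

struct const_scratch {
   struct vec4 *buf;   /* allocated on first use, then reused */
   unsigned count;
};

/* Start from the app-provided constants, then overwrite the registers the
 * shader defines itself.  Returns NULL if the allocation fails. */
static struct vec4 *merge_constants(struct const_scratch *s,
                                    const struct vec4 *user, unsigned count,
                                    const struct local_const *locals,
                                    unsigned num_locals)
{
   if (!s->buf) {
      s->buf = malloc(count * sizeof(*s->buf));
      if (!s->buf)
         return NULL;
      s->count = count;
   }
   assert(count == s->count);

   memcpy(s->buf, user, count * sizeof(*user));
   for (unsigned i = 0; i < num_locals; i++) {
      assert(locals[i].index < count);
      s->buf[locals[i].index] = locals[i].value;
   }
   return s->buf;
}
```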
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Relative addressing needs the constant buffer to get all
the correct constants, even those defined by the shader.
The code to copy the shader constants to the constant buffer
was enabled only for debug build. Enable it always.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The fix is that this line:
"src[s] = tx->regs.vT[s];" is wrong if s doesn't start from 0.
Instead access tx->regs.vT directly when needed.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
texcoord for ps < 1_4 should clamp the values between 0 and 1.
texcrd (texcoord ps 1_4) does not clamp and can be used with
two modifiers, _dw and _dz, which mean the channels are divided
by w or z.
Implement those in shared code, since the same modifiers can be used
for texld ps 1_4.
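
The two behaviours can be sketched on the CPU as follows. The names are
hypothetical, and the sketch assumes the divide applies to all four channels,
which the message does not spell out. The divide is written as a reciprocal
followed by a multiply.

```c
#include <assert.h>

/* Hypothetical modifier enum for illustration. */
enum coord_mod { MOD_NONE, MOD_DZ, MOD_DW };

static float clamp01(float v)
{
   return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* texcoord (ps < 1_4): clamp every channel to [0, 1]. */
static void texcoord_clamped(float out[4], const float in[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = clamp01(in[i]);
}

/* texcrd (ps 1_4): no clamp; optionally divide by z or w, expressed as
 * one reciprocal and a multiply per channel. */
static void texcrd_with_modifier(float out[4], const float in[4],
                                 enum coord_mod mod)
{
   float rcp = 1.0f;

   switch (mod) {
   case MOD_NONE: break;
   case MOD_DZ:   rcp = 1.0f / in[2]; break;
   case MOD_DW:   rcp = 1.0f / in[3]; break;
   default:       assert(!"unknown modifier");
   }
   for (int i = 0; i < 4; i++)
      out[i] = in[i] * rcp;
}
```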
v2: replace DIV by RCP + MUL
v3: Remove a useless MOV
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Nothing seems to indicate the negation modifier would be stored in the
instruction flags instead of the source modifier. tx_src_param has
already handled it if it is in the source modifier.
In addition, when the card supports native integers, booleans are
stored as 32-bit ints and are equal to 0 or 0xFFFFFFFF.
Given that 0xFFFFFFFF is NaN when read as a float, it is better to use
UIF than IF.
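
The NaN claim is easy to check on the CPU. This sketch assumes IEEE 754
single precision floats; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without violating aliasing. */
static float bits_to_float(uint32_t bits)
{
   float f;

   assert(sizeof(f) == sizeof(bits));
   memcpy(&f, &bits, sizeof(f));
   return f;
}

/* NaN is the only value that compares unequal to itself. */
static bool is_nan(float f)
{
   return f != f;
}

/* Integer-style truth test: well defined for both 0 and 0xFFFFFFFF. */
static bool uint_is_true(uint32_t bits)
{
   return bits != 0;
}
```

All exponent bits set with a non-zero mantissa is the NaN encoding, so the
"true" pattern is safer to test as an integer.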
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The previous implementation behaved fine, but this improves it by:
. Improving the documentation
. Decreasing the counter (comparing to 0 is likely to be faster than to a constant)
. Moving the counter update to the end, for better performance for shaders that
break the loop earlier than when the count is done.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Previous implementation didn't work well with nested loops.
Instead of using several address registers, put a0 and aL
into normal registers, and copy them to one address register when
we need to use them.
Wine tests loop_index_test() and nested_loop_test() now pass correctly.
Fixes r600g crash while loading Bioshock -
bug https://bugs.freedesktop.org/show_bug.cgi?id=85696
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
When the input's xyz are 0.0, the output
should be 0.0. This is due to the fact that
Inf * 0 = 0 for dx9. To handle this case,
cap the result of RSQ to FLT_MAX. We have
FLT_MAX * 0 = 0.
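
A small CPU-side sketch of why the cap helps, assuming IEEE 754 arithmetic,
where Inf * 0 is NaN rather than 0. The helper names are hypothetical, and
the reciprocal square root is passed in rather than computed.

```c
#include <assert.h>
#include <float.h>

/* Cap an RSQ result: +Inf (from a zero-length input) becomes FLT_MAX. */
static float cap_rsq(float rsq)
{
   return rsq > FLT_MAX ? FLT_MAX : rsq;
}

/* Scale a 3-component vector by an already computed reciprocal length. */
static void scale_by_rsq(float out[3], const float in[3], float rsq)
{
   assert(out != in);
   for (int i = 0; i < 3; i++)
      out[i] = in[i] * rsq;
}
```

With the cap in place, a zero vector scaled by the capped value comes out as
zero, which matches the dx9 result described above.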
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
We should use the absolute value of the input as input to ureg_RSQ.
Moreover, an input of 0.0 should return FLT_MAX.
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Let's say we have c1 and c2 declared in the shader and c0 given by the app.
Then here we would have read c0, c1 and c2 as given by the app, instead
of the correct c0 from the app and c1, c2 from the shader.
This correction fixes several issues in some games.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Convert them to shader booleans at an earlier stage.
The previous code is fine, but a later patch will convert
integers at an earlier stage, so do
the same for booleans.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Buffers in the MANAGED pool are supposed to keep their content in a RAM buffer,
with a copy in VRAM if there is enough memory (the driver manages memory and
decides when to delete the buffer in VRAM).
This is not implemented properly in nine: a VRAM copy is created
when the RAM memory is filled, and the VRAM copy gets synced with the RAM
memory updates.
Due to some issues (in the implementation or in app logic), it can happen
we try to create a sampler view of the resource while we haven't created the
VRAM resource. This hack creates the resource when we hit this case, which prevents
crashing, but doesn't help with the resource content.
This fixes several games crashing at launch.
Acked-by: Axel Davy <axel.davy@ens.fr>
Acked-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Stanislaw Halik <sthalik@misaki.pl>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
While previous code was having the correct behaviour in general,
this new code is more readable (without checking all gallium formats
manually) and has a more defined behaviour for depth stencil resources.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The implicit swapchains are destroyed when the device instance is
destroyed. However for non-implicit swapchains, it is not the case,
and the application can have kept a reference on the swapchain
buffers to reuse them.
Fixes problems with battle.net launcher.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This->surfaces contains the surfaces associated to the levels
and faces. This->surfaces[6*Level] is what we want here,
since it gives us a face descriptor for the level 'Level'.
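
The indexing implied here can be sketched as follows, assuming the array is
laid out level by level with six faces per level. The function name is
hypothetical.

```c
#include <assert.h>

enum { CUBE_FACES = 6 };

/* Index into a flat surface array laid out level by level, six faces per
 * level, so 6 * level is the first face of that level. */
static unsigned cube_surface_index(unsigned level, unsigned face)
{
   assert(face < CUBE_FACES);
   return CUBE_FACES * level + face;
}
```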
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The cap means D3DFVF_XYZRHW vertices will see clipping.
This is not the case when
PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION is supported, since
it'll disable clipping.
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The clip state was reset every time, incurring an overhead.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
brw_fs_nir has only seen scalar bools so far, thanks to vector splitting,
and the ralloc in glsl_to_nir.cpp will *usually* get you a 0-filled
chunk of memory, so reading too large of a value will usually get you the
right bool value. But once we start doing vector bools in a few commits,
we end up getting bad values.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Almost all instructions we nir_ssa_def_init() for are nir_dests, and you
have to keep from forgetting to set is_ssa when you do. Just provide the
simpler helper, instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Mac OS X XQuartz places X11 headers at /opt/X11/include.
This patch fixes this Mac OS X SCons build error.
Compiling src/gallium/state_trackers/glx/xlib/glx_api.c ...
In file included from src/gallium/state_trackers/glx/xlib/glx_api.c:34:
include/GL/glx.h:30:10: fatal error: 'X11/Xlib.h' file not found
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Since the meta path can do strictly more than the blitter path, we just
remove the blitter path entirely.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
This meta path, designed for use with PBO's, creates a temporary texture
out of the PBO and uses BlitFramebuffers to do the actual texture upload.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Add support for handling simple packing options
v3 Jason Ekstrand <jason.ekstrand@intel.com>:
- Refactor to split out the texture-from-pbo code
- Rename to _mesa_meta_pbo_TexSubImage
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Going through the for loop every time has noticeable overhead. This fixes
things up so we only do that once ever and then just do a hash table lookup
which should be much cheaper.
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use once_flag and call_once from c11/threads.h instead of pthreads
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Previously, we were completely ignoring the mt->offset field for
renderbuffers. While it does have some alignment constraints, it is valid
to use it. This patch adds the code to each of the 4 surface state setup
functions to handle it.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Fixes currently failing Piglit case
interface-blocks-name-reused-globally.vert
v2: combine var declaration with assignment (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Designated initializers with anonymous unions don't work in MSVC or
GCC < 4.6. With a couple of constructor methods, we don't need them any
more and the code is actually cleaner.
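As a rough illustration of the constructor approach (the type and function names below are hypothetical, not the ones this patch touches), each helper assigns the union member by name, so no designated initializer has to reach into the anonymous union:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value type with an anonymous union (C11). */
enum sketch_imm_kind { SKETCH_IMM_INT, SKETCH_IMM_FLOAT };

struct sketch_imm {
    enum sketch_imm_kind kind;
    union {
        int32_t i;
        float f;
    };
};

/* Constructor helpers: plain member assignment works on compilers that
 * reject designated initializers for anonymous union members. */
static inline struct sketch_imm sketch_imm_int(int32_t i)
{
    struct sketch_imm v;
    v.kind = SKETCH_IMM_INT;
    v.i = i;
    return v;
}

static inline struct sketch_imm sketch_imm_float(float f)
{
    struct sketch_imm v;
    v.kind = SKETCH_IMM_FLOAT;
    v.f = f;
    return v;
}

/* Accessor that checks the tag before reading the union. */
static inline int32_t sketch_imm_as_int(struct sketch_imm v)
{
    assert(v.kind == SKETCH_IMM_INT);
    return v.i;
}
```

Call sites then read as sketch_imm_int(7) instead of a brace initializer.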
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88467
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This fixes two problems reported by osc:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/format_utils.c:180
E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/glformats.c:2714
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
The below code crashes when vector_elements <= 0
Fixes Warray-bounds warnings
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are currently 2 users of this functionality. I have 2 more users coming
up, and having a simple function makes the results much cleaner. The existing
interface semantics was proposed by Matt.
v2 (Ken): Rename to region_matches()/has_scalar_region().
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GEN8 added the QWORD as a valid type for certain operations on the EU.
In order to calculate the number of registers used one must have the type
size as part of the equation. Quoting the formula in the code:
regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
Adding this separately for bisection since there is no simple way to add
an assert in the type_sz function.
NOTE: I was confused for a while because it's impossible
to calculate the region, ie. registers needed, without vstride. However,
at this point these are all part of the IR, and so no vstride must exist.
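To make the effect of the type size concrete, here is a small sketch of the quoted formula. The struct is a hypothetical stand-in for the destination register fields, and the 32 in the formula is taken to be the register size in bytes:

```c
#include <assert.h>

/* Hypothetical stand-in for the destination fields used by the formula. */
struct sketch_dst {
    unsigned width;     /* number of channels written */
    unsigned stride;    /* element stride between channels */
    unsigned type_size; /* bytes per element, i.e. type_sz(dst.type) */
};

/* Registers written, rounded up to whole 32-byte registers. */
static inline unsigned sketch_regs_written(struct sketch_dst dst)
{
    assert(dst.type_size > 0);
    return (dst.width * dst.stride * dst.type_size + 31) / 32;
}
```

With width 8 and stride 1, a 4-byte type writes one register while an 8-byte QWORD type writes two, which is why the type size has to be part of the equation.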
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of passing a pointer to the scratch buffer via user sgprs, we
now patch the shader with the buffer address using reloc information
from the LLVM generated ELF.
v2:
- Make sure not to break older LLVM.
Gen4 hardware appears to GPU hang frequently when using Chromium, and
also when running 'glmark2 -b ideas'. Most of the error states contain
3DPRIMITIVE commands in quick succession, with very few state packets
between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.
I trimmed an apitrace of the glmark2 hang down to two draw calls with a
glUniformMatrix4fv call between the two. Either draw by itself works
fine, but together, they hang the GPU. Removing the glUniform call
makes the hangs disappear. In the hardware state, this translates to
removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.
Flushing before emitting CONSTANT_BUFFER packets also appears to make
the hangs disappear. I observed a slowdown in glxgears by doing it all
the time, so I've chosen to only do it when BRW_NEW_BATCH and
BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
already flushed the whole pipeline).
I'd much rather understand the problem, but at this point, I don't see
how we'd ever be able to track it down further. We have no real tools,
and the hardware people moved on years ago. I've analyzed 20+ error
states and read every scrap of documentation I could find.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
offset() properly handles reg_width, so it'll work for SIMD16.
While we're in the area, simplify a few cases, and use retype() to cut a
few more lines of code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_fs_nir.cpp creates almost all of its registers via:
fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components));
When we add SIMD16 support, we'll need to set reg->width = 16 and
double the VGRF size...on pretty much every VGRF it allocates.
This patch replaces that pattern with a new "vgrf" helper method:
fs_reg reg = vgrf(num_components);
The new function correctly takes reg_width into account. For now,
reg_width is always 1, so this should have no functional change.
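A rough C sketch of what such a helper does (the driver code is C++; the struct layout, field names, and the assumption that reg_width is 2 for SIMD16 are illustrative only):

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for fs_visitor and fs_reg. */
struct sketch_visitor {
    unsigned reg_width;  /* assumed: 1 for SIMD8, 2 for SIMD16 */
    unsigned num_vgrfs;  /* next free virtual register number */
};

struct sketch_reg {
    unsigned nr;     /* virtual register number */
    unsigned size;   /* size in hardware registers */
    unsigned width;  /* channels covered by the register */
};

/* Allocate a virtual GRF big enough for num_components at the
 * visitor's current register width. */
static inline struct sketch_reg sketch_vgrf(struct sketch_visitor *v,
                                            unsigned num_components)
{
    assert(v->reg_width == 1 || v->reg_width == 2);
    struct sketch_reg reg;
    reg.nr = v->num_vgrfs++;
    reg.size = num_components * v->reg_width;
    reg.width = 8 * v->reg_width;
    return reg;
}

/* Convenience wrapper for a worked example: size of one allocation. */
static inline unsigned sketch_vgrf_size(unsigned reg_width,
                                        unsigned num_components)
{
    struct sketch_visitor v = { reg_width, 0 };
    return sketch_vgrf(&v, num_components).size;
}
```

Callers ask only for a component count; the width handling lives in one place.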
v2: Just make vgrf() account for reg_width right away, rather than
changing the behavior in the next patch.
v3: Replace one last virtual_grf_alloc I missed. It's used in code
that only runs for dispatch_width == 8, so it doesn't matter,
but consistency is nice.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I dislike how fs_reg has a constructor that knows about fs_visitor.
Apart from that, it stands alone, with no need to interact with the
rest of the compiler. Which is sensible - a class that represents
a register should do just that. Allocating virtual register numbers
should be left up to the compiler (fs_visitor).
This patch replaces the constructor with a new fs_visitor::vgrf method,
eliminating fs_reg's dependency on fs_visitor. It ends up being no
more code.
v2: Rebase from May 2014 -> January 2015.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The filename of sha1.h was conflicting with the system-provided
sha1.h, (and in some configurations, our sha1.c was unsuccessfully
attempting to include "sha1.h" and <sha1.h> as two different files).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88523
Commit 8ec6534 changed the texture upload path and how the texture
format is checked. This commit adds support for GL_RGB with
GL_UNSIGNED_INT_2_10_10_10_REV as specified by the
EXT_texture_type_2_10_10_10_REV extension.
This fixes a regression in the ES3 conformance test
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
v2: add MESA_FORMAT_R10G10B10X2_UNORM format (Iago Toral)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88385
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We hit an assertion that the destination of the FB write should not be
an immediate. (I don't know what we were thinking.) Use ARF null.
Trying to substitute real shaders with the dummy shader would crash
when trying to upload non-existent uniforms. Say there are none.
It also wouldn't generate any code because we didn't compute the CFG,
and code generation now requires it. Compute it.
Gen4-5 also require a message header to be present.
On Gen6+, there were assertion failures in SF/SBE state because
urb_setup was memset to 0 instead of -1, causing it to think there were
attributes when nothing was set up right. Set to no attributes.
Finally, you have to ensure "Setup URB Entry Read Length" is non-zero
or you get GPU hangs, at least on Crestline.
It now works on at least Crestline and Haswell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
glibc 2.19 introduced _DEFAULT_SOURCE as a replacement for _BSD_SOURCE,
and deprecates _BSD_SOURCE with an annoying warning. Defining both is
how you're supposed to transition so let's do that. It gets rid of the
warning and we can figure out when/if we can drop _BSD_SOURCE later.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Fix build error.
CC libmesautil_la-sha1.lo
sha1.c: In function '_mesa_sha1_final':
sha1.c:210:22: error: 'grcy_md_hd_t' undeclared (first use in this function)
gcry_md_hd_t h = (grcy_md_hd_t) ctx;
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88519
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
The driver name is no longer const, it's always allocated dynamically
one way or another. Drop const from dri_screen_create_dri2
driver_name argument to avoid warning.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We don't actually have the code for the shader cache just yet, but
this configure machinery puts everything in place so that the shader
cache can be optionally compiled in.
Specifically, if the user passes no option (neither
--disable-shader-cache, nor --enable-shader-cache), then this feature
will be automatically detected based on the presence of a usable SHA-1
library. If no suitable library can be found, then the shader cache
will be automatically disabled, (and reported in the final output from
configure).
The user can force the shader-cache feature to not be compiled, (even
if a SHA-1 library is detected), by passing
--disable-shader-cache. This will prevent the compiled Mesa libraries
from depending on any library for SHA-1 implementation.
Finally, the user can also force the shader cache on with
--enable-shader-cache. This will cause configure to trigger a fatal
error if no suitable SHA-1 implementation can be found for the
shader-cache feature.
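The three cases above can be summarized as a small decision function. This is only a C model of the behavior described, not the configure code itself, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* What the user passed to configure. */
enum sketch_option { SKETCH_OPT_AUTO, SKETCH_OPT_ENABLE, SKETCH_OPT_DISABLE };

/* What configure ends up doing. */
enum sketch_result {
    SKETCH_CACHE_DISABLED,
    SKETCH_CACHE_ENABLED,
    SKETCH_CONFIGURE_ERROR
};

static inline enum sketch_result
sketch_shader_cache(enum sketch_option opt, bool have_sha1)
{
    switch (opt) {
    case SKETCH_OPT_DISABLE:
        /* Never built, even if a SHA-1 library was detected. */
        return SKETCH_CACHE_DISABLED;
    case SKETCH_OPT_ENABLE:
        /* Forced on: a missing SHA-1 library is fatal. */
        return have_sha1 ? SKETCH_CACHE_ENABLED : SKETCH_CONFIGURE_ERROR;
    case SKETCH_OPT_AUTO:
        /* No option given: follow the detection result. */
        return have_sha1 ? SKETCH_CACHE_ENABLED : SKETCH_CACHE_DISABLED;
    }
    assert(!"invalid option");
    return SKETCH_CONFIGURE_ERROR;
}
```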
Bug fix by José Fonseca <jfonseca@vmware.com>: Fix to put conditional
assignment in Makefile.am, not Makefile.sources to avoid breaking
scons build.
Note: As recommended by José, with this commit the scons build will
not compile any of the SHA-1-using code. This is waiting for someone
to write SConstruct detection of the available SHA-1 libraries, (and
set the appropriate HAVE_SHA1_* variables).
Reviewed-by: Matt Turner <mattst88@gmail.com>
The upcoming shader cache uses the SHA-1 algorithm for cryptographic
naming. These new mesa_sha1 functions are implemented with any one of
several different cryptographic libraries.
This code was copied from the xserver repository, (where it has
apparently been functioning well on a variety of operating systems),
and comes licensed with a license identical to that of Mesa.
Bug fixes by José Fonseca <jfonseca@vmware.com>: Fix to put
conditional assignment in Makefile.am, not Makefile.sources to avoid
breaking scons build. Fix include file for CryptoAPI section. Fix
missing cast in openssl section.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Prior to copying in code from the xserver configure.ac file, it makes
sense to have the license of this file clearly marked, (to show that
it's licensed identically to the configure.ac file from the xserver
repository).
And since the text of the license refers to "the above copyright
notice" it also makes sense to have an actual copyright attribution in
place.
I generated this list of names by looking at the output of:
git shortlog -n --format=%aD -- configure.ac
(and arbitrarily stopping for contributors with fewer than 15
commits). Then for each name, I looked for existing Copyright
attributions in the mesa source tree with the same name, (and using
"Intel Corporation" as the copyright holder where I knew that was
appropriate).
In addition to exercising all of the functions in blob.h, this
includes a stress test that forces some reallocing, and also tests to
verify the alignment and overrun-detection code in blob.c.
These functions are useful when serializing an unknown number of items
to a blob. The caller can first save the current offset, write a
placeholder uint32, write out (and count) the items, then use
blob_overwrite_uint32 with the saved offset to replace the placeholder
value.
Then, when deserializing, the reader will first read the count and
know how many subsequent items to expect.
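The placeholder pattern might look roughly like this. The fixed-capacity buffer and the sketch_* helpers are hypothetical stand-ins; only the idea of overwriting a previously written uint32 at a saved offset comes from this patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity stand-in for a blob. */
struct sketch_blob {
    uint8_t data[64];
    size_t size;
};

/* Append a uint32 and return the offset it was written at. */
static inline size_t sketch_write_uint32(struct sketch_blob *b, uint32_t v)
{
    assert(b->size + sizeof(v) <= sizeof(b->data));
    size_t offset = b->size;
    memcpy(b->data + offset, &v, sizeof(v));
    b->size += sizeof(v);
    return offset;
}

/* Replace a uint32 that was written earlier. */
static inline void sketch_overwrite_uint32(struct sketch_blob *b,
                                           size_t offset, uint32_t v)
{
    assert(offset + sizeof(v) <= b->size);
    memcpy(b->data + offset, &v, sizeof(v));
}

/* Serialize only the non-zero items; their number is not known up front. */
static inline void sketch_write_nonzero(struct sketch_blob *b,
                                        const uint32_t *items, size_t n)
{
    size_t count_offset = sketch_write_uint32(b, 0); /* placeholder */
    uint32_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (items[i] != 0) {
            sketch_write_uint32(b, items[i]);
            count++;
        }
    }
    sketch_overwrite_uint32(b, count_offset, count);
}

/* Worked example: the reader sees the final count first. */
static inline uint32_t sketch_example_count(void)
{
    const uint32_t items[] = { 5, 0, 7, 0, 9 };
    struct sketch_blob b = { .size = 0 };
    uint32_t count;
    sketch_write_nonzero(&b, items, 5);
    memcpy(&count, b.data, sizeof(count));
    return count;
}
```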
(I wrote this code after reading a very similar patch written by
Tapani when he wrote serialization code for IR. Since I re-used the
idea of his code so directly, I've credited him as the author of this
code. --Carl)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This new interface allows for writing a series of objects to a chunk
of memory (a "blob"). The allocated memory is maintained within the
blob itself, (and re-allocated by doubling when necessary).
There are also functions for reading objects from a blob as well. If
code attempts to read beyond the available memory, the read functions
return 0 values (or its moral equivalent) without reading past the
allocated memory. Once the caller is done with the reads, it can check
blob->overrun to determine whether any invalid values were previously
returned due to attempts to read too far.
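A minimal sketch of the read side, with a hypothetical reader struct; only the overrun flag and the return-0-on-overrun behavior are taken from the description above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read cursor over a block of serialized bytes. */
struct sketch_reader {
    const uint8_t *data;
    size_t size;
    size_t offset;
    bool overrun;
};

/* Read a uint32, or return 0 and latch overrun if too few bytes remain. */
static inline uint32_t sketch_read_uint32(struct sketch_reader *r)
{
    uint32_t v = 0;
    assert(r->offset <= r->size);
    if (r->overrun || r->size - r->offset < sizeof(v)) {
        r->overrun = true;
        return 0;
    }
    memcpy(&v, r->data + r->offset, sizeof(v));
    r->offset += sizeof(v);
    return v;
}

/* Worked example: six bytes hold one full value and two stray bytes. */
static inline bool sketch_example_overrun(void)
{
    const uint32_t stored = 42;
    uint8_t bytes[6] = { 0 };
    memcpy(bytes, &stored, sizeof(stored));

    struct sketch_reader r = { bytes, sizeof(bytes), 0, false };
    uint32_t first = sketch_read_uint32(&r);  /* in bounds */
    uint32_t second = sketch_read_uint32(&r); /* only 2 bytes left */
    return first == 42 && second == 0 && r.overrun;
}
```

The caller can do a whole run of reads and check the flag once at the end.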
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The upcoming shader cache needs this to be able to cache hash data
from the gl_shader_program structure.
Edited-by: Carl Worth <cworth@cworth.org>:
There is an internal implementation detail that the hash table
underlying the struct string_to_uint_map stores each value internally
as (value+1). The user needn't be very concerned with this (other than
knowing that a value of UINT_MAX cannot be stored) since put() adds 1
and get() subtracts 1.
So in this commit, rather than call the user's function directly with
hash_table_call_foreach, we call through a wrapper that fixes up the
off-by-one values before the caller's callback sees them.
And with this wrapper in place, we also give a better signature to the
callback function being passed to iterate(), so that this callback
function can actually expect a char* and an unsigned argument, (rather
than a couple of void* ).
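The wrapper idea can be sketched as follows. The tiny array-backed map and every sketch_* name are hypothetical; only the value+1 storage and the fix-up before the user's callback come from the description above:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical entry: the stored word is value + 1. */
struct sketch_entry { const char *key; uintptr_t stored; };
struct sketch_map { struct sketch_entry entries[4]; unsigned count; };

typedef void (*sketch_raw_cb)(const void *key, void *data, void *closure);
typedef void (*sketch_user_cb)(const char *key, unsigned value, void *closure);

static inline void sketch_put(struct sketch_map *m, const char *key,
                              unsigned value)
{
    assert(value != UINT_MAX); /* cannot be stored, as noted above */
    assert(m->count < 4);
    m->entries[m->count].key = key;
    m->entries[m->count].stored = (uintptr_t)value + 1;
    m->count++;
}

/* Low-level iteration hands out the raw stored word as a void pointer. */
static inline void sketch_raw_foreach(const struct sketch_map *m,
                                      sketch_raw_cb cb, void *closure)
{
    for (unsigned i = 0; i < m->count; i++)
        cb(m->entries[i].key, (void *)m->entries[i].stored, closure);
}

struct sketch_wrapper { sketch_user_cb cb; void *closure; };

/* Undo the +1 and call the user's typed callback. */
static void sketch_subtract_one(const void *key, void *data, void *closure)
{
    struct sketch_wrapper *w = closure;
    unsigned value = (unsigned)((uintptr_t)data - 1);
    w->cb((const char *)key, value, w->closure);
}

static inline void sketch_iterate(const struct sketch_map *m,
                                  sketch_user_cb cb, void *closure)
{
    struct sketch_wrapper w = { cb, closure };
    sketch_raw_foreach(m, sketch_subtract_one, &w);
}

static void sketch_sum_cb(const char *key, unsigned value, void *closure)
{
    (void)key;
    *(unsigned *)closure += value;
}

/* Worked example: the callback sees 0 and 10, not 1 and 11. */
static inline unsigned sketch_example_sum(void)
{
    struct sketch_map m = { .count = 0 };
    unsigned sum = 0;
    sketch_put(&m, "a", 0);
    sketch_put(&m, "b", 10);
    sketch_iterate(&m, sketch_sum_cb, &sum);
    return sum;
}
```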
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously, if __builtin_unreachable() was unavailable, the
unreachable macro was defined to do nothing. We do better here, by at
least still making it an assert.
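A sketch of the resulting shape. The macro name and the SKETCH_HAVE_BUILTIN_UNREACHABLE guard are hypothetical, and asserting in the builtin branch as well is an assumption; the point taken from this patch is that the fallback is now an assert rather than nothing:

```c
#include <assert.h>

#ifdef SKETCH_HAVE_BUILTIN_UNREACHABLE
#define sketch_unreachable(str)  \
    do {                         \
        assert(!str);            \
        __builtin_unreachable(); \
    } while (0)
#else
/* Fallback: !"message" is always 0, so this assert always fires. */
#define sketch_unreachable(str) assert(!str)
#endif

enum sketch_axis { SKETCH_AXIS_X, SKETCH_AXIS_Y };

static inline int sketch_axis_index(enum sketch_axis axis)
{
    switch (axis) {
    case SKETCH_AXIS_X:
        return 0;
    case SKETCH_AXIS_Y:
        return 1;
    }
    sketch_unreachable("invalid axis");
    return -1;
}
```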
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is similar to the existing functions get_instance,
get_array_instance, etc. for getting a type singleton. The new
get_sampler_instance() function will be used by the upcoming shader
cache.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we generated this for FB writes in SIMD16 mode:
load_payload(16) vgrf5@8+0.0:F, vgrf1:F, vgrf2:F, vgrf3:F, vgrf4:F
fb_write(8) (null):UD, vgrf5@8+0.0:F 1sthalf
The LOAD_PAYLOAD's destination had its register width set to 8, and the
FB_WRITE had its execution size set to 8. This seems wrong, and while
it probably doesn't affect anything, we should fix it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
When converting to a format that has fewer bits the previous code was just
shifting off the bits. This doesn't provide very accurate results. For example
when converting from 8 bits to 5 bits it is equivalent to doing this:
x * 32 / 256
This works as if it's taking a value from a range where 256 represents 1.0 and
scaling it down to a range where 32 represents 1.0. However this is not
correct because it is actually 255 and 31 that represent 1.0.
We can do better with a formula like this:
(x * 31 + 127) / 255
The +127 is to make it round correctly.
The new code has a special case to use uint64_t when the result of the
multiplication would overflow an unsigned int. This function is inline and
only ever called with constant values so hopefully the if statements will be
folded.
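For illustration, the rounding formula and the 64-bit special case could be written like this. The function name and signature are hypothetical, and only the narrowing case is covered:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_UNORM(bits) ((((uint64_t)1) << (bits)) - 1)

/* Convert an unsigned normalized value to a format with fewer bits,
 * rounding to nearest: (x * dst_max + src_max / 2) / src_max. */
static inline uint32_t sketch_unorm_narrow(uint32_t x, unsigned src_bits,
                                           unsigned dst_bits)
{
    assert(dst_bits > 0 && dst_bits < src_bits && src_bits <= 32);
    const uint64_t src_max = SKETCH_MAX_UNORM(src_bits);
    const uint64_t dst_max = SKETCH_MAX_UNORM(dst_bits);
    assert(x <= src_max);

    if (src_bits + dst_bits > 32) {
        /* The product may not fit in 32 bits. */
        return (uint32_t)(((uint64_t)x * dst_max + src_max / 2) / src_max);
    }
    return (x * (uint32_t)dst_max + (uint32_t)(src_max / 2)) /
           (uint32_t)src_max;
}
```

For 8 to 5 bits this is exactly (x * 31 + 127) / 255. An input of 5 becomes 1, whereas shifting (5 >> 3) gives 0.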
The main incentive to do this is to make the CPU conversion path pick the same
values as the hardware would if it did the conversion. This fixes failures
with the ‘texsubimage pbo’ test when using the patches from here:
http://lists.freedesktop.org/archives/mesa-dev/2015-January/074312.html
v2: Use 64-bit arithmetic when src_bits+dst_bits > 32
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Rendering with a GS and then using transform feedback with a program that does
not have a GS can crash in gen6. The reason for this is that
brw_begin_transform_feedback checks brw->geometry_program to decide if there
is a GS program, but this is not correct: brw->geometry_program is updated when
issuing drawing commands, so after rendering with a GS it will be non-NULL
until we draw again with a program that does not have a GS. If the next
program uses TF, we will call glBeginTransformFeedback before issuing
the drawing command and hence brw->geometry_program will be non-NULL if
the previous rendering used a GS. The right thing to do here is to check
ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] instead. This is what the
gen7 code path does too.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87694
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is a rework of the liveness algorithm using a worklist as suggested by
Connor. Doing so reduces the number of times we walk over the instructions
because we don't have to do an entire pointless walk over the instructions
just to figure out it's time to stop. Also, the stuff after the last loop
in the function will only ever get visited once.
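A generic sketch of worklist-driven backward liveness, to show why no final "nothing changed" pass is needed. This is not the pass itself; the block layout, bitmask sets, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_BLOCKS 8

/* Hypothetical basic block; one bit per variable in each mask. */
struct sketch_block {
    uint32_t use, def;            /* read before written / written here */
    uint32_t live_in, live_out;
    int preds[2], num_preds;
    int succs[2], num_succs;
};

/* FIFO of block indices; the flag keeps a block queued at most once. */
struct sketch_worklist {
    int items[SKETCH_MAX_BLOCKS];
    bool queued[SKETCH_MAX_BLOCKS];
    int head, count;
};

static inline void sketch_push(struct sketch_worklist *w, int b)
{
    if (w->queued[b])
        return;
    w->items[(w->head + w->count) % SKETCH_MAX_BLOCKS] = b;
    w->count++;
    w->queued[b] = true;
}

static inline int sketch_pop(struct sketch_worklist *w)
{
    assert(w->count > 0);
    int b = w->items[w->head];
    w->head = (w->head + 1) % SKETCH_MAX_BLOCKS;
    w->count--;
    w->queued[b] = false;
    return b;
}

static inline void sketch_liveness(struct sketch_block *blocks, int n)
{
    assert(n <= SKETCH_MAX_BLOCKS);
    struct sketch_worklist w = { .head = 0, .count = 0 };
    for (int b = n - 1; b >= 0; b--)
        sketch_push(&w, b);

    while (w.count > 0) {
        struct sketch_block *blk = &blocks[sketch_pop(&w)];
        blk->live_out = 0;
        for (int i = 0; i < blk->num_succs; i++)
            blk->live_out |= blocks[blk->succs[i]].live_in;

        uint32_t live_in = blk->use | (blk->live_out & ~blk->def);
        if (live_in != blk->live_in) {
            blk->live_in = live_in;
            /* Only predecessors can be affected by this change. */
            for (int i = 0; i < blk->num_preds; i++)
                sketch_push(&w, blk->preds[i]);
        }
    }
}

/* Worked example: block 0 -> block 1 (self loop) -> block 2. */
static inline uint32_t sketch_example_live_out(void)
{
    struct sketch_block blocks[3] = {
        { .def = 0x1, .succs = { 1 }, .num_succs = 1 },
        { .use = 0x1, .def = 0x2, .preds = { 0, 1 }, .num_preds = 2,
          .succs = { 1, 2 }, .num_succs = 2 },
        { .use = 0x2, .preds = { 1 }, .num_preds = 1 },
    };
    sketch_liveness(blocks, 3);
    return blocks[1].live_out;
}
```

The loop ends as soon as the queue drains, and a block is revisited only when one of its successors' live-in sets changed.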
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
A worklist is a common concept in optimizations. This adds a structure
that we can reuse for many different types of optimizations.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Changes the initial internal format of a render buffer
to GL_RGBA4 in GLES 3. This fixes a failure in the following
DrawElements test:
dEQP-GLES3.functional.state_query.rbo.renderbuffer_internal_format
Reviewed-by: Chad Versace <chad.versace@intel.com>
Previously, the set API required the user to do all of the hashing of keys
as it passed them in. Since the hashing function is intrinsically tied to
the comparison function, it makes sense for the hash set to know about
it. Also, it makes for a somewhat clumsy API as the user is constantly
calling hashing functions many of which have long names. This is
especially bad when the standard call looks something like
_mesa_set_add(ht, _mesa_pointer_hash(key), key);
In the above case, there is no reason why the hash set shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash set will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
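A simplified model of the new API shape: a fixed-size linear-probing set with hypothetical names, not the real set implementation. The set owns the hash function, and the pre-hashed entry point asserts that the caller's hash agrees with it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SET_SIZE 16

typedef uint32_t (*sketch_hash_fn)(const void *key);

/* Hypothetical set that knows how to hash its own keys. */
struct sketch_set {
    sketch_hash_fn key_hash;
    const void *slots[SKETCH_SET_SIZE];
};

static uint32_t sketch_pointer_hash(const void *key)
{
    return (uint32_t)((uintptr_t)key >> 2);
}

/* For callers that already have the hash; it must match the set's own. */
static inline bool sketch_set_add_pre_hashed(struct sketch_set *s,
                                             uint32_t hash, const void *key)
{
    assert(key != NULL);
    assert(hash == s->key_hash(key));
    for (unsigned i = 0; i < SKETCH_SET_SIZE; i++) {
        unsigned slot = (hash + i) % SKETCH_SET_SIZE;
        if (s->slots[slot] == key)
            return false; /* already present */
        if (s->slots[slot] == NULL) {
            s->slots[slot] = key;
            return true;
        }
    }
    assert(!"set is full");
    return false;
}

/* The common case: the set does the hashing for you. */
static inline bool sketch_set_add(struct sketch_set *s, const void *key)
{
    return sketch_set_add_pre_hashed(s, s->key_hash(key), key);
}

static inline bool sketch_example_set(void)
{
    int a = 0, b = 0;
    struct sketch_set s = { .key_hash = sketch_pointer_hash };
    bool first = sketch_set_add(&s, &a);
    bool again = sketch_set_add(&s, &a);
    bool other = sketch_set_add_pre_hashed(&s, sketch_pointer_hash(&b), &b);
    return first && !again && other;
}
```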
This is analogous to 94303a0750 where we did this for hash_table.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We already have search_pre_hashed. This makes the APIs match better.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When performing common subexpression elimination on instructions with
non-null destinations we emit a MOV to copy the result to a new
register that must have no other uses. In the case of:
cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f
...
cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f
we put the first instruction in the AEB and decided that we could reuse
its result when we found the second. Unfortunately, that meant that we'd
emit a MOV from the first's destination, which is null.
Don't do anything if the entry's destination is null and the
instruction's destination is non-null.
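The added check amounts to a predicate along these lines. The instruction struct is a hypothetical simplification; the real pass compares far more state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction. */
struct sketch_aeb_inst {
    int opcode;
    int src0, src1;
    bool dst_is_null;
};

static inline bool sketch_same_expression(const struct sketch_aeb_inst *a,
                                          const struct sketch_aeb_inst *b)
{
    return a->opcode == b->opcode && a->src0 == b->src0 &&
           a->src1 == b->src1;
}

/* Can the available-expression entry stand in for inst? */
static inline bool sketch_can_reuse(const struct sketch_aeb_inst *entry,
                                    const struct sketch_aeb_inst *inst)
{
    assert(entry != NULL && inst != NULL);
    if (!sketch_same_expression(entry, inst))
        return false;
    /* A null destination holds no result to MOV from. */
    if (entry->dst_is_null && !inst->dst_is_null)
        return false;
    return true;
}
```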
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Just use the abs source modifier on both of the multiplicand
arguments.
instructions in affected programs: 300 -> 296 (-1.33%)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Just use the negation source modifier on one of the multiplicand
arguments.
total instructions in shared programs: 5889529 -> 5880016 (-0.16%)
instructions in affected programs: 600846 -> 591333 (-1.58%)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Without the break, it was possible that an instruction would match multiple
expressions. If this happened, you could end up trying to replace it
multiple times and get a segfault. This makes it so that, after a
successful replacement, it moves on to the next instruction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This refactor allows you to more easily get the deref node associated with
a given variable. We then use that new functionality in the
deref_may_be_aliased function instead of creating a 1-element deref chain.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The original name wasn't particularly descriptive. This one indicates that
it actually gives you SSA values as opposed to the old pass which lowered
variables to registers.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This solves a number of problems. First is the ability to change the
number of sources that a texture instruction has. Second, it solves the
dilemma that may occur if a texture instruction has more than 4 sources.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Additional description was added to a variety of places. Also, we no
longer use the term "leaf" to describe fully-qualified direct derefs.
Instead, we simply use the term "direct" or spell it out completely.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This should be much better for debugging as GDB will pick up on the fact
that it's an enum and actually tell you what you're looking at instead of
giving you some arbitrary hex value you have to go look up.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This should make debugging a lot easier as GDB handles static inlines much
better than macros. Also, static inlines are typesafe.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This was a left-over relic of GLSL IR that we aren't using for anything.
If we ever want that value again, we can add it back, but NIR constant
folding should be just as good as GLSL IR's if not better pretty soon, so
I'm not worried about it.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, our variable renaming algorithm, while similar to the one in
the Cytron paper, was not the same. While I'm pretty sure it was correct,
it will be easier for readers of the code in the variable renaming pass if
it follows more closely. This commit removes the automatic stack popping
we were doing and replaces it with explicit popping like Cytron does.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit seeks to make the lower_variables pass much more clear by
adding a pile of comments and re-arranging a few things. There are no
functional or algorithmic changes.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
parallel_copy_copy was a silly name. Also, things were getting long and
annoying, so I added a foreach macro. For historical reasons, several of
the original iterations over parallel copy entries in from_ssa used the
_safe variants of the loop. However, all of these no longer ever remove an
entry so it's ok to make them all use the normal iterator.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we were doing a lazy creation of the parallel copy
instructions. This is confusing, hard to get right, and involves some
extra state tracking of the copies. This commit adds an extra walk over
the basic blocks to add the block-end parallel copies up front. This
should be much less confusing and, consequently, easier to get right. This
commit also adds more comments about parallel copies to help explain what
all is going on.
As a consequence of these changes, we can now remove the at_end parameter
from nir_parallel_copy_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we were emitting the full pile of setup instructions for sample_id
and sample_pos every time they were used. With this commit, we emit them
in their own pass once at the beginning of the shader and simply emit uses
later on. When it comes time for setting up VS, we can put setup for its
special values in the same pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Originally, this field was intended for determining if the given
instruction acted per-component or if it had mismatching source and
destination sizes that would have to be interpreted specially. However, we
can easily derive this from output_size == 0, so it's not really that
useful. Also, the values we were setting in nir_opcodes.h for this field
were completely bogus and it was never used.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Prior to this commit, we had a big switch statement for this. Now it's
baked into the opcode metadata so we can just use that.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit adds some algebraic properties to the metadata of each opcode
in NIR. In particular, you now know, just from the metadata, if a given
opcode is commutative or associative. This will be useful for algebraic
transformation passes that want to be able to match a + b as well as b + a
in one go.
v2: Make algebraic properties all caps. This was more consistent with the
intrinsics flags and seems better for flags in general.
Also, the enums are now declared with (1 << n) rather than hex values.
v3: fmin and fmax technically aren't commutative or associative. Things
get funny when one of the arguments is a NaN.
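A sketch of what such metadata could look like, with hypothetical names and a made-up three-opcode table. The select-based min is one plausible lowering, included only to show why the order of arguments matters once a NaN is involved:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Property flags declared with (1 << n), as in v2. */
typedef enum {
    SKETCH_PROP_COMMUTATIVE = (1 << 0),
    SKETCH_PROP_ASSOCIATIVE = (1 << 1),
} sketch_op_properties;

enum sketch_op {
    SKETCH_OP_IADD,
    SKETCH_OP_ISUB,
    SKETCH_OP_FMIN,
    SKETCH_NUM_OPS
};

/* Hypothetical per-opcode metadata table. */
static const unsigned sketch_op_props[SKETCH_NUM_OPS] = {
    [SKETCH_OP_IADD] = SKETCH_PROP_COMMUTATIVE | SKETCH_PROP_ASSOCIATIVE,
    [SKETCH_OP_ISUB] = 0,
    [SKETCH_OP_FMIN] = 0, /* see v3: NaN arguments */
};

static inline bool sketch_op_is_commutative(enum sketch_op op)
{
    assert((unsigned)op < SKETCH_NUM_OPS);
    return (sketch_op_props[op] & SKETCH_PROP_COMMUTATIVE) != 0;
}

/* With exactly one NaN argument the result depends on argument order,
 * because every comparison against NaN is false. */
static inline float sketch_select_min(float a, float b)
{
    return a < b ? a : b;
}
```

A pass matching a + b can consult the flag and also try b + a, while leaving fmin alone.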
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
As it was, we weren't ever using load_const in a non-SSA way. This allows
us to substantially simplify the load_const instruction. If we ever need a
non-SSA constant load, we can do a load_const and an imov.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, lower_atomics was non-SSA only. We assert-failed if the
destination of an atomic operation intrinsic was an SSA def and we used
temporary registers for computing offsets. This commit changes both of
these behaviors. We now use SSA values for computing offsets (so we can
optimize them) and we handle SSA destinations. We also move the pass to
run before we go out of SSA on i965 as it now generates SSA values.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Before, we were using foreach_dest and switching on whether the destination
was an SSA value. This works, except not all destinations are SSA values
so we have to special-case ssa_undef instructions. Now that we have a
foreach_ssa_def function, we can iterate over all of the register
destinations in one pass and iterate over the SSA destinations in a second.
This way, if we add other ssa-only instructions, we won't have to worry
about adding them to the special case we have for ssa_undef.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
There are some functions whose destinations are SSA-only and so aren't a
nir_dest. This provides a function that is capable of iterating over the
SSA definitions defined by those functions. If you want registers, you
should use the old iterator.
v2: Kenneth Graunke <kenneth@whitecape.org>:
- Fix nir_foreach_ssa_def's return value.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we were just iterating over the program "in order" which
kind-of approximates a DFS, but not really. In particular, we got the
following case wrong:
loop {
a = 3;
if (foo) {
a = 5;
} else {
break;
}
use(a);
}
where use(a) would get 3 instead of 5 because of premature popping of the
SSA def stack. Now, since we do an actual DFS, we should evaluate use(a)
immediately after a = 5 and we should be ok.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We stopped generating predicates in glsl_to_nir some time ago. Right now,
it's all dead untested code that I'm not convinced always worked in the
first place. If we decide we want them back, we can revert this patch.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, the condition was a scalar that applied to all components
simultaneously. As of this commit, the condition is a vector and each
component is switched separately.
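In scalar C terms, the new semantics look roughly like this (function and parameter names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* Per-component select: each condition component picks its own source. */
static inline void sketch_select(float *dst, const bool *cond,
                                 const float *a, const float *b,
                                 unsigned num_components)
{
    assert(num_components >= 1 && num_components <= 4);
    for (unsigned i = 0; i < num_components; i++)
        dst[i] = cond[i] ? a[i] : b[i];
}

/* Worked example with a mixed condition vector. */
static inline bool sketch_example_select(void)
{
    const bool cond[4] = { true, false, false, true };
    const float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    const float b[4] = { 5.0f, 6.0f, 7.0f, 8.0f };
    float dst[4];

    sketch_select(dst, cond, a, b, 4);
    return dst[0] == 1.0f && dst[1] == 6.0f &&
           dst[2] == 7.0f && dst[3] == 4.0f;
}
```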
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
nir_metadata_dirty was a terrible name because the parameter it takes is
the metadata to be preserved. This is really confusing because it looks
like it's doing the opposite of what it is actually doing. Now it's named
sensibly.
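The semantics in question, sketched with hypothetical flag and function names: the argument lists what stays valid, and everything else is invalidated.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    SKETCH_METADATA_NONE = 0,
    SKETCH_METADATA_BLOCK_INDEX = (1 << 0),
    SKETCH_METADATA_DOMINANCE = (1 << 1),
    SKETCH_METADATA_LIVE_VARIABLES = (1 << 2),
} sketch_metadata;

/* Hypothetical function implementation carrying a validity mask. */
struct sketch_impl {
    unsigned valid_metadata;
};

/* Keep only the metadata the pass promises it did not disturb. */
static inline void sketch_metadata_preserve(struct sketch_impl *impl,
                                            unsigned preserved)
{
    assert(impl != NULL);
    impl->valid_metadata &= preserved;
}

/* Worked example: a pass that keeps the CFG but changes liveness. */
static inline unsigned sketch_example_preserve(void)
{
    struct sketch_impl impl = {
        SKETCH_METADATA_BLOCK_INDEX | SKETCH_METADATA_DOMINANCE |
        SKETCH_METADATA_LIVE_VARIABLES
    };
    sketch_metadata_preserve(&impl, SKETCH_METADATA_BLOCK_INDEX |
                                    SKETCH_METADATA_DOMINANCE);
    return impl.valid_metadata;
}
```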
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2 Jason Ekstrand <jason.ekstrand@intel.com>:
- Use the nir_tex_src_sampler_offset source type instead of the
sampler_indirect thing that I cooked up before.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
In particular, we rename nir_tex_src_sampler_index to _sampler_offset and
add a sampler_array_size field to nir_tex_instr. This way we can pass the
size of sampler arrays through to backends even after removing the variable
information and, with it, the type.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In GLSL-to-NIR we were just setting the base index to 0 whenever there was
an indirect so having it expressed as a sum makes no sense. Also, while a
base offset may make sense for the memory location (first element in the
array, etc.) it makes less sense for the actual uniform buffer index. This
may change later, but it seems to make more sense for now.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit renames nir_instr_as_texture to nir_instr_as_tex and renames
nir_instr_type_texture to nir_instr_type_tex to be consistent with
nir_tex_instr.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass uses the previously built algebraic transformations framework and
should act as an example for anyone else wanting to make an algebraic
transformation pass for NIR.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit builds on the nir_search.h infrastructure by adding a bit of
python code that makes it stupid easy to write an algebraic transformation
pass. The nir_algebraic.py file contains four python classes that
correspond directly to the datastructures in nir_search.c and allow you to
easily generate the C code to represent them. Given a list of
search-and-replace operations, it can then generate a function that applies
those transformations to a shader.
The transformations can be specified manually, or they can be specified
using nested tuples. The nested tuples make a neat little language for
specifying expression trees and search-and-replace operations in a very
readable and easy-to-edit fashion.
The generated code is also fairly efficient. Instead of blindly calling
nir_replace_instr with every single transformation and on every single
instruction, it uses a switch statement on the instruction opcode to do a
first-order culling and only calls nir_replace_instr if the opcode is known
to match the first opcode in the search expression.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This framework provides a simple way to do simple search-and-replace
operations on NIR code. The nir_search.h header provides four simple data
structures for representing expressions: nir_value and three subtypes:
nir_variable, nir_constant, and nir_expression. An expression tree can
then be represented by nesting these data structures as needed. The
nir_replace_instr function takes an instruction, an expression, and a
value; if the instruction matches the expression, it is replaced with a new
chain of instructions to generate the given replacement value. The
framework keeps track of swizzles on sources and automatically generates
the correct swizzles for the replacement value.
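A minimal sketch of the matching half of this idea, assuming heavily
simplified stand-ins for the real types: struct value, struct instr, and
match_value below are hypothetical and do not mirror the actual nir_search.h
definitions. Replacement and swizzle tracking are left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

enum value_type { VALUE_VARIABLE, VALUE_CONSTANT, VALUE_EXPRESSION };
enum opcode { OP_FADD, OP_FMUL };

/* Hypothetical pattern node: a wildcard, a literal, or a nested expression. */
struct value {
   enum value_type type;
   int variable;                  /* VALUE_VARIABLE: wildcard index */
   float constant;                /* VALUE_CONSTANT: literal to match */
   enum opcode op;                /* VALUE_EXPRESSION: opcode to match */
   const struct value *srcs[2];   /* VALUE_EXPRESSION: operand patterns */
};

/* Toy instruction tree: either a constant leaf or a two-source ALU op. */
struct instr {
   bool is_const;
   float constant;
   enum opcode op;
   const struct instr *srcs[2];
};

/* Returns true if instr matches pattern. bound[] records what each wildcard
 * matched, so a wildcard used twice must match the same instruction. */
static bool
match_value(const struct value *pattern, const struct instr *instr,
            const struct instr **bound)
{
   switch (pattern->type) {
   case VALUE_VARIABLE:
      if (bound[pattern->variable] == NULL)
         bound[pattern->variable] = instr;
      return bound[pattern->variable] == instr;
   case VALUE_CONSTANT:
      return instr->is_const && instr->constant == pattern->constant;
   case VALUE_EXPRESSION:
      if (instr->is_const || instr->op != pattern->op)
         return false;
      return match_value(pattern->srcs[0], instr->srcs[0], bound) &&
             match_value(pattern->srcs[1], instr->srcs[1], bound);
   }
   return false;
}
```

Nesting an expression node inside another expression node is what lets a
single pattern describe a whole expression tree.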
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, the casting operations were macros. While this is usually
fine, the casting macro used the input parameter twice leading to strange
behavior when you passed the result of another function into it. Since we
know the source and destination types explicitly, we don't lose anything
by making it a function.
Also, this gives us a nice little macro for creating cast functions that
will hopefully prevent mistyping.
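The double-evaluation problem and the function-generating macro can be
illustrated like this. Everything here is a hypothetical sketch: the struct
layouts, INSTR_AS_ALU_MACRO, DEFINE_CAST, and lookup_instr are invented for
the example and are not the names used in nir.h.

```c
#include <stddef.h>

enum instr_type { INSTR_TYPE_ALU, INSTR_TYPE_TEX };

struct instr { enum instr_type type; };
struct alu_instr { struct instr instr; int op; };

/* Old style: a cast macro that mentions its argument twice, so an argument
 * with side effects is evaluated twice. */
#define INSTR_AS_ALU_MACRO(i) \
   ((i)->type == INSTR_TYPE_ALU ? (struct alu_instr *)(i) : NULL)

/* New style: a macro that stamps out a real cast function. The argument is
 * evaluated once, and the source and destination types are spelled out. */
#define DEFINE_CAST(name, in_type, out_type, field)                   \
static inline out_type *                                              \
name(in_type *parent)                                                 \
{                                                                     \
   return (out_type *)((char *)parent - offsetof(out_type, field));   \
}

DEFINE_CAST(instr_as_alu, struct instr, struct alu_instr, instr)

/* Helper with a visible side effect so double evaluation can be observed. */
static int lookups;

static struct instr *
lookup_instr(struct alu_instr *alu)
{
   lookups++;
   return &alu->instr;
}
```

Passing lookup_instr(&alu) to the macro bumps the counter twice, while
passing it to instr_as_alu() bumps it once.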
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We used to have the number of components built into the intrinsic. This
meant that all of our load/store intrinsics had vec1, vec2, vec3, and vec4
variants. This led to piles of switch statements to generate the correct
intrinsic names, and introspection to figure out the number of components.
We can make things much nicer by allowing "vectorized" intrinsics.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This commit switches us over to the new variable lowering code which is
capable of properly handling lowering indirects as we go.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
With this commit, the GLSL IR -> NIR pass generates NIR in more-or-less SSA
form. It's SSA in the sense that it doesn't have any registers, but it
isn't really useful SSA because it still has a pile of load/store
intrinsics that we will need to get rid of.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass analyzes all of the load/store operations and, when a variable is
never aliased (potentially used by an indirect operation), it is lowered
directly to an SSA value. This pass translates to SSA directly and does
not require any fixup by the original to-SSA pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead, we give SSA definitions a temporary index of 0xFFFFFFFF if the
instruction does not have a block and a proper index when it actually gets
added to the list.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we used a string name. It was nice for translating out of GLSL
IR (which also does that) but cumbersome the rest of the time.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Backends want to be able to do special things with constant values such as
put them into immediates or make decisions based on whether or not a value
is constant. Before, constants always got lowered to a load_const into a
register and then a register use. Now we leave constants as SSA values so
backends can special-case them if they want. Since handling constant SSA
values is trivial, this shouldn't be a problem for backends.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass is still fairly basic. It only handles ALU operations, constant
loads, and phi nodes. No texture ops or intrinsics yet.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The unlink_blocks function moves successors around to make sure that, if
there is a remaining successor, it is in the first successors slot and not
the second. To fix this, we simply get both successors up front.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We also make the return types match GLSL. The GLSL spec specifies that
findMSB and findLSB return a signed integer. Previously, nir had them
return unsigned. This updates nir's behavior to match what GLSL expects.
We also update the nir-to-fs generator to take the new instructions. While
we're at it, we fix the case where the input to findMSB is zero.
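For reference, the GLSL semantics for unsigned inputs can be sketched in plain
C as below. The function names are hypothetical, and this is only a model of
the expected results, not the code the nir-to-fs generator emits. Both
functions return a signed value because a zero input yields -1.

```c
#include <stdint.h>

/* Index of the most significant set bit, or -1 if value is zero. */
static int
find_msb_u32(uint32_t value)
{
   int bit = -1;
   while (value != 0) {
      value >>= 1;
      bit++;
   }
   return bit;
}

/* Index of the least significant set bit, or -1 if value is zero. */
static int
find_lsb_u32(uint32_t value)
{
   if (value == 0)
      return -1;

   int bit = 0;
   while ((value & 1u) == 0) {
      value >>= 1;
      bit++;
   }
   return bit;
}
```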
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
These indices should now be reasonably stable/consistent. Redoing the
indices in the print functions makes it harder to debug problems.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Some time while refactoring things to make it look nicer before pushing to
master, I completely broke the function. This fixes it to be correct.
Just goes to show you why you shouldn't push code that has no users yet...
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This commit rewrites the out-of-SSA pass to not be nearly as naive. It's
based on "Revisiting Out-of-SSA Translation for Correctness, Code Quality,
and Efficiency" by Boissinot et al. It should be fairly close to
state of the art.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Since we don't actually have an "if" instruction, this is a very common
pattern when iterating over instructions. This adds a helper function for
it to make things a little less painful.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This pass is kind of stupidly implemented but it should be enough to get us
up and going. We probably want something better that doesn't generate all
of the redundant moves eventually. However, the i965 backend should be
able to handle the movs, so I'm not too worried about it in the short term.
Previously, emit_general_interpolation took an ir_variable and pulled the
information it needed from that. This meant that in fs_fp, we were
constructing a dummy ir_variable just to pass into it. This commit makes
emit_general_interpolation take only the information it needs and gets rid
of the fs_fp cruft.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is similar to the GLSL IR frontend, except consuming NIR. This lets
us test NIR as part of an actual compiler.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Make brw_fs_nir build again
Only use NIR if INTEL_USE_NIR is set
whitespace fixes
These include functions for adding and removing various bits of IR and
helpers for iterating over all the sources and destinations of an
instruction. This is similar to ir.cpp.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
whitespace and automake fixes
This includes all the instructions, ifs, loops, functions, etc. This is
similar to the information in ir.h.
v2: Jason Ekstrand <jason.ekstrand@intel.com>:
Include ralloc and hash_table from the util directory
whitespace fixes
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z. The result was
that you'd get big chunks of your rendering missing.
This reverts commit 0543630d0b.
It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.
We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.
Sorry I didn't remember this when reviewing the reverted change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
You would not believe the mess GCC 4.8.3 generated for the old
switch-statement.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40)
64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40)
The regression on 32-bit is odd. Callgrind says the caller,
_mesa_is_valid_prim_mode is faster. Before it says 2,293,760
cycles, and after it says 917,504.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Multithread:
32-bit: Difference at 95.0% confidence 0.416027% +/- 0.163529% (n=40)
64-bit: Difference at 95.0% confidence 0.494771% +/- 0.259985% (n=40)
Gl32Batch7 had no difference proven at 95.0% confidence (n=120) on
32-bit or 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous check was insufficient (as it did not take 'indices' into
consideration), and DX10 hardware does not need this check anyway.
Since index_bytes is no longer used, remove it.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 1.66929% +/- 0.230107% (n=40)
64-bit: Difference at 95.0% confidence -1.40848% +/- 0.288038% (n=40)
The regression on 64-bit is odd. Callgrind says the caller,
validate_DrawElements_common is faster. Before it says 10,321,920
cycles, and after it says 8,945,664.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This doesn't affect performance, but it feels more correct.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: No difference proven at 95.0% confidence (n=120)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of having an extra pointer indirection in one of the hottest
loops in the driver.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
v2 (Ken): Cut size of array from 64 to 57 to save memory.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the switch-statement, GCC 4.8.3 produces a small pile of code with
a branch.
00000000 <brw_get_index_type>:
000000: 8b 54 24 04 mov 0x4(%esp),%edx
000004: b8 01 00 00 00 mov $0x1,%eax
000009: 81 fa 03 14 00 00 cmp $0x1403,%edx
00000f: 74 0d je 00001e <brw_get_index_type+0x1e>
000011: 31 c0 xor %eax,%eax
000013: 81 fa 05 14 00 00 cmp $0x1405,%edx
000019: 0f 94 c0 sete %al
00001c: 01 c0 add %eax,%eax
00001e: c3 ret
However, this could be two instructions.
00000000 <brw_get_index_type>:
000000: 2d 01 14 00 00 sub $0x1401,%eax
000005: d1 e8 shr %eax
000007: 90 nop
000008: 90 nop
000009: 90 nop
00000a: 90 nop
00000b: c3 ret
The function was also moved to the header so that it could be inlined at
the two call sites. Without this, 32-bit also needs to pull the
parameter from the stack. This means there is a push, a call, a move,
and a ret added to a two-instruction function. The above code shows the
function with __attribute__((regparm(1))), but even this adds several
extra instructions. There is also an extra instruction on 64-bit to
move the parameter to %eax for the subtract.
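In C terms, the two versions look roughly like the sketch below. The function
names are made up for the example. The GL enum values are the standard ones,
and the return values 0, 1, and 2 are read off the disassembly above. The
arithmetic form relies on the three valid enums being spaced two apart.

```c
#include <assert.h>

/* Standard GL enum values for the three index types. */
#define GL_UNSIGNED_BYTE  0x1401
#define GL_UNSIGNED_SHORT 0x1403
#define GL_UNSIGNED_INT   0x1405

/* Switch form: compiles to compares and a branch. */
static unsigned
index_type_switch(unsigned type)
{
   switch (type) {
   case GL_UNSIGNED_SHORT: return 1;
   case GL_UNSIGNED_INT:   return 2;
   default:                return 0;
   }
}

/* Arithmetic form: a subtract and a shift, valid only for the three
 * enums above. */
static inline unsigned
index_type_arith(unsigned type)
{
   assert(type == GL_UNSIGNED_BYTE ||
          type == GL_UNSIGNED_SHORT ||
          type == GL_UNSIGNED_INT);
   return (type - GL_UNSIGNED_BYTE) >> 1;
}
```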
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40)
64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
...so that it can be inlined in the two places that call it.
On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:
32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were happily printing "Native code for unnamed vertex shader" and
"VS vec4" program for geometry shaders in our INTEL_DEBUG=gs output,
as well as the KHR_debug output used by shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A lot of messages hardcoded the string "FS", which is confusing on
Broadwell, where we use this code for VS support as well.
shader-db particularly got confused, as it reported two "FS SIMD8"
shaders, and no vertex shaders at all. Craziness ensued.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only GNU indent is supported when indenting autogenerated format_pack.c
and format_unpack.c files. Some non-GNU indent implementations (Mac OS X
and FreeBSD) add extra whitespace that breaks the build of those files.
Fall back to 'cat' if a non-GNU indent is found.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=88335
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The 8888 suggests 8-bit components which is not correct, so
replace that with the actual size of the components in each
format.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Before we were always copying from the buffer being mapped into the
temporary buffer. However, if INVALIDATE_RANGE is set, then we know that
the data is going to be junk after we unmap so there's no point in doing
the blit. This is important because doing the blit will cause a stall 3
lines later when we map the buffer.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch enables an ES2 extension that uses existing ES3 functionality.
With these changes, all subtests of the WebGL conformance test
'webgl-draw-buffers' run and pass when running Chrome on OpenGL ES, and
the Piglit test 'draw_buffers_gles2' passes as well.
v2: remove unused boolean (Ilia Mirkin)
v3: proper error checking for invalid values (Chad Versace)
v4: run error check explicitly for ES2 and ES3 (Kenneth Graunke)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This will be needed for NIR because it is typeless and treats all constants
as uint32 values and reinterprets them when they are used later. This
commit allows those values to be properly propagated.
Also, this helps some synmark shaders because it allows us to copy
propagate a 0x00000000UD into a 0.0F in a load_payload, which then lets us
combine 4 load_payloads.
instructions in affected programs: 2288 -> 2144 (-6.29%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Removes commit 7894278 changes and moves fix to _mesa_GetInternalformativ().
The original commit enabled the GL_RGB and GL_RGBA unsized internal formats
as valid for render buffers in GLES3, but this is incorrect. They should
have only been enabled for GetInternalformativ().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88079
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If, for example, only the x/y/w components of in.xyzw are actually used,
we still need to have a group of four registers and assign all four
components. The hardware can't write in.xy and in.w to discontiguous
registers. To handle this, pad with a dummy NOP instruction, to keep
the neighbor chain contiguous.
This fixes a problem noticed with firefox OMTC.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
According to the OpenGL and OpenGL ES specs (sections
"FRAMEBUFFER COMPLETENESS" and "Whole Framebuffer Completeness"),
the image for color, depth or stencil attachments must be renderable,
otherwise the attachment is considered incomplete and we should report
GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT. Currently, we detect this
situation properly but report a different error.
This fixes the following 3 dEQP tests:
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb_unsigned_int_2_10_10_10_rev
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgba_unsigned_int_2_10_10_10_rev
dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.rgb16f
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From GLES3 specification (page 123), "The currently bound sampler may be
queried by calling GetIntegerv with pname set to
SAMPLER_BINDING".
Fixes 4 dEQP tests:
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getboolean
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getinteger
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getinteger64
* dEQP-GLES3.functional.state_query.integers.sampler_binding_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, a cast was done to convert from float to int but there
were rounding errors.
The spec specifies in the Data Conversion chapter that floating-point values are
rounded to the nearest integer.
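The difference between the two conversions can be sketched as follows. The
helper name round_float_to_int is hypothetical and is not necessarily what the
patch uses. It assumes the input fits in an int, and it rounds exact halves
away from zero.

```c
/* Round to the nearest integer instead of truncating toward zero.
 * Assumes f is within the range of int. */
static int
round_float_to_int(float f)
{
   return (int)(f >= 0.0f ? f + 0.5f : f - 0.5f);
}
```

With a plain cast, 2.7f becomes 2 and -2.7f becomes -2. The helper returns 3
and -3 instead.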
This patch fixes the following 2 dEQP tests:
dEQP-GLES3.functional.state_query.sampler.sampler_texture_min_lod_getsamplerparameteri
dEQP-GLES3.functional.state_query.sampler.sampler_texture_max_lod_getsamplerparameteri
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, a cast was done to convert from float to int but there
were rounding errors.
The spec specifies in the Data Conversion chapter that floating-point values are
rounded to the nearest integer.
This patch fixes the following 8 dEQP tests:
dEQP-GLES3.functional.state_query.texture.texture_2d_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_3d_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_3d_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_array_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_2d_array_texture_max_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_cube_map_texture_min_lod_gettexparameteri
dEQP-GLES3.functional.state_query.texture.texture_cube_map_texture_max_lod_gettexparameteri
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Return the proper value for two-dimensional array texture and three-dimensional
textures.
From OpenGL ES 3.0 spec, chapter 6.1.13 "Framebuffer Object Queries",
page 234:
"If pname is FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER and the texture
object named FRAMEBUFFER_ATTACHMENT_OBJECT_NAME is a layer of a
three-dimensional texture or a two-dimensional array texture, then params
will contain the number of the texture layer which contains the attached
image. Otherwise params will contain the value zero."
Furthermore, FRAMEBUFFER_ATTACHMENT_TEXTURE_LAYER is an alias of
FRAMEBUFFER_ATTACHMENT_TEXTURE_3D_ZOFFSET_EXT.
This patch fixes dEQP test:
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_texture_layer
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Commit 0ae9ca12a8 put source modifiers out of the bitcast operations
by adding a MOV operation that would handle them separately. It missed
the case of ceil though: the implementation negates both its source and
destination operands. The source operand will be used for RNDD, which
we can handle normally, but we need to fix the modifier for the
negated result.
v2:
- RNDD can handle the source modifier so no need to put that one
in a separate MOV.
Fixes the following 42 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.common.ceil.*_vertex
dEQP-GLES3.functional.shaders.builtin_functions.common.ceil.*_fragment
dEQP-GLES3.functional.shaders.builtin_functions.precision.ceil._*vertex.*
dEQP-GLES3.functional.shaders.builtin_functions.precision.ceil._*fragment.*
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
"9.4. FRAMEBUFFER COMPLETENESS
...
Depth and stencil attachments, if present, are the same image."
Notice that this restriction is not included in the OpenGL ES2 spec.
Fixes 18 dEQP tests in:
dEQP-GLES3.functional.fbo.completeness.attachment_combinations.*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
'4.1.4 Stencil Test' section of the GL-ES 3.0 specification says:
"In the initial state, [...] the front and back stencil mask are both set
to the value 2^s − 1, where s is greater than or equal to the number of
bits in the deepest stencil buffer* supported by the GL implementation."
Since the maximum supported precision for stencil buffers is 8 bits, mask
values should be initialized to 2^8 - 1 = 0xFF.
Currently, these masks are initialized to max unsigned integer (~0u), because
in OpenGL 3.0 and before, the initial mask values were:
"In the initial state, stenciling is disabled, the front and back
stencil reference value are both zero, the front and back stencil
comparison functions are both ALWAYS, and the front and back
stencil mask are both all ones."
The problem is that it causes the mask values to overflow to -1 when converted
to signed integer by glGet* APIs.
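The overflow is easy to reproduce in isolation. In the sketch below both
helper names are hypothetical. The signed conversion of an out-of-range
unsigned value is implementation-defined in C, so the -1 result is an
assumption that holds on the usual two's complement compilers.

```c
#include <assert.h>
#include <stdint.h>

/* 2^s - 1 for a stencil buffer with s bits (s < 32). */
static uint32_t
stencil_mask_for_bits(unsigned s)
{
   assert(s < 32);
   return (UINT32_C(1) << s) - 1;
}

/* Models a query that hands the mask back as a signed integer. */
static int32_t
mask_as_signed(uint32_t mask)
{
   return (int32_t)mask;
}
```

An all-ones mask (~0u) comes back as -1, while the 8-bit mask 0xFF comes back
as 255.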
Fixes 6 dEQP failing tests:
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_separate_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_value_mask_separate_both_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_separate_getfloat
* dEQP-GLES3.functional.state_query.integers.stencil_back_value_mask_separate_both_getfloat
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The range's min and max, and the precision value are not set correctly for the
vertex shader constants.
Fixes 1 dEQP test: dEQP-GLES3.functional.state_query.shader.precision_vertex_highp_int
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Check that the returned texObj is not null. If texObj is null, a
GL_INVALID_OPERATION error has already been set.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
GL_UNSIGNED_SHORT_5_5_5_1, GL_UNSIGNED_SHORT_1_5_5_5_REV,
GL_UNSIGNED_INT_10_10_10_2, GL_UNSIGNED_INT_2_10_10_10_REV data types
are not explicitly allowed to work with the GL_ABGR_EXT format in either
the GL or the GL_EXT_abgr specs.
Removed the corresponding mesa formats as there are no other functions
using them inside Mesa anymore.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_mesa_pack_rgba_span_float was the last of the color span functions
and we have replaced all calls to it with calls to _mesa_format_convert,
so we can remove it together with tmp_pack.h which was used to
generate the pack functions for multiple types that were used from
the various color span functions that have been removed.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Now that we have _mesa_format_convert we don't need this.
This was only used to create temporary RGBA float images in the process
of storing some compressed formats. These can call _mesa_texstore
with a RGBA/float dst to achieve the same goal.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Now that we have _mesa_format_convert we don't need this.
texstore_rgba will use the GL_COLOR_INDEX to RGBA conversion
helpers instead and compressed formats that used
_mesa_make_temp_ubyte_image to create an ubyte RGBA temporary
image can call _mesa_texstore with a RGBA/ubyte dst to
achieve the same goal.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
_mesa_unpack_bitmap() was introduced by commit 02b801c to handle the case
when data is stored in PBO by display lists, in the context of this bug:
Incorrect pixels read back if draw bitmap texture through Display list
https://bugs.freedesktop.org/show_bug.cgi?id=10370
Since _mesa_unpack_image() already handles the case of GL_BITMAP, this patch
removes _mesa_unpack_bitmap() and makes affected calls go through
_mesa_unpack_image() instead.
The sample test attached to the original bug report passes with this change
and there are no piglit regressions.
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In the future we would like to have a format conversion library that is
independent of GL so we can share it with Gallium. This is a step in that
direction.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Instead of using _mesa_pack_rgba_span_float. This should allow us to remove
that function in a later patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is the only place that uses _mesa_unpack_color_span_float so after
this we should be able to remove that function.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Notice that _mesa_format_convert does not handle byte-swapping scenarios,
GL_COLOR_INDEX or MESA_FORMAT_YCBCR(_REV), so these must be handled
separately.
Also, remove all the code that goes unused after using _mesa_format_convert.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We only use _mesa_make_temp_ubyte_image in texstore.c to convert
GL_COLOR_INDEX to RGBA, but this helper does more stuff than this.
All uses of this helper can be replaced with calls to
_mesa_format_convert except for this GL_COLOR_INDEX conversion.
This patch extracts the GL_COLOR_INDEX to RGBA logic to a separate
helper so we can use that instead from texstore.c.
In future patches we will replace all remaining calls to
_mesa_make_temp_ubyte_image in the repository (related to compressed
formats) with calls to _mesa_format_convert so we can remove
_mesa_make_temp_ubyte_image and related functions.
v2:
- Remove ‘for’ loop initial declaration. They are only allowed in C99 or C11
mode.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For glReadPixels with a Luminance destination format we compute luminance
values from RGBA as L=R+G+B. This, however, requires ad-hoc implementation,
since pack/unpack functions or _mesa_swizzle_and_convert won't do this
(and thus, neither will _mesa_format_convert). This patch adds helpers
to do this computation so they can be used to support conversion to luminance
formats.
The current implementation of glReadPixels does this computation as part
of the span functions in pack.c (see _mesa_pack_rgba_span_float), that do
this together with other things like type conversion, etc. We do not want
to use these functions but use _mesa_format_convert instead (later patches
will remove the color span functions), so we need to extract this functionality
as helpers.
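The computation itself is small. A sketch for float data, with a hypothetical
helper name and a tightly packed RGBA layout assumed. Any clamping or type
conversion the real helpers may need is left out.

```c
#include <stddef.h>

/* Compute L = R + G + B for n pixels of tightly packed RGBA floats.
 * The alpha component is ignored. */
static void
luminance_from_rgba_float(size_t n, const float *rgba, float *luminance)
{
   for (size_t i = 0; i < n; i++)
      luminance[i] = rgba[4 * i + 0] + rgba[4 * i + 1] + rgba[4 * i + 2];
}
```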
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We have _mesa_swap{2,4} but these do in-place byte-swapping only. The new
functions receive an extra parameter so we can swap bytes on a source
input array and store the results in a (possibly different) destination
array.
This is useful to implement byte-swapping in pixel uploads, since in this
case we need to swap bytes on the src data which is owned by the
application so we can't do an in-place byte swap.
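A sketch of what such copying swaps can look like. The names swap2_copy and
swap4_copy and the exact signatures are assumptions made for the example.
Each element is read before it is written, so passing the same pointer for
dst and src still gives an in-place swap.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-swap n 16-bit values from src into dst. src is left untouched
 * unless dst == src. */
static void
swap2_copy(uint16_t *dst, const uint16_t *src, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      uint16_t v = src[i];
      dst[i] = (uint16_t)((v >> 8) | (v << 8));
   }
}

/* Byte-swap n 32-bit values from src into dst. */
static void
swap4_copy(uint32_t *dst, const uint32_t *src, size_t n)
{
   for (size_t i = 0; i < n; i++) {
      uint32_t v = src[i];
      dst[i] = (v >> 24) |
               ((v >> 8) & UINT32_C(0x0000ff00)) |
               ((v << 8) & UINT32_C(0x00ff0000)) |
               (v << 24);
   }
}
```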
v2:
- Include compiler.h in image.h, which is necessary to build in MSVC as
indicated by Brian Paul.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We had previously added the needed mesa formats, so we can simplify
the code further.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 after review by Jason Ekstrand:
- Move _mesa_format_from_format_and_type to glformats
- Return a mesa_format for GL_UNSIGNED_INT_8_8_8_8(_REV)
v3:
- Adapted to the new implementation of mesa_array_format as a plain uint32_t
bitfield.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will come in handy when callers of _mesa_format_convert need
to compute the rebase swizzle parameter to use.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The new parameter allows callers to provide a rebase swizzle that
the function needs to use to match the requirements of the base
internal format involved. This is necessary when the source or
destination internal formats (depending on whether we are doing
the conversion for a pixel download or a pixel upload respectively)
do not match the base formats of the source or destination
formats of the conversion. This can happen when the driver does not
support the internal formats and uses a different format to store
pixel data internally.
For example, a texture upload from RGB to Luminance in a driver
that does not support textures with a Luminance format may decide
to store the Luminance data as RGBA. In this case we want to store
the RGBA values as (R,R,R,1). Following the same example, when we
download from that texture to RGBA we want to read (R,0,0,1). The
rebase_swizzle parameter allows these transforms to happen.
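The two transforms in this example can be written as swizzles over a
four-component pixel. The sketch below uses hypothetical swizzle constants and
a hypothetical apply_swizzle helper, not the actual _mesa_format_convert
interface. src and dst must not alias.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical swizzle selectors: a source component, or a constant. */
enum { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* dst[i] takes the source component (or constant) named by swizzle[i]. */
static void
apply_swizzle(const uint8_t swizzle[4], const float src[4], float dst[4])
{
   for (int i = 0; i < 4; i++) {
      switch (swizzle[i]) {
      case SWZ_ZERO:
         dst[i] = 0.0f;
         break;
      case SWZ_ONE:
         dst[i] = 1.0f;
         break;
      default:
         assert(swizzle[i] <= SWZ_W);
         dst[i] = src[swizzle[i]];
         break;
      }
   }
}
```

The upload case stores (R,R,R,1), which is the swizzle { X, X, X, ONE }. The
download case reads (R,0,0,1), which is { X, ZERO, ZERO, ONE }.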
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is necessary to handle conversions between array types where
the driver does not support the dst format requested by the client and
chooses a different format instead.
We will need this in _mesa_format_convert, so move it to format_utils.c,
prefix it with '_mesa_' and make it available to other files.
v2:
- Move _mesa_compute_component_mapping to glformats
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Iago Toral <itoral@igalia.com>:
- When testing if we can directly pack we should use the src format to check
if we are packing from an RGBA format. The original code used the dst format
for the ubyte case by mistake.
- Fixed incorrect number of bits for dst, it was computed using the src format
instead of the dst format.
- If the dst format is an array format, check if it is signed. We were only
checking this for the case where it was not an array format, but we need
to know this in both scenarios.
- Fixed incorrect swizzle transform for the cases where we convert between
array formats.
- Compute is_signed and bits only once and for the dst format. We were
computing these for the src format too but they were overwritten by the
dst values immediately after.
- Be more careful when selecting the integer path. Specifically, check that
both src and dst are integer types. Checking only one of them should suffice
since OpenGL does not allow conversions between normalized and integer types,
but putting extra care here makes sense and also makes the actual requirements
for this path more clear.
- The format argument for pack functions is the destination format we are
packing to, not the source format (which has to be RGBA).
- Expose RGBA8888_* to other files. These will come in handy when we need to
test if a given array format is RGBA or need to pass RGBA formats to
_mesa_format_convert.
v3 by Samuel Iglesias <siglesias@igalia.com>:
- Add an RGBA8888_INT definition.
v4 by Iago Toral <itoral@igalia.com> after review by Jason Ekstrand:
- Added documentation for _mesa_format_convert.
- Added additional explanatory comments for integer conversions.
- Ensure that we use _mesa_swizzle_and_convert for all signed source formats.
- Squashed: do not directly (un)pack to RGBA UINT if the source is not unsigned.
v5 by Iago Toral <itoral@igalia.com>:
- Adapted to the new implementation of mesa_array_format as a plain uint32_t
bitfield.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Use autogenerated format pack functions and take advantage of some
macros to reduce source code, facilitating its maintenance.
Unfortunately, dstType == GL_UNSIGNED_SHORT cannot be simplified like
the others, so keep it as it is.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We will use this in a later patch to refactor _mesa_pack_rgba_span_float.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Take advantage of new mesa formats and new format_pack functions to
reduce source code in _mesa_pack_rgba_span_from_ints() and
_mesa_pack_rgba_span_from_uints().
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This commit adds a macro to facilitate the use of format conversion
functions while keeping the same API.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will be used to refactor code in pack.c and support conversion
to/from these types in a master convert function that will be added
later.
v2:
- Fix autogeneration of MESA_FORMAT_A2R10G10B10_UNORM pack/unpack
functions
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will be used to unify code in pack.c.
v2:
- Modify pack_int_*() function generator to use c.datatype() and
f.datatype()
v3:
- Only autogenerate pack_int_*() functions for non-normalized integer
formats.
v4:
- Use _mesa_unsigned_to_unsigned() in pack_int_*() because, in order
to be able to pack both signed and unsigned formats, we need to
sign-extend.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We will use this later on to handle uint conversion scenarios in a master
convert function.
v2:
- Modify pack_uint_*() function generation to use c.datatype() and
f.datatype().
- Remove UINT_TO_FLOAT() macro usage from pack_uint*()
- Remove "if not f.is_normalized()" conditional as pack_uint*()
functions are only autogenerated for non normalized formats.
v3:
- Add clamping for non-normalized integer formats in pack_uint*()
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Add usage of INDENT_FLAGS in Makefile.am
v3 by Samuel Iglesias <siglesias@igalia.com>:
- Modify unpack_float_*() and unpack_ubyte_*() function generation
to use c.datatype() and f.datatype()
- Fix out-of-tree build
v4 by Samuel Iglesias <siglesias@igalia.com>:
- format_unpack.c.mako is now format_unpack.py, with the template code
inlined. It now auto-generates format_unpack.c
- Add format_unpack.c to gitignore.
- Simplify Makefile.am change
- Modify SConscript to build format_unpack.c with scons
v5 by Samuel Iglesias <siglesias@igalia.com>:
- Don't allow float to non-normalized integer format conversions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We were auto-generating it before. The problem was that the autogeneration
tool we were using was called "copy, paste, and edit". Let's use a more
sensible solution.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 by Samuel Iglesias <siglesias@igalia.com>
- Remove format_pack.c as it is now autogenerated
- Add usage of INDENT_FLAGS in Makefile.am
- Remove trailing blank line
v3 by Samuel Iglesias <siglesias@igalia.com>
- Merge format_convert.py into format_parser.py
- Adapt pack_*_* function generations
- Fix out-of-tree build
v4 by Samuel Iglesias <siglesias@igalia.com>
- _get_datatype() is now a helper function
v5 by Samuel Iglesias <siglesias@igalia.com>
- format_pack.c.mako is now format_pack.py, with the template code
inlined. It now auto-generates format_pack.c
- Simplify Makefile.am change.
- Modify SConscript to build format_pack.c with scons.
- Remove run_mako.py
- Add format_pack.c to gitignore
v6 by Samuel Iglesias <siglesias@igalia.com>:
- Don't allow float to non-normalized integer format conversions.
- Add non-normalized formats support for ubyte packing functions. Merge
the previously separated patch.
- Add clamping for non-normalized integer formats in pack_ubyte*()
v7 by Samuel Iglesias <siglesias@igalia.com>:
- Add assert to check that sRGB formats are 8-bit size.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It is now a hard dependency because of the autogeneration of
format pack and unpack functions.
Update the documentation to reflect this change.
v2:
- Inline python script in m4 file and use PYTHON2
v3:
- Remove semicolons and quotes and change coding style
- Add Ilia Mirkin suggestion to use Python's split functionality.
- Use AX_CHECK_PYTHON_MAKO_MODULE name.
- Change to MIT license
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If we need the base format for a mesa_array_format we have to find the
matching mesa_format first. This is expensive because it requires
to loop through all existing mesa formats until we find the right match.
We can resolve the base format of an array format directly by looking
at its swizzle information. Also, we can have _mesa_get_format_base_format
accept a uint32_t which can pack either a mesa_format or a mesa_array_format
and resolve the base format for either type. This way clients do not need to
check if they have a mesa_format or a mesa_array_format and call different
functions depending on the case.
Another reason to resolve the base format for array formats directly is that
we don't have matching mesa_format enums for every possible array format, so
for some GL format/type combinations we can produce array formats that don't
have a corresponding mesa format, in which case we would not be able to
find the base format. For example, format=GL_RGB with type=GL_UNSIGNED_SHORT
would map to something like MESA_FORMAT_RGB_UNORM16, but we don't have that.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
An array format is a 32-bit integer format identifier that can represent
any format that can be represented as an array of standard GL datatypes.
While the MESA_FORMAT enums provide several of these, they don't account for
all of them.
v2 by Iago Toral Quiroga <itoral@igalia.com>:
- Implement mesa_array_format as a plain bitfield uint32_t type instead of
using a struct inside a union to access the various components packed in
it. This is necessary to support big-endian properly, as pointed out by
Ian.
- Squashed: Make float types normalized
v3 by Iago Toral Quiroga <itoral@igalia.com>:
- Include compiler.h in formats.h, which is necessary to build in MSVC as
indicated by Brian Paul.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
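To illustrate the v2 note above about using a plain uint32_t instead of a
struct inside a union, here is a minimal sketch that packs and unpacks fields
with shifts and masks, which behave the same on little- and big-endian hosts.
The field layout and every name below are hypothetical and chosen only for
illustration; they are not Mesa's actual bit assignment or API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the array format type: just a 32-bit integer. */
typedef uint32_t array_format_sketch;

/* Hypothetical layout (NOT Mesa's real one):
 *   bits 0-3    datatype
 *   bit  4      normalized flag
 *   bits 5-7    number of channels (1-4)
 *   bits 8-19   four 3-bit swizzle selectors
 */
static inline array_format_sketch
pack_array_format(unsigned type, unsigned normalized, unsigned channels,
                  const uint8_t swizzle[4])
{
   assert(type < 16 && normalized < 2 && channels >= 1 && channels <= 4);

   array_format_sketch f = type | (normalized << 4) | (channels << 5);
   for (unsigned i = 0; i < 4; i++) {
      assert(swizzle[i] < 8);
      f |= (uint32_t)swizzle[i] << (8 + 3 * i);
   }
   return f;
}

static inline unsigned array_format_type(array_format_sketch f)
{
   return f & 0xf;
}

static inline unsigned array_format_normalized(array_format_sketch f)
{
   return (f >> 4) & 0x1;
}

static inline unsigned array_format_channels(array_format_sketch f)
{
   return (f >> 5) & 0x7;
}

static inline unsigned array_format_swizzle(array_format_sketch f, unsigned i)
{
   assert(i < 4);
   return (f >> (8 + 3 * i)) & 0x7;
}
```

Because every access goes through explicit shifts on the integer value, the
result does not depend on how a compiler would order bitfield members in
memory, which is the portability concern the v2 note refers to.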
Fix various conversion paths that involved integer data types of different
sizes (uint16_t to uint8_t, int16_t to uint8_t, etc) that were not
being clamped properly.
Also, one of the paths was incorrectly assigning the value 12, instead of 1,
to the constant "one".
v2:
- Create auxiliary clamping functions and use them in all paths that
required clamp because of different source and destination sizes
and signed-unsigned conversions.
v3:
- Create MIN_INT macro and use it.
v4:
- Add _mesa_float_to_[un]signed() and _mesa_half_to_[un]signed() auxiliary
functions.
- Add clamp for float-to-integer conversions in _mesa_swizzle_and_convert()
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
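The kind of clamping described in the commit above (different source and
destination sizes, and signed-unsigned conversions) can be sketched as
follows. This is a minimal illustration, not the helpers added by the patch:
the function names are hypothetical and widths are assumed to be at most 32
bits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned field that is `bits` wide (1..32). */
static inline uint32_t max_uint(unsigned bits)
{
   assert(bits >= 1 && bits <= 32);
   return bits == 32 ? UINT32_MAX : (1u << bits) - 1u;
}

/* Largest and smallest values of a signed field `bits` wide (2..32). */
static inline int32_t max_int(unsigned bits)
{
   assert(bits >= 2 && bits <= 32);
   return (int32_t)max_uint(bits - 1);
}

static inline int32_t min_int(unsigned bits)
{
   return -max_int(bits) - 1;
}

/* e.g. uint16_t -> uint8_t: saturate at the destination maximum. */
static inline uint32_t clamp_unsigned_to_unsigned(uint32_t src, unsigned dst_bits)
{
   uint32_t max = max_uint(dst_bits);
   return src > max ? max : src;
}

/* e.g. int16_t -> uint8_t: negatives become 0, large values saturate. */
static inline uint32_t clamp_signed_to_unsigned(int32_t src, unsigned dst_bits)
{
   if (src < 0)
      return 0;
   return clamp_unsigned_to_unsigned((uint32_t)src, dst_bits);
}

/* e.g. int16_t -> int8_t: saturate at both ends. */
static inline int32_t clamp_signed_to_signed(int32_t src, unsigned dst_bits)
{
   int32_t lo = min_int(dst_bits), hi = max_int(dst_bits);
   return src < lo ? lo : (src > hi ? hi : src);
}

/* e.g. uint16_t -> int8_t: only the upper bound can be exceeded. */
static inline int32_t clamp_unsigned_to_signed(uint32_t src, unsigned dst_bits)
{
   uint32_t hi = (uint32_t)max_int(dst_bits);
   return (int32_t)(src > hi ? hi : src);
}
```

Without the clamp, a plain narrowing cast would keep only the low bits, so a
uint16_t value of 0x1234 stored into a uint8_t would become 0x34 rather than
saturating at 255.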
_BaseFormat is a GLenum (unsigned int) so testing if its value is
greater than 0 to detect the cases where _mesa_base_tex_format
returns -1 doesn't work.
Fixing the assertion breaks the arb_texture_view-lifetime-format
piglit test on nouveau, since that test calls
_mesa_base_tex_format with GL_R16F with a context that does not
have ARB_texture_float, so it returns -1 for the BaseFormat, which
was not being caught properly by the ASSERT in init_teximage_fields_ms
until now.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
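The pitfall described above can be reproduced in a few lines of C. This is a
sketch only: the typedef and function names are hypothetical stand-ins, and
the actual assertion in init_teximage_fields_ms may be written differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for GLenum, which is an unsigned int. */
typedef unsigned int glenum_sketch;

/* Hypothetical lookup that reports "unsupported" by returning -1. */
static inline int base_format_or_error(bool supported)
{
   return supported ? 0x1903 /* GL_RED */ : -1;
}

/* Broken check: -1 stored in an unsigned type is a huge positive value,
 * so this is true for the error case as well. */
static inline bool base_format_ok_broken(glenum_sketch base_format)
{
   return base_format > 0;
}

/* Fixed check: compare against -1 converted to the same unsigned type. */
static inline bool base_format_ok_fixed(glenum_sketch base_format)
{
   return base_format != (glenum_sketch)-1;
}
```

Storing the -1 return value in the unsigned field turns it into UINT_MAX, so
the "greater than 0" test passes for exactly the case it was meant to catch.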
We were returning incorrect mesa formats for GL_LUMINANCE_ALPHA16I_EXT
and GL_LUMINANCE_ALPHA32I_EXT.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The PACK_565_REV macro is no longer used. It was also extremely confusing
because it's actually a byteswapped 565 not reversed 565.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Apparently, the packing/unpacking functions for these formats have differed
from the format description in formats.h. Instead of fixing this, people
simply left a comment saying it was broken. Let's actually fix it for
real.
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Fix comment in formats.h
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch fixes the return of a wrong value when x is lower than
-MAX_INT(src_bits), as the result would not be in [-1.0, 1.0].
v2 by Samuel Iglesias <siglesias@igalia.com>:
- Modify snorm_to_float() to avoid doing the division when
x == -MAX_INT(src_bits)
Cc: 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
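A minimal sketch of the snorm-to-float behaviour described above, assuming a
two's-complement source of src_bits bits. The function name is hypothetical
and this is not the patched snorm_to_float() itself; it only shows why the
most negative value needs special handling.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a signed normalized integer of `src_bits` bits to float.
 * Anything at or below -MAX_INT(src_bits) maps to exactly -1.0, so the
 * most negative two's-complement value cannot produce a result below -1.0. */
static inline float snorm_to_float_sketch(int32_t x, unsigned src_bits)
{
   assert(src_bits >= 2 && src_bits <= 32);

   /* MAX_INT(src_bits): 127 for 8 bits, 32767 for 16 bits, ... */
   const int32_t max = (int32_t)((1u << (src_bits - 1)) - 1u);

   if (x <= -max)
      return -1.0f;          /* skips the division entirely */

   return (float)x / (float)max;
}
```

For 8 bits, a naive division would give -128 / 127, which is slightly below
-1.0; with the early return both -128 and -127 yield -1.0.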
No need to recheck the FS compile when the VS source has changed, but
there *is* a need to recheck the VS compile when the compiled VS has
changed (since the live inputs may change).
Fixes es3conform's blend test.
The util_pack_color() thing only sets up the low bits of the union, so
only return them, too. Fixes intermittent failure on
fbo-alphatest-formats and es3conform's framebuffer-objects test under
simulation.
Turns out this was harmful in code quality:
total instructions in shared programs: 39487 -> 38845 (-1.63%)
instructions in affected programs: 22522 -> 21880 (-2.85%)
This costs us yet another register, which is painful since it means more
programs might fail to compile. However, the alternative was causing us
trouble where we'd save/restore r3 while it contained a MIN-ed direct
texture offset, causing the kernel to fail to validate our shaders (such
as in GLB2.7).
This gets a bunch of dead reads out of the CSes, which don't read most
attributes generally.
total instructions in shared programs: 39753 -> 39487 (-0.67%)
instructions in affected programs: 4721 -> 4455 (-5.63%)
This will give the compiler the chance to dead-code eliminate unused VPM
reads. This is particularly a big deal in the CS where a bunch of vattrs
are just not going to be used.
I'm using this in some WIP commits for doing blending in 8888 instead of
vec4. But it also gives us these results immediately, thanks to allowing
more uniforms/immediates in the arguments:
total instructions in shared programs: 41027 -> 40960 (-0.16%)
instructions in affected programs: 4381 -> 4314 (-1.53%)
If you had a conditional assignment of an array or struct (say, from the
if-lowering pass), we'd try doing swizzle_for_size() on the aggregate
type, and it would assertion fail due to vector_elements==0. Instead,
extend emit_block_mov() to handle emitting the conditional operations,
which also means we'll have appropriate writemasks/swizzles on the CMPs
within a struct containing various-sized members.
Fixes 20 testcases in es3conform on vc4.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is part of a potential solution to a spec bug. Cube completeness
is a concept from glGenerateMipmap, but it seems reasonable to check for it in
TextureSubImage when target=GL_TEXTURE_CUBE_MAP.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This is part of a potential solution to a spec bug. Cube completeness
is a concept from glGenerateMipmap, but it seems reasonable to check for it in
GetTextureImage when the target is GL_TEXTURE_CUBE_MAP.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In implementing ARB_DIRECT_STATE_ACCESS functions, it is often necessary to
abstract the functionality of a traditional GL API function into a backend
that both the traditional and dsa API functions can share. For instance,
glTexParameteri and glTextureParameteri both call _mesa_texture_parameteri,
which takes a context object and a texture object as arguments.
The existence of such backend functions provides the opportunity for
driver internals (such as meta) to pass around the actual texture object
rather than its ID or target, saving on texture object storage and look-up
overhead.
This patch provides nameless texture creation and deletion for meta. This
will be used in an upcoming refactor of meta.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Beginning in the OpenGL 4.3 core specification, certain error handling has
changed. One example shown here is that INVALID_ENUM is thrown instead of
INVALID_OPERATION when a user attempts to set sampler parameters for a
multisample target.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Beginning in the OpenGL 4.3 core specification, some error handling has
changed (see OpenGL 4.5 core spec, 30.10.2014, Section 8.10 Texture
Parameters, pages 228-29). As an example, changing sampler states with a
multisample target throws INVALID_ENUM rather than INVALID_OPERATION.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The following preparations were made in texstate.c and texstate.h to
better facilitate the BindTextureUnit function:
Dylan Noblesmith:
mesa: add _mesa_get_tex_unit()
mesa: factor out _mesa_max_tex_unit()
This is about to appear in a lot more places, so
reduce boilerplate copy paste.
add _mesa_get_tex_unit_err() checking getter function
Reduce boilerplate across files.
Laura Ekstrand:
Made note of why BindTextureUnit should throw GL_INVALID_OPERATION if the unit is out of range.
Added assert(unit > 0) to _mesa_get_tex_unit.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This reflects the new naming convention for software fallbacks. To avoid
confusion with ARB_DIRECT_STATE_ACCESS backend functions, software fallbacks
now have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This reflects the new naming convention for software fallbacks. To avoid
confusion with ARB_DIRECT_STATE_ACCESS backend functions, software fallbacks
now have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In order to implement ARB_DIRECT_STATE_ACCESS, many GL API functions must now
rely on a backend that both traditional and DSA functions can use. For
instance, _mesa_TexStorage2D and _mesa_TextureStorage2D both call a backend
function _mesa_texture_storage that takes a context and a texture object as
arguments. The backend is named _mesa_texture_storage so that Meta can call
it and avoid looking up the context and the texture object. However, backend
names often look very close to the names of software fallbacks (e.g.
_mesa_alloc_texture_storage). For this reason, software fallbacks have been
renamed for clarity to have the form _mesa_[Driver function name]_sw.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Most ARB_DIRECT_STATE_ACCESS functions take an object's ID and use it to look
up the object in its hash table. If the user passes a fake object ID (i.e., a
non-generated name), the implementation should throw INVALID_OPERATION.
This is a convenience function for texture objects.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We never used ulVersion for proper version checks.
Most 3rd party drivers use version 1, but recently NVIDIA OpenGL driver
started using a different version number, so the handy trick of renaming
Mesa's ICDs as nvoglv32.dll on Windows machines with NVIDIA hardware for
quick testing of Mesa software renderers stopped working.
Reviewed-by: Brian Paul <brianp@vmware.com>
SKL+ overloads the SIMD4x2 SIMD mode to mean either SIMD8D or SIMD4x2
depending on bit 22 in the message header. If the bit is 0 or there is
no header we get SIMD8D. We always want SIMD4x2 in vec4 and for fs pull
constants, so use a message header in those cases and set bit 22 there.
Based on an initial patch from Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
We can't (or don't know how to) turn this off. But it can end up being
stored to a higher reg # than what the shader uses, leading to
corruption.
Also we currently aren't clever enough to turn off frag_coord/frag_face
if the input is dead-code, so just fixup max_reg/max_half_reg. Re-org
this a bit so both vp and fp reg footprint fixup are called by a common
fxn used also by ir3_cmdline. Also add a few more output lines for
ir3_cmdline to make it easier to see what is going on.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Handle TEMP[ADDR[]] src registers by generating a fanin to group array
elements, similarly to how texture fetch instructions work.
NOTE:
For all the scalar instructions generated for a single tgsi vector
operation which uses an array src (or possibly even uses the same array
as multiple srcs), re-use the same fanin node. Since a vector operation
operates on all components at the same time, it should never see more
than one version of the same array.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
To use fanin's to group registers in an array, we can potentially have a
much larger array of registers. Rather than continuing to bump up the
array size, just make it dynamically allocated when the instruction is
created.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Group inputs/outputs, in addition to fanin/fanout, as they must also
exist in sequential scalar registers. This lets us simplify RA by
working in terms of neighbor groups.
NOTE: has the slight problem that it can't optimize out mov's for things
like:
MOV OUT[n], IN[m]
To avoid this, instead of trying to figure out what mov's we can
eliminate, we first remove all mov's prior to grouping, and then
re-insert mov's as needed while grouping inputs/outputs/fanins.
Eventually we'd prefer the frontend to not insert extra mov's in the
first place (so we don't have to bother removing them). This is the
plan for an eventual NIR based frontend, so separate out the instr
grouping (which will still be needed for NIR frontend) from the mov
elimination (which won't).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For temp arrays, a 32bit mask won't be sufficient.. but otoh we don't
need to support an arbitrary mask. So for this case use a simple size
field rather than a bitmask.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We probably could be more clever elsewhere and mask out components that
are not used. But either way, legalize should realize that there is
also a write-after-write hazard with texture sample instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Old compiler doesn't have ir3_block's.. so we need a special path. This
hack can be dropped when ir3_compiler_old is retired.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
NOTE IN[] and OUT[] don't need (have?) ArrayID's.. and TEMP[] can
optionally have them. So we implicitly assume that ArrayID==0 always
exists for each file. This is why array_max[file] is never less than
zero.
You can tell from indirect_files(_read/written) if the legacy array-
id zero was actually used.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
At least temporarily, I still need to fall back to the old compiler for
relative dest (for freedreno), but I can do relative src temp. Only
a temporary situation, but seems easy/reasonable for tgsi-scan to
track this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We were invalidating si_screen:tm by calling
r600_destroy_common_screen() which frees the si_screen object. This
caused the driver to crash in LLVMDisposeTargetMachine() since we
were passing it an invalid pointer.
https://bugs.freedesktop.org/show_bug.cgi?id=88170
It doesn't work on Windows because of the STDCALL calling convention -- it's
the callee's responsibility to pop the arguments, and the number of
arguments varies with the prototype --, so the stack pointer ends up getting
corrupted.
This is just a non-invasive stop-gap fix. A proper fix would be more
elaborate, and require either:
- a variation of __glapi_noop_table which sets GL_INVALID_OPERATION
error
- stop using APIENTRY on all internal _mesa_* functions.
Tested with piglit gl-1.0-beginend-coverage (it now fails instead of
crashing).
VMware PR1350505
Reviewed-by: Brian Paul <brianp@vmware.com>
To catch mismatches in cdecl vs stdcall calling convention. See code
comment for more detailed explanation.
Tested with piglit gl-1.0-beginend-coverage (it now also crashes on
debug builds.)
VMware PR1350505.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a case where a transform feedback buffer is fed back as an index
buffer, because SURFACE_SYNC must be after VS_PARTIAL_FLUSH.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
- we don't usually need to flush TC L2
- we should flush KCACHE
(not really an issue now since we always flush KCACHE when updating
descriptors, but it could be a problem if we used CE, which doesn't
require flushing KCACHE)
- add an explicit VS_PARTIAL_FLUSH flag
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
So that TC L2 doesn't need to be flushed.
The only problem is with index buffers, which don't use TC.
A simple solution is added that flushes TC L2 before a draw call (TC_L2_dirty).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's causing problems, because we mix uncached CP DMA with cached WRITE_DATA
when updating the same memory.
The solution for SI is to use uncached access here, because CP DMA doesn't
support cached access.
CIK will be handled in the next patch.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
That's either framebuffer caches or caches for shader resources.
The motivation is that framebuffer caches need to be flushed very rarely
here.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It doesn't do anything useful. And colors are floating-point, so we can use
fs.interp, remove "flatshade" from the shader key, and rely on the FLAT_SHADE
state only (in the next patch).
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Only done for completeness. Not used by anything yet.
Tested by advertising PIPE_CAP_VERTEXID_NOBASE.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Ordered compares are what you have in C. Unordered compares are the result
of negating ordered compares (they return true if either argument is NaN).
That special NaN behavior is completely useless here, and unordered
compares produce horrible code with all stable LLVM versions.
(I think that has been fixed in LLVM git)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
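The difference between the two kinds of compare described above can be shown
in plain C. This is only an illustration of the semantics (LLVM expresses them
as ordered and unordered fcmp predicates); the function names are
hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Ordered "less than": what C's < operator gives you.
 * False whenever either operand is NaN. */
static inline bool ordered_less_than(float a, float b)
{
   return a < b;
}

/* Unordered "less than": the negation of the ordered >= compare.
 * True whenever either operand is NaN. */
static inline bool unordered_less_than(float a, float b)
{
   return !(a >= b);
}
```

For ordinary numbers the two agree; they differ only when a NaN is involved,
where the ordered form returns false and the unordered form returns true.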
- the relocs array is unused, remove it
- ndw is at most 115 (init), set 140 as the maximum
- compute needs 4 buffers per state, graphics only needs 1; set 4 as the maximum
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
From GL 4.4 Core profile:
If both PRIMITIVE_RESTART and PRIMITIVE_RESTART_FIXED_INDEX are
enabled, the index value determined by PRIMITIVE_RESTART_FIXED_INDEX is
used. If PRIMITIVE_RESTART_FIXED_INDEX is enabled, primitive restart is not
performed for array elements transferred by any drawing command not taking a
type parameter, including all of the *Draw* commands other than
*DrawElements*.
Cc: 10.2 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
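The precedence rule quoted above can be sketched for indexed (*DrawElements*)
draws as follows. The struct and function names are hypothetical; the fixed
index value of 2^N - 1 for an N-bit index type comes from the GL
specification, not from this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the relevant GL state. */
struct restart_state_sketch {
   bool primitive_restart;              /* PRIMITIVE_RESTART */
   bool primitive_restart_fixed_index;  /* PRIMITIVE_RESTART_FIXED_INDEX */
   uint32_t restart_index;              /* user-specified restart index */
};

/* Fixed restart index for an index type of the given size: 2^N - 1. */
static inline uint32_t fixed_restart_index(unsigned index_size_bytes)
{
   assert(index_size_bytes == 1 || index_size_bytes == 2 ||
          index_size_bytes == 4);
   return index_size_bytes == 4 ? UINT32_MAX
                                : (1u << (8 * index_size_bytes)) - 1u;
}

/* For a *DrawElements* style draw: returns true and writes the index that
 * triggers a restart, or returns false if restart is disabled.
 * The fixed index wins when both enables are set. */
static inline bool
restart_index_for_indexed_draw(const struct restart_state_sketch *s,
                               unsigned index_size_bytes, uint32_t *index_out)
{
   if (s->primitive_restart_fixed_index) {
      *index_out = fixed_restart_index(index_size_bytes);
      return true;
   }
   if (s->primitive_restart) {
      *index_out = s->restart_index;
      return true;
   }
   return false;
}
```

Draws that take no type parameter are deliberately left out of this sketch;
per the quoted text, no restart is performed for them when
PRIMITIVE_RESTART_FIXED_INDEX is enabled.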
v2: Instead of telling the driver that the window system ancillaries have
been invalidated (when the driver doesn't know which of its buffers
are the window system's!), introduce a method for invalidating
specific surfaces.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is part of the EGL spec, and is useful for a tiled renderer to avoid
the memory bandwidth cost of storing the depth/stencil buffers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Merge the following upstream autoconf-archive patches.
ax_prog_flex: change grep syntax to accept e.g. "flex.real" in case a wrapper or symlink is used.
AX_PROG_FLEX: avoid use of grep empty string escape extension (fix for OpenBSD)
AX_PROG_FLEX: Also accept gflex.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jonathan Gray <jsg@openbsd.org>
Rather than building a new one every compile. This should reduce some
of the overhead of compiling shaders.
One consequence of this change is that we lose the MachineInstrs dumps
when dumping the shaders via R600_DEBUG. The LLVM IR and assembly are
still dumped, and if you still want to see the MachineInstr dump, you
can run the dumped LLVM IR through llc.
The "normal" detection (querying clflush size) already made sure it is
non-zero, however another method did not. This led to crashes if this
value happened to be zero (apparently can happen in virtualized environments
at least).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87913
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The code used PIPE_ALIGN_VAR for the variable used by fxsave, however this
does not work if the stack isn't aligned. Hence use PIPE_ALIGN_STACK function
decoration to fix the segfault which can happen if stack alignment is only
4 bytes.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87658.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
The headers hadn't been regenerated in a long time and had seen a number
of manual modifications. A few changes:
- remove nvc0_2d entirely, use the nv50 header which has the nvc0
values too
- remove 3ddefs, it's identical to the nv50 file
- move macros out into a separate file
Also the upstream rnndb changed the overall chip naming convention; this
was fixed up manually in the generated files until a better solution is
determined.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The headers hadn't been regenerated in a long time, and there were a few
minor divergences. Among other things, rnndb has changed naming to
G80/etc. For now I've not tackled switching that over, and have manually
replaced the nvidia codenames with the chip ids. However, no other
modifications of the headergen'd headers were made.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Compression seems to be supported for only some formats. Enable it for
those. Previously this was disabled for everything despite the code
looking like it was actually enabled.
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
assert is compiled out in release builds - don't put logic into it. Note
that this particular instance is only used for vp debugging and is
normally compiled out.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
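The problem with putting logic inside assert can be sketched as follows. The
names are hypothetical and unrelated to the vp debugging code touched by the
commit; only the pattern is the point.

```c
#include <assert.h>

/* Hypothetical operation with a side effect and a status return. */
struct counter_sketch {
   int calls;
};

static inline int do_step(struct counter_sketch *c)
{
   c->calls++;
   return 0;   /* 0 means success */
}

/* Fragile: when NDEBUG is defined the assert expands to nothing, so the
 * call to do_step() disappears together with the check. */
static inline void run_fragile(struct counter_sketch *c)
{
   assert(do_step(c) == 0);
}

/* Robust: the call always happens; only the check is compiled out. */
static inline void run_robust(struct counter_sketch *c)
{
   int ret = do_step(c);
   assert(ret == 0);
   (void)ret;   /* avoids an unused-variable warning in release builds */
}
```

In a debug build both variants behave the same, which is why this kind of
bug tends to show up only in release builds.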
brw_swizzle_to_scs has been showing up in my CPU profiling, which is
rather silly - it's a tiny amount of code. It really should be inlined,
and can easily be implemented with fewer instructions.
The enum translation is as follows:
  SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_ZERO, SWIZZLE_ONE
          0          1          2          3             4           5
          4          5          6          7             0           1
    SCS_RED, SCS_GREEN,  SCS_BLUE, SCS_ALPHA,     SCS_ZERO,     SCS_ONE
which is simply (swizzle + 4) & 7.
Haswell needs extra textureGather workarounds to remap GREEN to BLUE,
but Broadwell and later do not.
This patch replicates swizzle_to_scs in gen7_wm_surface_state.c and
gen8_surface_state.c, since the Gen8+ code can be simplified to a mere
two instructions. Both copies can be marked static for easy inlining.
v2: Put the commit message in the code as comments (requested by
Jason Ekstrand). Also fix a typo.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
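The enum translation described above can be written as a small inline
function. This is a sketch of the Gen8+ form only, with the enum values taken
from the table in the commit message; the function name is hypothetical and
the Haswell textureGather workaround is not modelled.

```c
#include <assert.h>

/* Values as listed in the translation table. */
enum {
   SWIZZLE_X = 0,
   SWIZZLE_Y = 1,
   SWIZZLE_Z = 2,
   SWIZZLE_W = 3,
   SWIZZLE_ZERO = 4,
   SWIZZLE_ONE = 5
};

enum {
   SCS_ZERO = 0,
   SCS_ONE = 1,
   SCS_RED = 4,
   SCS_GREEN = 5,
   SCS_BLUE = 6,
   SCS_ALPHA = 7
};

/* Map a SWIZZLE_* value to the matching SCS_* value: add 4 and wrap
 * within 3 bits, which is one add and one and. */
static inline unsigned swizzle_to_scs_sketch(unsigned swizzle)
{
   assert(swizzle <= SWIZZLE_ONE);
   return (swizzle + 4) & 7;
}
```

X, Y, Z and W land on 4 through 7, while ZERO and ONE wrap around to 0 and 1,
matching the second and third rows of the table.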
Valve games use GL_SRGB8 textures. Instead of supporting that properly,
we fell back to MESA_FORMAT_R8G8B8A8_SRGB (with an alpha channel), which
meant that we had to use texture swizzling to override the alpha to 1.0
when sampling. This meant shader recompiles on Gen < 7.5 platforms.
By supporting MESA_FORMAT_R8G8B8X8_SRGB, the hardware just returns 1.0
for us, so we can just use SWIZZLE_XYZW, and avoid any recompiles. All
generations of hardware have supported the format for sampling and
filtering; we can easily support rendering by using the R8G8B8A8_SRGB
format and writing garbage to the X channel. (We do this already for
the non-SRGB version of this format.)
This removes all remaining shader recompiles in a time demo of "Counter
Strike: Global Offensive" (32 -> 0) on Sandybridge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The logic in brw_blorp_surface_info::set uses brw_format_for_mesa_format
for source surfaces, and brw->render_target_format[] for destination
surfaces. We should do the same in the sRGB MSAA overrides.
Currently, this isn't a problem, since SRGB MSAA buffers are all RGBA.
The next commit will introduce RGBX SRGB MSAA buffers, at which point
we need to get the RGBX -> RGBA format overrides for rendering right.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
ir_to_mesa does this - apparently we just forgot or something.
Without this, we'll guess the wrong texture swizzle (XYZW for color
instead of XXX1 for depth) when doing precompiles.
This cuts 26 shader recompiles in a time demo of "Counter Strike:
Global Offensive" (58 -> 32) on Sandybridge. Haswell still has 0
recompiles.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Gen7.5+ platforms that support the "Shader Channel Select" feature leave
key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE
is GL_ALPHA (which is really uncommon). So, the precompile should leave
them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well.
We didn't notice this because prog->ShadowSamplers is not set correctly.
The next patch will fix that problem.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
According to the documentation, we need to do a CS stall on every fourth
PIPE_CONTROL command to avoid GPU hangs. The kernel does a CS stall
between batches, so we only need to count the PIPE_CONTROLs in our batches.
v2: Get the generation check right (caught by Chris Wilson),
combine the ++ with the check (suggested by Daniel Vetter).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
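The counting scheme described above, with the increment combined with the
check as suggested in v2, might look like the sketch below. The struct and
function names are hypothetical, and the generation check mentioned in v2 is
omitted because the commit message does not say which hardware is affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-batch state. */
struct batch_sketch {
   unsigned pipe_controls_since_last_cs_stall;
};

/* The kernel does a CS stall between batches, so counting restarts at
 * the beginning of each batch. */
static inline void batch_begin(struct batch_sketch *b)
{
   assert(b != NULL);
   b->pipe_controls_since_last_cs_stall = 0;
}

/* Call once per PIPE_CONTROL. Returns true if this PIPE_CONTROL must
 * carry a CS stall, either because the caller already requested one or
 * because it is the fourth one since the last stall. */
static inline bool
pipe_control_needs_cs_stall(struct batch_sketch *b, bool already_stalls)
{
   assert(b != NULL);

   if (already_stalls) {
      b->pipe_controls_since_last_cs_stall = 0;
      return true;
   }

   /* Increment and check in one step. */
   if (++b->pipe_controls_since_last_cs_stall == 4) {
      b->pipe_controls_since_last_cs_stall = 0;
      return true;
   }

   return false;
}
```

A PIPE_CONTROL that already stalls resets the counter, so the workaround only
adds a stall when four consecutive commands would otherwise go without one.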
There are too many state flags to fit in one terminal screen, even with
a very tall terminal. Everything is flagged once, so a value of 1 means
that it hasn't ever happened again, and thus isn't terribly interesting.
Skipping those makes it easier to see the interesting values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The mad instruction emitter already supported the saturate modifier,
but the ModifierFolding pass never tried folding cvt sat operations
in for NV50.
Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is a partial revert of c89306983c.
It split the {start,base}_vertex_location handling into several steps:
1. Set brw->draw.start_vertex_location = prim[i].start
and brw->draw.base_vertex_location = prim[i].basevertex.
(This happened once per _mesa_prim, in the main drawing loop.)
2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset
appropriately. (This happened in brw_prepare_shader_draw_parameters,
which was called just after brw_prepare_vertices, as part of state
upload, and only happened when BRW_NEW_VERTICES was flagged.)
3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim).
If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on
the second (or later) primitives, we would do step #1, but not #2.
The first _mesa_prim would get correct values, but subsequent ones
would only get the first half of the summation.
The reason I originally did this was because I needed the value of
gl_BaseVertexARB to exist in a buffer object prior to uploading
3DSTATE_VERTEX_BUFFERS. I believed I wanted to upload the value
of 3DPRIMITIVE's "Base Vertex Location" field, which was computed
as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) +
brw->vb.start_vertex_bias. The latter value wasn't available until
after brw_prepare_vertices, and the former weren't available in the
state upload code at all. Hence the awkward split.
However, I believe that including brw->vb.start_vertex_bias was a
mistake. It's an extra bias we apply when uploading vertex data into
VBOs, to move [min_index, max_index] to [0, max_index - min_index].
From the GL_ARB_shader_draw_parameters specification:
"<gl_BaseVertexARB> holds the integer value passed to the <baseVertex>
parameter to the command that resulted in the current shader
invocation. In the case where the command has no <baseVertex>
parameter, the value of <gl_BaseVertexARB> is zero."
I conclude that gl_BaseVertexARB should only include the baseVertex
parameter from glDraw*Elements*, not any internal biases we add for
optimization purposes.
With that in mind, gl_BaseVertexARB only needs prim[i].start or
prim[i].basevertex. We can simply store that, and go back to computing
start_vertex_location and base_vertex_location in brw_emit_prim(), like
we used to. This is much simpler, and should actually fix two bugs.
Fixes missing geometry in Unvanquished.
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Conditionalize it on having done any uploads (Turns out
u_upload_destroy() isn't safe with a NULL arg).
Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
Fixes the piglits which check that gl_VertexID includes the base vertex
offset:
arb_draw_indirect-vertexid elements
gl-3.2-basevertex-vertexid
Note that this leaves out the original G80, for which this will continue
to fail. It could be fixed by passing a driver constbuf value in, but
that's beyond the scope of this change.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
LAST_LINE_PIXEL has actually been renamed to PIXEL_CENTER_INTEGER in
rnndb; use that method to implement the rasterizer setting, used for
st/nine.
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
total instructions in shared programs: 5877012 -> 5876617 (-0.01%)
instructions in affected programs: 33140 -> 32745 (-1.19%)
From before the commit that allows VF constant propagation (which hurt
some programs) to here, the results are:
total instructions in shared programs: 5877951 -> 5876617 (-0.02%)
instructions in affected programs: 123444 -> 122110 (-1.08%)
with no programs hurt.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
total instructions in shared programs: 5877951 -> 5877012 (-0.02%)
instructions in affected programs: 155923 -> 154984 (-0.60%)
Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the
next commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
After CSEing some MOV ..., VF instructions we have code like
mov tmp, [1F, 2F, 3F, 4F]VF
mov r10, tmp
mov r11, tmp
...
use r10
use r11
We want to copy propagate tmp into the uses of r10 and r11, but *not*
constant propagate the VF immediate into the uses of tmp.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Port of commit a28ad9d4 from the fs backend.
No shader-db changes since we don't emit MOV ..., VF instructions yet.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently only handles consecutive instructions with the same
destination that collectively write all channels.
total instructions in shared programs: 5879798 -> 5869011 (-0.18%)
instructions in affected programs: 465236 -> 454449 (-2.32%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I don't feel great about assert(!"unimplemented: ...") but these
cases do only seem possible under some currently impossible circumstances.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Sometimes it's easier to generate 4x values into an array, and the
memcpy is 1 instruction, rather than 11 to piece 4 arguments together.
I'd forgotten to remove the prototype from fs_reg from a previous patch,
so it's already there for us here.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
As of 229bf4475f we started getting SIGBUS
from unaligned accesses on the hardware, for reasons I haven't figured
out. However, we should be avoiding unaligned accesses anyway, and our CL
setup certainly would have produced them.
E.g. this could happen on older kernels which don't support the
RADEON_INFO_SI_BACKEND_ENABLED_MASK query yet. The code in
si_write_harvested_raster_configs() doesn't deal with this correctly and
would probably mangle the value badly.
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
The optimizer obviously doesn't have the ability to rewrite these to skip
the size checks per call, so we have to do it manually.
Improves a norast benchmark on simulation by 0.779706% +/- 0.405838%
(n=6087).
Our ability to perform register writes depends on the hardware and
kernel version. It shouldn't ever change on a per-context basis,
so we only need to check once.
Checking introduces a synchronization point between the CPU and GPU:
even though we submit very few GPU commands, the GPU might be busy doing
other work, which could cause us to stall for a while.
On an idle i7 4750HQ, this improves performance in OglDrvCtx (a context
creation microbenchmark) by 6.14748% +/- 1.6837% (n=20). With Unigine
Valley running in the background (to keep the GPU busy), it improves
performance in OglDrvCtx by 2290.92% +/- 29.5274% (n=5).
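A minimal sketch of the "check once" idea, assuming the result is cached in some longer-lived object than the context. The struct, its fields, and probe_register_write() are hypothetical stand-ins; the real probe submits GPU commands and waits for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical long-lived object shared by all contexts. */
struct screen {
    bool checked;          /* has the probe run yet? */
    bool can_write_regs;   /* cached probe result */
    unsigned probe_count;  /* only here to make the sketch observable */
};

/* Stand-in for the expensive probe (a CPU/GPU synchronization point). */
static bool probe_register_write(struct screen *screen)
{
    screen->probe_count++;
    return true; /* assumed outcome for the sketch */
}

/* Contexts call this; only the first call pays for the probe. */
static bool can_do_register_writes(struct screen *screen)
{
    assert(screen != NULL);
    if (!screen->checked) {
        screen->can_write_regs = probe_register_write(screen);
        screen->checked = true;
    }
    return screen->can_write_regs;
}
```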
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
* This is the cleaned up work of the Haiku GCI student
Adrián Arroyo Calle adrian.arroyocalle@gmail.com
* Several patches were consolidated to prevent
unnecessary touching of non-related code
This patch reduces the likelihood of pointer arithmetic overflow bugs in
gather_oa_results(), like the one fixed by b69c7c5dac.
I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I get nervous when I see code patterns like this:
(void*) + (int) * (int)
I smell 32-bit overflow all over this code.
This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any
potential overflow.
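To illustrate the pattern (not the actual gather_oa_results() code): with 'snapshot_size' typed as ptrdiff_t, the multiplication is carried out in ptrdiff_t rather than int. The function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With snapshot_size as ptrdiff_t, 'index' is converted to ptrdiff_t
 * before the multiply, so the product is not computed in 32-bit int. */
static ptrdiff_t snapshot_offset(ptrdiff_t snapshot_size, int index)
{
    assert(index >= 0);
    return snapshot_size * index;
}

/* The (void*) + offset step, done on a byte pointer. */
static void *snapshot_address(void *base, ptrdiff_t snapshot_size, int index)
{
    return (uint8_t *)base + snapshot_offset(snapshot_size, index);
}
```

On a platform with a 64-bit ptrdiff_t, snapshot_offset(1 << 20, 4096) yields 2^32, which would not fit if both operands were int.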
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch reduces the likelihood of pointer arithmetic overflow bugs in
intel_texsubimage_tiled_memcpy(), like the one fixed by b69c7c5dac.
I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I recently solved, in commit b69c7c5dac, an overflow
bug in a line of code that looks very similar to pointer arithmetic in
this function.
This patch conceptually applies the same fix as in b69c7c5dac. Instead
of retyping the variables, though, this patch adds some casts. (I tried
to retype the variables as ptrdiff_t, but it quickly got very messy. The
casts are cleaner).
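The cast-based variant looks roughly like this. It is a sketch with a hypothetical helper, not the code from intel_texsubimage_tiled_memcpy():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The parameters stay int; the casts widen each operand before the
 * multiply, so the product is computed in ptrdiff_t. */
static uint8_t *row_pointer(uint8_t *base, int row, int pitch)
{
    assert(row >= 0 && pitch >= 0);
    return base + (ptrdiff_t)row * (ptrdiff_t)pitch;
}
```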
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch should diminish the likelihood of pointer arithmetic overflow
bugs, like the one fixed by b69c7c5dac.
Change the type of parameter 'out_stride' from int to ptrdiff_t. The
logic is that if you call intel_miptree_map() and use the value of
'out_stride', then you must be doing pointer arithmetic on 'out_ptr'.
Using ptrdiff_t instead of int should make it a little bit harder to hit
overflow bugs.
As a side-effect, some function-scope variables needed to be retyped to
avoid compilation errors.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If a pointer points to raw, untyped memory and is never dereferenced,
then declare it as 'void*' instead of casting it to 'void*'.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
trans_kill() only handles the single opcode. Drop the remnant of a time
when both KILL and KILL_IF were handled by the same fxn.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Standalone compiler doesn't have screen or context. We need to come up
with a better way to control the target arch (ie. something that we can
control from cmdline w/ standalone compiler) but for now this hack keeps
it from segfault'ing.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts. However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.
total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs: 10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs: 25551 -> 25924 (1.46%)
In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z. In this patch I just use raddr conflict resolution,
which is more expensive. I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.
Since our kernel BOs require CMA allocation, and the use of them requires
new mmaps, it's pretty expensive and we should avoid it if possible.
Copying my original design for Intel, make a userspace cache that reuses
BOs that haven't been shared to other processes but frees BOs that have
sat in the cache for over a second.
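A rough sketch of that cache policy, with hypothetical types and a fixed number of slots. The real cache deals with kernel BO handles and mmaps; here a BO is just a heap allocation and time is passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 8
#define MAX_IDLE_SECONDS 1.0

struct bo { size_t size; bool shared; };
struct cache_slot { struct bo *bo; double freed_at; };
struct bo_cache { struct cache_slot slots[CACHE_SLOTS]; };

/* Really free BOs that have sat in the cache for over a second. */
static void cache_expire(struct bo_cache *cache, double now)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct cache_slot *slot = &cache->slots[i];
        if (slot->bo && now - slot->freed_at > MAX_IDLE_SECONDS) {
            free(slot->bo);
            slot->bo = NULL;
        }
    }
}

/* Reuse a cached BO of the same size if there is one, else allocate. */
static struct bo *bo_alloc(struct bo_cache *cache, size_t size, double now)
{
    cache_expire(cache, now);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct cache_slot *slot = &cache->slots[i];
        if (slot->bo && slot->bo->size == size) {
            struct bo *bo = slot->bo;
            slot->bo = NULL;
            return bo;
        }
    }
    struct bo *bo = calloc(1, sizeof(*bo));
    assert(bo != NULL);
    bo->size = size;
    return bo;
}

/* Unshared BOs go back into the cache; shared ones (or a full cache)
 * are freed immediately. */
static void bo_release(struct bo_cache *cache, struct bo *bo, double now)
{
    if (!bo->shared) {
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (!cache->slots[i].bo) {
                cache->slots[i].bo = bo;
                cache->slots[i].freed_at = now;
                return;
            }
        }
    }
    free(bo);
}
```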
Improves glxgears framerate on RPi by around 30%.
This gets DRI3 working on modesetting with glamor. It's not enabled under
simulation, because it looks like handing our dumb-allocated buffers off
to the server doesn't actually work for the server's rendering.
This reverts db3dfcfe90.
The commit was correct but we've got some precision problems later in
llvmpipe (or possibly in draw clip) due to the vertices coming in in
different order, causing some internal test failures. So revert for now.
(Will only affect drivers which actually support constant-interpolated
attributes and not just flatshading.)
The blitter will start at a pixel's natural alignment. For PBOs, if the
provided offset is not aligned, bits will get dropped.
This change adds offset alignment check for src and dst, kicking back if
the requirements are not met.
The change is based on following verbiage from BSPEC:
Color pixel sizes supported are 8, 16, and 32 bits per pixel (bpp).
All pixels are naturally aligned.
Found in the following locations:
page 35 of intel-gfx-prm-osrc-hsw-blitter.pdf
page 29 of ivb_ihd_os_vol1_part4.pdf
page 29 of snb_ihd_os_vol1_part5.pdf
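The natural-alignment test amounts to something like the following sketch. The helper name is hypothetical, and 'cpp' (bytes per pixel) is assumed to be 1, 2 or 4, matching the 8, 16 and 32 bpp sizes quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if 'offset' is a multiple of the pixel size in bytes.
 * A caller would fall back to another path when this returns false. */
static bool blit_offset_is_aligned(uint32_t offset, uint32_t cpp)
{
    assert(cpp == 1 || cpp == 2 || cpp == 4);
    return (offset & (cpp - 1)) == 0;
}
```

The same check would be applied to both the source and the destination offset.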
This behavior was observed with Steam Big Picture rendering incorrect
icon colors. The fix has been tested on Ubuntu and SteamOS on Haswell.
Signed-off-by: Cody Northrop <cody@lunarg.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83908
Reviewed-by: Neil Roberts <neil@linux.intel.com>
C linkage was removed from functions in program/sampler.cpp. However,
some cpp files include program/sampler.h within extern "C" blocks,
causing link errors for test_vec4_copy_propagation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Chris Wilson noted that repeated calls to CheckQuery() would call
drm_intel_bo_references(brw->batch.bo, query->bo) on each invocation,
which is expensive. Once we've flushed, we know that future batches
won't reference query->bo, so there's no point in asking more than once.
This patch adds a brw_query_object::flushed flag, which is a
conservative estimate of whether the batch has been flushed.
On the first call to CheckQuery() or WaitQuery(), we check if the
batch references query->bo. If not, it must have been flushed for
some reason (such as being full). We record that it was flushed.
If it does reference query->bo, we explicitly flush, and record that
we did so.
Any subsequent checks will simply see that query->flushed is set,
and skip the drm_intel_bo_references() call.
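In sketch form, with hypothetical stand-ins for the batch and for drm_intel_bo_references(), the logic is:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the counters only make the sketch observable. */
struct batch {
    const void *referenced_bo;
    unsigned reference_checks;
    unsigned flush_count;
};

struct query {
    const void *bo;
    bool flushed; /* conservative: false may just mean "not checked yet" */
};

/* Stand-in for the expensive "does the batch reference this BO" query. */
static bool batch_references(struct batch *batch, const void *bo)
{
    batch->reference_checks++;
    return batch->referenced_bo == bo;
}

static void batch_flush(struct batch *batch)
{
    batch->referenced_bo = NULL;
    batch->flush_count++;
}

/* Called from both the CheckQuery and WaitQuery paths. */
static void flush_batch_if_needed(struct batch *batch, struct query *query)
{
    assert(batch != NULL && query != NULL);
    if (query->flushed)
        return; /* future batches cannot reference query->bo */
    if (batch_references(batch, query->bo))
        batch_flush(batch);
    /* Either it was already flushed, or we just flushed it. */
    query->flushed = true;
}
```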
Inspired by a patch from Chris Wilson.
According to Eero, this does not affect the performance of Witcher 2
on Haswell, but approximately halves the userspace CPU usage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is less code and also measures the duration of the stall for us.
Our old code predates the existence of brw_bo_map().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
CheckQuery calls drm_intel_bo_references to see if the batch references
the query BO, and if so, flushes. It then checks if the query BO is
busy, and if not, calls gen6_queryobj_get_results().
Stupidly, gen6_queryobj_get_results() immediately did a second redundant
drm_intel_bo_references check, even though we know the buffer is not
referenced and in fact idle.
This patch moves the batch-flush check out of gen6_queryobj_get_results
and into WaitQuery() (the other caller). That way, both callers do a
single batch-flush check.
This should only be a minor improvement, since it would only affect
the first CheckQuery call where the result is actually available.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86969
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If query->bo == NULL, this is a redundant CheckQuery call, and we
should simply return. We didn't do anything anyway - we skipped the
batch flushing block, and although we called get_results(), it has an
early return and does nothing. Why bother?
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
q->Ready means that the results are in, and core Mesa is free to return
them to the application. gen6_queryobj_get_results() is a natural place
to set that flag; doing so means callers don't have to.
The older non-hardware-context aware code couldn't do this, because we
had to call brw_queryobj_get_results() to gather intermediate results
when we ran out of space for snapshots in the query buffer. We only
gather complete results in the Gen6+ code, however.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
While at it, remove src/mapi/shared-glapi/tests/.gitignore
and src/mapi/glapi/tests/.gitignore, which are useless.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes 4 vertexid related piglit tests with llvmpipe due to switching
behavior of vertexid to the one gl expects.
(Won't fix non-llvm draw path since we don't get the basevertex currently.)
Plus a new PIPE_CAP_VERTEXID_NOBASE query. The idea is that drivers not
supporting vertex ids with base vertex offset applied (so, only support
d3d10-style vertex ids) will get such a d3d10-style vertex id instead -
with the caveat they'll also need to handle the basevertex system value
too (this follows what core mesa already does).
Additionally, this is also useful for other state trackers (for instance
llvmpipe / draw right now implement the d3d10 behavior on purpose, but
with different semantics it can just do both).
Doesn't do anything yet.
And fix up the docs wrt similar values.
v2: incorporate feedback from Brian and others, better names, better docs.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
r600, rv610 and rv630 all have a bug in their GPR indexing
and how the hw inserts access to PV.
If the base index for the src is the same as the dst gpr
in a previous group, then it will use PV instead of using
the indexed gpr correctly.
The workaround is to insert a NOP when you detect this.
v2: add second part of fix detecting DST rel writes followed
by same src base index reads.
v3: forget adding stuff to structs, just iterate over the
previous node group again, makes it more obvious.
v3.1: drop local_nop.
Fixes ~200 piglit regressions on rv635 since SB was introduced.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were assuming, when constructing a new brw_reg struct, that the
negate and abs register modifiers would not be present by default in
the new register.
Now, we force explicitly setting these values when constructing a new
register.
This will avoid problems like forgetting to properly set them when we
are using a previous register to generate this new register, as it was
happening in the dFdx and dFdy generation functions.
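The idea, shown on a much simplified and hypothetical register struct (not the real brw_reg): the constructor takes the modifiers as explicit arguments, so a caller deriving a register from an existing one has to decide what they should be.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical register description. */
struct reg {
    unsigned nr;
    bool negate;
    bool abs;
};

/* No implicit defaults: every caller states the modifiers. */
static struct reg make_reg(unsigned nr, bool negate, bool abs)
{
    struct reg r = { nr, negate, abs };
    return r;
}

/* Deriving a neighbouring register from 'src': the modifiers are
 * carried over on purpose here, rather than silently dropped. */
static struct reg next_reg(struct reg src)
{
    return make_reg(src.nr + 1, src.negate, src.abs);
}
```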
Fixes piglit test shaders/glsl-deriv-varyings
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82991
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, the hash_table API required the user to do all of the hashing
of keys as it passed them in. Since the hashing function is intrinsically
tied to the comparison function, it makes sense for the hash table to know
about it. Also, it makes for a somewhat clumsy API as the user is
constantly calling hashing functions many of which have long names. This
is especially bad when the standard call looks something like
_mesa_hash_table_insert(ht, _mesa_pointer_hash(key), key, data);
In the above case, there is no reason why the hash table shouldn't do the
hashing for you. We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed. Also, if you do do your
own hashing, the hash table will assert that your hash matches what it
expects out of the hashing function. This should make it harder to mess up
your hashing.
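A toy illustration of the new API shape follows. It is a fixed-size linear-probing table with hypothetical names and no deletion, not Mesa's hash_table implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16

typedef uint32_t (*hash_fn)(const void *key);
typedef int (*equals_fn)(const void *a, const void *b);

struct entry { const void *key; void *data; uint32_t hash; int used; };

/* The table owns both the hash and the comparison function. */
struct table {
    hash_fn hash;
    equals_fn equals;
    struct entry entries[TABLE_SIZE];
};

static uint32_t pointer_hash(const void *key)
{
    return (uint32_t)((uintptr_t)key >> 2);
}

static int pointer_equal(const void *a, const void *b)
{
    return a == b;
}

/* Old-style entry point: the caller supplies the hash, and the table
 * asserts that it matches what its own hash function would produce. */
static struct entry *table_insert_pre_hashed(struct table *t, uint32_t hash,
                                             const void *key, void *data)
{
    assert(hash == t->hash(key));
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        struct entry *e = &t->entries[(hash + i) % TABLE_SIZE];
        if (!e->used || (e->hash == hash && t->equals(e->key, key))) {
            e->key = key;
            e->data = data;
            e->hash = hash;
            e->used = 1;
            return e;
        }
    }
    return NULL; /* table full */
}

/* New-style entry point: the table does the hashing for you. */
static struct entry *table_insert(struct table *t, const void *key, void *data)
{
    return table_insert_pre_hashed(t, t->hash(key), key, data);
}

static struct entry *table_search(struct table *t, const void *key)
{
    uint32_t hash = t->hash(key);
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        struct entry *e = &t->entries[(hash + i) % TABLE_SIZE];
        if (!e->used)
            return NULL;
        if (e->hash == hash && t->equals(e->key, key))
            return e;
    }
    return NULL;
}
```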
v2: change to call the old entrypoint "pre_hashed" rather than
"with_hash", like cworth's equivalent change upstream (change by
anholt, acked-in-general by Jason).
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
glXSwapBuffersMscOML() with target_msc=divisor=remainder=0 gets
translated into target_msc=divisor=0 but remainder=1 by the mesa
api. This is done for server DRI2 where there needs to be a way
to tell the server-side DRI2ScheduleSwap implementation if a call
to glXSwapBuffers() or glXSwapBuffersMscOML(dpy,window,0,0,0) was
done. remainder = 1 was (ab)used as a flag to tell the server to
select proper semantic. The DRI3/Present backend ignored this
signalling, treated any target_msc=0 as glXSwapBuffers() request,
and called xcb_present_pixmap with invalid divisor=0, remainder=1
combo. The present extension responded kindly to this with a
BadValue error and dropped the request, but mesa's DRI3/Present
backend doesn't check for error codes. From there on stuff went
downhill quickly for the calling OpenGL client...
This patch fixes the problem.
v2: Change comments to be more clear, with reference to
relevant spec, as suggested by Eric Anholt.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
Restores proper immediate tearing swap behaviour for
OpenGL bufferswap under DRI3/Present.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
v2: Add Frank Binns' Signed-off-by for his original earlier
patch from April 2014, which is identical to this one, and
Chris Wilson's Reviewed-by tag from May 2014 for that patch, ergo
also for this one.
v3: Incorporate comment about triple buffering as suggested
by Axel Davy, and reference to relevant spec provided by
Eric Anholt.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prevent calls to glXGetSyncValuesOML() and glXWaitForMscOML()
from overwriting the (ust,msc) values of the last successful
swapbuffers call (PresentPixmapCompleteNotify event), as
glXWaitForSbcOML() relies on those values corresponding to
the most recent completed swap, not to whatever was last
returned from the server.
Problematic call sequence without this patch would have been, e.g.,
glXSwapBuffers()
... wait ...
swap completes -> PresentPixmapComplete event -> (ust,msc)
updated to reflect swap completion time and count.
... wait for at least 1 video refresh cycle/vblank increment.
glXGetSyncValuesOML()
-> PresentNotifyMsc event overwrites (ust,msc) of swap
completion with (ust,msc) of most recent vblank
glXWaitForSbcOML()
-> Returns sbc of last completed swap but (ust,msc) of last
completed vblank, not of last completed swap.
-> Client is confused.
Do this by tracking a separate set of (ust, msc) for the
dri3_wait_for_msc() call than for the dri3_wait_for_sbc()
call.
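The separation can be pictured with this sketch. The struct and handler names are hypothetical; only the notify_msc/ust naming follows the v2 note below.

```c
#include <assert.h>
#include <stdint.h>

/* Two independent (ust, msc) pairs per drawable. */
struct drawable_timing {
    int64_t ust, msc;               /* last completed swap */
    int64_t recv_sbc;               /* sbc of that swap */
    int64_t notify_ust, notify_msc; /* last MSC notification */
};

/* PresentPixmapCompleteNotify: only the swap values change. */
static void on_swap_complete(struct drawable_timing *t,
                             int64_t ust, int64_t msc, int64_t sbc)
{
    t->ust = ust;
    t->msc = msc;
    t->recv_sbc = sbc;
}

/* PresentNotifyMsc: only the notify values change, so a later
 * wait-for-sbc still reports the swap's own (ust, msc). */
static void on_msc_notify(struct drawable_timing *t, int64_t ust, int64_t msc)
{
    t->notify_ust = ust;
    t->notify_msc = msc;
}
```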
This makes the glXWaitForSbcOML() call robust again and restores
consistent behaviour with the DRI2 implementation.
Fixes applications originally written and tested against
DRI2 which also rely on this not regressing under DRI3/Present,
e.g., Neuro-Science software like Psychtoolbox-3.
This patch fixes the problem.
v2: Rename vblank_msc/ust to notify_msc/ust as suggested by
Axel Davy for better clarity.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
targetSBC == 0 is a special case, which asks the function
to block until all pending OpenGL bufferswap requests have
completed.
Currently the function just falls through for targetSBC == 0,
returning bogus results.
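One way to express the special case, as a sketch with a hypothetical helper: a target of 0 is replaced by the sbc of the most recently submitted swap, so that waiting for it covers every pending swap.

```c
#include <assert.h>
#include <stdint.h>

/* 'last_submitted_sbc' is assumed to be the swap count of the most
 * recent swap request sent to the server. */
static int64_t effective_target_sbc(int64_t target_sbc,
                                    int64_t last_submitted_sbc)
{
    assert(target_sbc >= 0);
    return target_sbc == 0 ? last_submitted_sbc : target_sbc;
}
```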
This breaks applications originally written and tested against
DRI2 which also rely on this not regressing under DRI3/Present,
e.g., Neuro-Science software like Psychtoolbox-3.
This patch fixes the problem.
v2: Simplify as suggested by Axel Davy. Add comments proposed
by Eric Anholt.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Eric Anholt <eric@anholt.net>
A bunch of open-coded 'gpu_id > 300's seems like it will eventually
cause problems with future generations. There were already a few minor
problems with caps for features that still need additional work on a4xx.
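As an illustration of what replaces the open-coded comparisons, here are range-checked helpers. The names and exact ranges are assumptions for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Each helper matches exactly one generation, unlike 'gpu_id > 300',
 * which also matches every later generation. */
static bool is_a3xx(unsigned gpu_id)
{
    return gpu_id >= 300 && gpu_id < 400;
}

static bool is_a4xx(unsigned gpu_id)
{
    return gpu_id >= 400 && gpu_id < 500;
}
```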
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rather than duplicating this everywhere. Especially as on a4xx the
layout of layers and levels differs based on texture type.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This code is complete nonsense and has apparently existed since I first
implemented register spilling in the VS two years ago.
Scratch reads are SEND messages, which ignore the destination writemask.
The comment about "data that may not have been written to scratch" is
also confusing - we always spill whole 4x2 registers, so such data
simply does not exist. We can safely ignore the writemask.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This code has been turned off for the last
decade. Considering 3Dnow is obsolete it
seems the bug will never be fixed so just
remove it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
f0ba7d897d made debug_assert()/assert() unsafe for expressions, but this
only became an issue now that u_atomic.h started to rely on them for
Windows.
This fixes non-debug builds with MSVC.
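A generic illustration of the problem class (the macros below are hypothetical, not the actual definitions touched by this change): a disabled assert that expands to a statement cannot appear inside an expression, while one that expands to a void expression can.

```c
#include <assert.h>

/* Statement form: fine as 'NOOP_ASSERT_STMT(x);' on its own line,
 * but a syntax error when used inside a larger expression. */
#define NOOP_ASSERT_STMT(expr) do { } while (0)

/* Expression form: usable anywhere an expression is allowed. */
#define NOOP_ASSERT_EXPR(expr) ((void)0)

static int next_index(int i)
{
    /* Comma expression; this only compiles with the expression form. */
    return (NOOP_ASSERT_EXPR(i >= 0), i + 1);
}
```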
Reviewed-by: Brian Paul <brianp@vmware.com>
We support MOCS on both gen8 and gen9, so the message seems meaningless. Remove
it to avoid confusion.
Trivial.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The odds of having this patch make a difference on Gen8+ are probably very low.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-but-not-tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Because all topologies are reduced to basic primitives (i.e. no strips, fans)
and the vertices involved are all copied, there's no need for any elaborate
decisions where to insert the prim id. The logic employed was correct for
first provoking vertex, but didn't account at all for the last provoking
vertex case. And since we now will get the right constant value even if the
primitive type is later changed (for unfilled etc.) this is no longer
required to pass certain tests (which were checking for prim_id == some
const interpolated value so passing because both were wrong in the end).
This is a bit overkill (3x4 values assigned in total even though it's really
one scalar per prim...) but the code is now much easier and I don't need to
add more cases for last provoking vertex.
This fixes piglit primitive-id-no-gs-strip test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously the first provoking vertex convention would only be used if
flatshading were enabled. No matter how I look at it that cannot be possibly
correct. Maybe the code getting used was somewhat simpler that way at a time
where there weren't constant interpolated attributes, only flatshading...
(Note that all other places including the decomposition macros already do
the same.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This stage only worked for traditional old-school flatshading: it ignored
constant interpolated values and only handled colors. The code probably
predates the use of constant interpolated values in gallium. So fix this - the
clip stage apparently did this a long time ago already.
Unfortunately this also means the stage needs to be invoked when flatshading
isn't enabled but some other prim changing stages are - for instance with
fill mode line each of the 3 lines in a tri should get the same attribute
value from the leading vertex in the original tri if interpolation is constant,
which did not happen before.
Due to that, the stage is now run in more cases, even unnecessary ones. Could
in theory skip it completely if there aren't any constant interpolated
attributes (and rast->flatshade isn't set), but not sure it's worth bothering,
as it looks kinda complicated getting this information in advance.
No piglit change (doesn't really cover this directly).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just like we do for tris (det shouldn't matter at this point, however
can have flags for things like line stipple reset).
No piglit change, it would fail line stippling tests if the flatshade
stage were run, which will happen with the next commit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The previous language was a bit misleading, since it sounded like
w was interpolated then the reciprocal calculated which isn't what
should be happening.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
With everything in place, we can now use the scalar backend compiler for
vertex shaders on BDW+. We make scalar vertex shaders the default on
BDW+ but add a new vec4vs debug option to force the vec4 backend.
No piglit regressions.
Performance impact is minimal, I see a ~1.5 improvement on the T-Rex
GLBenchmark case, but in general it's in the noise. Some of our
internal synthetic, vs bounded benchmarks show great improvement, 20%-40%
in some cases, but real-world cases are mostly unaffected.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that fs_visitor::run is back to being only fragment
shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT
conditions and rename it to run_fs.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch uses the previous refactoring to add a new run_vs() method
that generates vertex shader code using the scalar visitor and
optimizer.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These structs aren't vec4 specific, they are shared by shader stages
operating on Vertex URB Entries (VUEs). VUEs are the data structures in
the URB that hold vertex data between the pipeline geometry stages.
Using vue in the name instead of vec4 makes a lot more sense, especially
when we add scalar vertex shader support.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The scalar vertex shader will use the ATTR register file for vertex
attributes. This patch adds support for the ATTR file to fs_visitor.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This chunk of code is repeated in a few places, and we're going to add
a MESA_SHADER_VERTEX case to it soon.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This flag signals that we have a SIMD8 VS shader so we can set up the
corresponding state accordingly. This boils down to setting
the BDW+ SIMD8 enable bit in 3DSTATE_VS and making UBO and pull
constant buffers use dword pitch.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is all we need from the generator for SIMD8 vertex shaders. This
opcode is just the send instruction, all the hard work will happen
in the visitor using LOAD_PAYLOAD.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that the caller passes in the shader debug name, we don't need this
anymore.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
fs_generator no longer knows what stage it's generating code for, so
we have to set the debug name of the shader from the call site.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This removes all stage specific data from the generator, and lets us
create a generator for any stage.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't propagate the saturate bit and some instructions can't
saturate at all. If the source has saturate set, just skip propagation.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Back to the original commit (8313f444) adding the workaround, we were
enabling it on gens <= 7, even though gens <= 5 can't do multisampling.
I cannot find documentation that says that Sandybridge needs this
workaround but in practice disabling it causes these piglit tests to
fail:
EXT_framebuffer_multisample/interpolation {2,4} centroid-deriv{,-disabled}
On Ironlake:
total instructions in shared programs: 4358478 -> 4349671 (-0.20%)
instructions in affected programs: 117680 -> 108873 (-7.48%)
A bunch of shaders in TF2, Portal 2, and L4D2 are cut by 25~30%.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This way we get a warning if an enum value is not handled.
v2: codestyle
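A minimal sketch of the idea, assuming a made-up enum (sketch_target and its
values are hypothetical, not names from this patch). Leaving out the
"default:" label lets GCC and Clang report unhandled enumerators through
-Wswitch, which is part of -Wall:

```c
#include <stddef.h>

/* Hypothetical enum, used only for illustration. */
enum sketch_target {
   SKETCH_TARGET_BUFFER,
   SKETCH_TARGET_IMAGE_2D,
   SKETCH_TARGET_IMAGE_3D
};

/* No "default:" label on purpose: if a new enumerator is added and not
 * handled below, the compiler can warn about it at this switch. */
static const char *
sketch_target_name(enum sketch_target t)
{
   switch (t) {
   case SKETCH_TARGET_BUFFER:
      return "buffer";
   case SKETCH_TARGET_IMAGE_2D:
      return "image2d";
   case SKETCH_TARGET_IMAGE_3D:
      return "image3d";
   }
   return NULL; /* only reached for values outside the enum */
}
```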
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
On evergreen there are 4 regs, on r600/700 there is only one.
Don't initialise regs and trash someone else's state.
Not sure this fixes anything, but hey one less stupid.
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.3 10.4" mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This means another pass of reordering the uniform data store, but it lets
us pair up a lot more instructions.
total instructions in shared programs: 44639 -> 43176 (-3.28%)
instructions in affected programs: 36938 -> 35475 (-3.96%)
This is a standard scheduling heuristic, and clearly helps.
total instructions in shared programs: 46418 -> 44467 (-4.20%)
instructions in affected programs: 42531 -> 40580 (-4.59%)
Collapse things back into a setup_slices() which takes the desired
alignment as a param. This gets things ready for a4xx which has some
slightly different requirements.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits,
but that is an assumption OpenGL drivers (or any dynamic library for
that matter) can't afford to make as there are many closed- and open-
source application binaries out there that only assume 4-byte stack
alignment.
V4: fix comment and indentation
V3: move all sse4.1 build flag config to the same location
and add comment as to why we need to do the realign
V2: use $target_cpu rather than $host_cpu
and setup build flags in config rather than makefile
https://bugs.freedesktop.org/show_bug.cgi?id=86788
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Matt Turner <mattst88@gmail.com>
CC: "10.4" <mesa-stable@lists.freedesktop.org>
Per the OpenGL ES 3.0 spec, the minor number for SHADING_LANGUAGE_VERSION is always
two digits, matching the OpenGL ES Shading Language Specification release
number. For example, this query might return the string "3.00".
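As an illustration only, the two-digit minor number can be produced with a
"%02u" conversion. The helper name and the assumption that the version is
stored as an integer such as 300 are hypothetical and not taken from the patch:

```c
#include <stdio.h>

/* Hypothetical helper: format an integer GLSL ES version such as 300
 * as "3.00", keeping the minor number at two digits. */
static int
sketch_format_es_glsl_version(char *buf, size_t size, unsigned glsl_version)
{
   return snprintf(buf, size, "%u.%02u",
                   glsl_version / 100, glsl_version % 100);
}
```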
This patch fixes the following dEQP test:
dEQP-GLES3.functional.state_query.string.shading_language_version
No piglit regression observed.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GLSL ES 3.00 spec, chapter 4.6.1 "The Invariant Qualifier",
Only variables output from a shader can be candidates for invariance. This
includes user-defined output variables and the built-in output variables.
As only outputs can be declared as invariant, an invariant output from one
shader stage will still match an input of a subsequent stage without the
input being declared as invariant.
This patch fixes the following dEQP tests:
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_interp_storage_precision
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_interp_storage
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_storage_precision
dEQP-GLES3.functional.shaders.qualification_order.variables.valid.invariant_storage
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_interp_storage_precision_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_interp_storage_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_storage_precision_invariant_input
dEQP-GLES3.functional.shaders.qualification_order.variables.invalid.invariant_storage_invariant_input
No piglit regressions observed.
v2:
- Add spec content in the code
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The current code computes ctx->Array.LegalTypesMask just once,
however, computing this needs to consider ctx->API so we need
to make sure that the API for that context has not changed if
we intend to reuse the result.
The context API can change, at least, if we go through
_mesa_meta_begin, since that will always force
API_OPENGL_COMPAT until we call _mesa_meta_end. If any
operation in between these two calls triggers a call to
update_array_format, then we might be caching a value for
LegalTypesMask that will not be right once we have called
_mesa_meta_end and restored the context API.
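The caching problem can be sketched as follows. This is not the Mesa code:
all names and mask values are placeholders. The point is only that the cached
mask remembers which API it was computed for and is recomputed when the API
differs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the context API and the cached state. */
enum sketch_api { SKETCH_API_COMPAT, SKETCH_API_ES3 };

struct sketch_array_state {
   uint32_t legal_types_mask;
   enum sketch_api legal_types_mask_api; /* API the mask was computed for */
   bool valid;
};

static int sketch_recompute_count;

/* Placeholder computation; the real mask depends on the API in more
 * detailed ways. */
static uint32_t
sketch_compute_mask(enum sketch_api api)
{
   sketch_recompute_count++;
   return api == SKETCH_API_ES3 ? 0x0fu : 0xffu;
}

/* Reuse the cached mask only if it was computed for the same API. */
static uint32_t
sketch_get_legal_types_mask(struct sketch_array_state *s, enum sketch_api api)
{
   if (!s->valid || s->legal_types_mask_api != api) {
      s->legal_types_mask = sketch_compute_mask(api);
      s->legal_types_mask_api = api;
      s->valid = true;
   }
   return s->legal_types_mask;
}
```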
Fixes the following 179 dEQP tests in i965:
dEQP-GLES3.functional.vertex_arrays.single_attribute.strides.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.output_types.fixed.*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_draw.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_copy.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.static_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.stream_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.single_attribute.usages.dynamic_read.*fixed*
dEQP-GLES3.functional.vertex_arrays.multiple_attributes.input_types.3_*fixed2*
dEQP-GLES3.functional.draw.random.{2,18,28,68,83,106,109,156,181,191}
Reviewed-by: Brian Paul <brianp@vmware.com>
From GL ES 3.0 specification, section 6.1.15 Internal Format Queries (page 236),
multisampling is not supported for signed and unsigned integer internal formats.
Fixes 19 dEQP tests under 'dEQP-GLES3.functional.state_query.internal_format.*'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_RGB and GL_RGBA are valid internal formats on a GLES3 profile. See
"Table 1. Unsized Internal Formats" at
https://www.khronos.org/opengles/sdk/docs/man3/html/glTexImage2D.xhtml.
Fixes 2 dEQP tests:
- dEQP-GLES3.functional.state_query.internal_format.rgb_samples
- dEQP-GLES3.functional.state_query.internal_format.rgba_samples
Reviewed-by: Brian Paul <brianp@vmware.com>
In OpenGL and OpenGL ES 3+, GL_DEPTH_STENCIL_ATTACHMENT is a valid attachment
point for the family of functions that invalidate a framebuffer object (e.g.,
glInvalidateFramebuffer and glInvalidateSubFramebuffer).
Currently, a GL_INVALID_ENUM error is emitted for this attachment point.
Fixes 21 dEQP test failures under 'dEQP-GLES3.functional.fbo.invalidate.*'.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This increases the cost of a raddr b conflict spill (save r3 to rb31, move
src1 to r3, move rb31 back to r3 when done, instead of just move src1 to
r3), but on average thanks to instruction pairing it's more worthwhile to
have another accumulator.
total instructions in shared programs: 46428 -> 46171 (-0.55%)
instructions in affected programs: 38030 -> 37773 (-0.68%)
The register allocator walks from the end of the nodes array looking for
trivially-allocatable things to put on the stack, meaning (assuming
everything is trivially colorable and gets put on the stack in a single
pass) the low node numbers get allocated first. The things allocated
first happen to get the lower-numbered registers, which is to say the fast
accumulators that can be paired more easily.
When we previously made the nodes match the temporary register numbers,
we'd end up putting the shader inputs (VS or FS) in the accumulators,
which are often long-lived values. By prioritizing the shortest-lived
values for allocation, we can get a lot more instructions that involve
accumulators, and thus fewer conflicts for raddr and WS.
total instructions in shared programs: 52870 -> 46428 (-12.18%)
instructions in affected programs: 52260 -> 45818 (-12.33%)
Since d8da6decea, where the state tracker started using UCMP on cayman,
a number of tests have regressed.
This seems to be because r600g emits CNDGE_INT for UCMP, which tests for
>= 0; we should be emitting CNDE_INT with the arguments reversed.
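A scalar model of the three operations, as I understand their semantics
(UCMP selects on src0 != 0, CNDE_INT on src0 == 0, CNDGE_INT on src0 >= 0).
The function names are illustrative only:

```c
#include <stdint.h>

/* TGSI UCMP: dst = (src0 != 0) ? src1 : src2 */
static int32_t
sketch_ucmp(int32_t s0, int32_t s1, int32_t s2)
{
   return s0 != 0 ? s1 : s2;
}

/* CNDE_INT: dst = (src0 == 0) ? src1 : src2 */
static int32_t
sketch_cnde_int(int32_t s0, int32_t s1, int32_t s2)
{
   return s0 == 0 ? s1 : s2;
}

/* CNDGE_INT: dst = (src0 >= 0) ? src1 : src2 */
static int32_t
sketch_cndge_int(int32_t s0, int32_t s1, int32_t s2)
{
   return s0 >= 0 ? s1 : s2;
}
```

With these definitions, sketch_ucmp(c, a, b) equals sketch_cnde_int(c, b, a)
for every c, while sketch_cndge_int(c, a, b) differs for c == 0 and for
negative c.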
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reduces .text size of mesa_dri_drivers.so (i965-only) by 62k, or 1.4%.
Note that we don't remove inline from lerp_2d(), which has a comment
above it saying it definitely should be inlined. Though, removing the
inline keyword from it doesn't actually change the compiled code for me.
Reviewed-by: Brian Paul <brianp@vmware.com>
The docs say that we shouldn't need this workaround for gen8+, but just
removing it causes GPU hangs. We'll revisit this, but for now, just
extend the workaround to gen9.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
SKL moves the GS threadcount to dw8 from dw7, and no longer does the
divide by 2 thing.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
This patch fixes this build error with G++ <= 4.6.
CXX test_vf_float_conversions.o
test_vf_float_conversions.cpp: In function ‘unsigned int f2u(float)’:
test_vf_float_conversions.cpp:63:20: error: expected primary-expression before ‘.’ token
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86939
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The register allocator prefers low-index registers from vc4_regs[] in the
configuration we're using, which is good because it means we prioritize
allocating the accumulators (which are faster). On the other hand, it was
causing raddr conflicts because everything beyond r0-r2 ended up in
regfile A until you got massive register pressure. By interleaving, we
end up getting more instruction pairing from getting non-conflicting
raddrs and QPU_WSes.
total instructions in shared programs: 55957 -> 52719 (-5.79%)
instructions in affected programs: 46855 -> 43617 (-6.91%)
We can avoid it by carefully ordering the packing. This is important as a
step in giving r3 to the register allocator.
total instructions in shared programs: 56087 -> 55957 (-0.23%)
instructions in affected programs: 18368 -> 18238 (-0.71%)
All uses of this require that the value be at least one, so it's
easier to report at least one than having to wrap all uses
in MAX2(max_compute_units, 1).
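For illustration, clamping once at the reporting site might look like this
(the helper name is hypothetical and the macro is a local copy so the sketch
stands alone):

```c
/* Local stand-in for a MAX2-style macro. */
#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: never report fewer than one compute unit, so
 * callers can divide or multiply by the result without extra checks. */
static unsigned
sketch_max_compute_units(unsigned queried)
{
   return SKETCH_MAX2(queried, 1u);
}
```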
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Harvested GPUs have some of their render backends disabled, so
in order to prevent the hardware from trying to render things
with these disabled backends we need to correctly program
the PA_SC_RASTER_CONFIG register.
v2:
- Write RASTER_CONFIG for all SEs.
v3:
- Set GRBM_GFX_INDEX.INSTANCE_BROADCAST_WRITES bit.
- Set GRBM_GFX_INDEX.SH_BROADCAST_WRITES bit when done setting
PA_SC_RASTER_CONFIG.
- Get num_se and num_sh_per_se from kernel.
v4:
- Get correct value for num_se
- Remove loop for setting PA_SC_RASTER_CONFIG
- Only compute raster config when a backend has been disabled.
v5: Michel Dänzer
- Fix computation for chips with multiple SEs
https://bugs.freedesktop.org/show_bug.cgi?id=60879
CC: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
There is a bug in the current lowering pass implementation where we lower saturate
to clamp only for vertex shaders on drivers supporting SM 3.0. The correct behavior
is to actually lower to clamp only when we don't support saturate which happens
on drivers that don't support SM 3.0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Fixes an infinite loop in swrast where the lowering pass unpacks saturate into
clamp but the opt_algebraic pass tries to do the opposite.
v3 (Ian):
This is a revert of commit cfa8c1cb "ir_to_mesa: lower ir_unop_saturate" on
the ir_to_mesa.cpp portion. prog_execute.c can handle saturates in vertex
shaders, so classic swrast shouldn't need this lowering pass.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83463
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
There were previously regressions regarding border colors, which the
updated swizzle logic resolves.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This is a hack since it uses the texture information together with the
sampler, but I don't see a better way to do it. In OpenGL, there is a
1:1 correspondence.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
This was an oversight in the original patch. When PolygonMode is
used, then front faces, back faces, or both may be rendered as
points and are affected by point sprite state.
Note that SNB/IVB can't actually be fully conformant here, for
a legacy context -- we don't have separate sets of pointsprite
enables for front and back faces. Haswell ignores pointsprite
state correctly in hardware for non-point rasterization, so can
do this correctly, but it doesn't seem worth it.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86764
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dead code elimination was eating the Y offset.
Fixes the piglit test:
spec/ARB_gpu_shader5/arb_gpu_shader5-interpolateAtOffset-nonconst
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The original idea was to optimize away the condition by integrating it directly
into the CMP instruction. However, with native integers this requires an extra
I2F instruction. It is also fishy because the negation used didn't really honor
ieee754 float comparison rules, not to mention the CMP instruction itself
(being pretty much a legacy instruction) doesn't really have defined special
float value behavior in any case.
So, use UCMP and adjust the code trying to optimize the condition away
accordingly (I have absolutely no idea if such conditions are actually hit
or would be translated away somewhere else already).
v2: cosmetic changes
No piglit regressions on llvmpipe.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Multiple scenes per context are meant to be used so a new scene can be built
while another one is processed in rasterization. However, quite surprisingly,
this does not actually work (and according to git log, possibly never did,
though maybe it did at some point further back (5 years+) but was buggy)
because we always wait immediately on the rasterizer to finish the scene when
a context (and hence setup/scene) is flushed. This means when we try to get
an empty scene later, any old one is already empty again.
Thus using multiple scenes is just a waste of memory (not too bad, since the
additional scenes are guaranteed to be empty, which means their size ought to
be one data block (64kB) plus the size of some structs), without actually
really doing anything. (There is also quite some code for the whole concept of
multiple scenes which doesn't really do much in practice, but keep it hoping
the wait-on-scene-flush can be fixed some day.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The prim assembler may change the prim type when injecting prim ids now,
which isn't reflected by what's stored in emit.
This looks brittle and potentially dangerous (it is not obvious if such prim
type changes are really supported by pt emit, the prim type is actually also
set in prepare which would then be different).
This fixes piglit primitive-id-no-gs-first-vertex.shader_test.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The decomposition done in the prim assembler will turn tri fans into tris,
but this wasn't reflected in the output prim type. Meaning with a tri fan
with 6 verts input, the output was a tri fan with 12 vertices instead of a
tri list with 12 vertices (not as bad as it sounds, since the additional tris
created would all be degenerate since they'd all have two times vertex zero
but still bogus).
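The vertex counts above can be checked with a small standalone sketch of fan
decomposition. It is not the draw module's code, and the vertex order within
each triangle is an assumption (the real order depends on the provoking
vertex convention):

```c
#include <stddef.h>

/* Decompose a triangle fan with num_verts vertices into a triangle list.
 * Writes 3 * (num_verts - 2) indices to out and returns that count.
 * The caller must provide enough space in out. */
static size_t
sketch_fan_to_list(unsigned num_verts, unsigned *out)
{
   size_t n = 0;

   if (num_verts < 3)
      return 0;

   for (unsigned i = 1; i + 1 < num_verts; i++) {
      out[n++] = 0;      /* every fan triangle shares vertex zero */
      out[n++] = i;
      out[n++] = i + 1;
   }
   return n;
}
```

A fan with 6 vertices yields 4 triangles, so the list has 12 vertices, which
is why the output prim type has to be reported as a tri list.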
This is because the prim assembler is used if either the input topology is
something with adjacency, or if prim id needs to be injected, and for the
latter case topologies without adjacency can be converted to basic ones.
Unfortunately decomposition here for inserting prim ids is necessary, at
least for the indexed case where we can't just insert the prim id at the
right place depending on provoking vertex.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The default macros when the adjacency macros aren't defined will already
exactly do that (that is, drop the adjacent vertices and call the non-adjacent
macro).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Safe from causing optimization loops, since we don't constant propagate
VF arguments.
(for this and the previous patch):
total instructions in shared programs: 4289075 -> 4271932 (-0.40%)
instructions in affected programs: 1616779 -> 1599636 (-1.06%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The LINE instruction performs a multiply-add instruction (a * b + c)
where b and c are scalar arguments. It reads b and c from offsets in
src0 such that you can load them (if they're representable) as a
vector-float immediate with a single instruction.
Hurts some programs, but that'll all get better once we CSE the
vector-float MOVs in the next patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77544
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The PRMs say that
<src0> region must be a replicated scalar
(with HorzStride = VertStride = 0).
but apparently that doesn't actually apply to all generations. I did
notice when implementing the optimization later in this series that G45
and ILK needed this regioning.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GS has an interesting use for mul. Because the GS can emit multiple
vertices per input vertex, and it also has a unique count at the top of the URB
payload, the GS unit needs to be able to dynamically specify URB write offsets
(relative to the global offset). The documentation in the function has a very
good explanation from Paul on the mechanics.
This fixes around 2000 piglit tests on BSW.
v2:
Reworded commit message (Ben) no mention of CHV (Matt)
Change SHRT_MAX to USHRT_MAX (Ken, and Matt)
Update comment in code to reflect the use of UW (Ben)
Add Gen7+ assertion for the relevant GS code, since it won't work on Gen6- (Ken)
Drop the bogus hunk in emit_control_data_bits() (Ken)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84777 (with many dupes)
Cc: "10.4 10.3 10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If an operation is the last one to read a register, the instruction
containing it can also include the op that has the next write to that
register.
total instructions in shared programs: 57486 -> 56995 (-0.85%)
instructions in affected programs: 43004 -> 42513 (-1.14%)
We were scheduling TLB operations as early as possible, and texture setup
as late as possible. When I introduced prioritization, I visually
inspected that an independent operation got moved above texture results
collection, which tricked me into thinking it was working (but it was just
because texture setup was being pushed late).
total instructions in shared programs: 57651 -> 57486 (-0.29%)
instructions in affected programs: 18532 -> 18367 (-0.89%)
Jason realized that we could fix the result of the CMP instruction on
Gen <= 5 by doing -(result & 1). Also do the resolves in the vec4
backend before use, rather than when the bool was created. The FS does
this and it saves some unnecessary resolves.
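The -(result & 1) trick can be modelled in plain C, assuming only the low bit
of the raw CMP result is meaningful on these generations (the function name
is illustrative):

```c
#include <stdint.h>

/* Turn a raw comparison result whose upper bits are garbage into a
 * canonical boolean: 0 for false, -1 (all bits set) for true. */
static int32_t
sketch_resolve_bool(uint32_t raw)
{
   return -(int32_t)(raw & 1u);
}
```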
On Ironlake:
total instructions in shared programs: 4289762 -> 4287277 (-0.06%)
instructions in affected programs: 619430 -> 616945 (-0.40%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is a revert of commit 4656c14e ("i965/fs: Change the type of
booleans to UD and emit correct immediates") plus some small additional
fixes, like casting ctx->Const.UniformBooleanTrue to int and changing UD
to D in the ir_unop_b2f cases. Note that it's safe to leave 0x3f800000
as UD and as a literal it's more recognizable than 1065353216.
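For reference, 0x3f800000 is the IEEE 754 single-precision bit pattern of
1.0f. A quick way to confirm that, assuming a 32-bit IEEE float (the helper
name is made up for this sketch):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without violating aliasing
 * rules. */
static float
sketch_bits_to_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof f);
   return f;
}
```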
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Three source instructions cannot directly source a packed vec4 (<0,4,1>
regioning) like vec4 uniforms, so we emit a MOV that expands the vec4 to
both halves of a register.
If these uniform values are used by multiple three-source instructions,
we'll emit multiple expansion moves, which we cannot combine in CSE
(because CSE emits moves itself).
So emit a virtual instruction that we can CSE.
Sometimes we demote a uniform to a pull constant after emitting an
expansion move for it. In that case, recognize in opt_algebraic that if
the .file of the new instruction is GRF then it's just a real move that
we can copy propagate and such.
total instructions in shared programs: 5822418 -> 5812335 (-0.17%)
instructions in affected programs: 351841 -> 341758 (-2.87%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts an instruction from two shaders in Tesseract, by allowing the
(x+y) cmp 0 -> x cmp -y optimization to take place.
instructions in affected programs: 1198 -> 1194 (-0.33%)
Reviewed-by: Eric Anholt <eric@anholt.net>
Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits,
but that is an assumption OpenGL drivers (or any dynamic library for
that matter) can't afford to make as there are many closed- and open-
source application binaries out there that only assume 4-byte stack
alignment.
This fix uses force_align_arg_pointer GCC attribute, and is only a
stop-gap measure.
The right fix would be to pass -mstackrealign or
-mincoming-stack-boundary=2 to all source files that use any -msse*
option, as there is no way to guarantee if/when GCC will decide to spill
SSE registers to the stack.
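A sketch of how the attribute is typically applied, guarded so it only takes
effect with GCC-compatible compilers on 32-bit x86. The macro and function
names are hypothetical and the function body is a placeholder for code that
might spill SSE registers:

```c
#include <stdint.h>

/* Only 32-bit x86 callers may arrive with a 4-byte aligned stack, so the
 * attribute is limited to that target here. */
#if defined(__GNUC__) && defined(__i386__)
#define SKETCH_REALIGN __attribute__((force_align_arg_pointer))
#else
#define SKETCH_REALIGN
#endif

/* The function realigns its own stack on entry, whatever alignment the
 * caller provided. */
static SKETCH_REALIGN int32_t
sketch_sum4(const int32_t *v)
{
   return v[0] + v[1] + v[2] + v[3];
}
```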
https://bugs.freedesktop.org/show_bug.cgi?id=86788
Reviewed-by: Brian Paul <brianp@vmware.com>
BRW_NEW_VERTICES is flagged every time we draw a primitive. Having
the brw_vs_prog atom depend on BRW_NEW_VERTICES meant that we had to
compute the VS program key and do a program cache lookup for every
single primitive. This is painfully expensive.
The workaround bit computation is almost entirely based on the vertex
attribute arrays (brw->vb.inputs[i]), which are set by brw_merge_inputs.
The only thing it uses the VS program for is to see which VS inputs are
actually read. brw_merge_inputs() happens once per primitive, and can
safely look at the currently bound vertex program, as it doesn't change
in the middle of a draw.
This patch moves the workaround bit computation to brw_merge_inputs(),
right after assigning brw->vb.inputs[i], and stores the previous WA bit
values in the context. If they've actually changed from the last draw
(which is uncommon), we signal that we need a new vertex program,
causing brw_vs_prog to compute a new key.
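Roughly, the "compare against the previous values and only flag on change"
step looks like the sketch below. The struct, flag, and function names are
placeholders, not the driver's:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ATTRIBS 4
#define SKETCH_NEW_VS_ATTRIB_WORKAROUNDS (1ull << 0) /* hypothetical bit */

struct sketch_ctx {
   uint8_t wa_bits[SKETCH_MAX_ATTRIBS]; /* values from the last draw */
   uint64_t dirty;
};

/* Called once per draw after the inputs are merged: store the new
 * workaround bits and raise the dirty flag only if they changed. */
static void
sketch_update_wa_bits(struct sketch_ctx *ctx, const uint8_t *new_bits)
{
   if (memcmp(ctx->wa_bits, new_bits, sizeof ctx->wa_bits) != 0) {
      memcpy(ctx->wa_bits, new_bits, sizeof ctx->wa_bits);
      ctx->dirty |= SKETCH_NEW_VS_ATTRIB_WORKAROUNDS;
   }
}
```

In the common case the bits are unchanged, so the per-draw cost is a small
memcmp instead of a program key computation and a cache lookup.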
Improves performance in Gl32Batch7 by 13.6123% +/- 0.739652% (n=166)
on Haswell GT3e. I'm told Baytrail shows similar gains.
v2: Introduce a new BRW_NEW_VS_ATTRIB_WORKAROUNDS dirty bit, rather
than reusing BRW_NEW_VERTEX_PROGRAM (suggested by Chris Forbes).
This prevents unnecessary re-emission of surface/sampler related
atoms (and an SOL atom on Sandybridge).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
If you hit this, you didn't compile with --with-egl-platforms=...
Recompile with something like --with-egl-platforms=x11,drm and make
clean and make again.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These stopped being necessary in commit ab973403e4.
v2: Update commit message with a better explanation (thanks to Eric
Anholt for doing the git archaeology).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We don't access brw->vertex_program or ctx->_Shader since the previous
commit, so we don't need this dirty bit.
I think it's still necessary on Gen6 because it still conflates
constant uploading with unit state uploading. We can fix that later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We use IEEE mode for GLSL programs, but need to use ALT mode for ARB
programs so that 0^0 == 1. The choice is based entirely on the shader
source language.
Previously, our code to determine which mode we wanted was duplicated
in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8).
The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing
as well - we use CurrentProgram (non-derived state), but _Shader
(derived state). It also relies on knowing that ARB programs don't
use gl_shader_program structures today. The compiler already makes
this assumption in a few places, but I'd rather keep that assumption
out of the state upload code.
With this patch, we select the mode at compile time, and store that
choice in prog_data. The state upload code simply uses that decision.
This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit c0347705 changed the Gen6-7 code to use ctx->_Shader rather than
ctx->Shader, but neglected to change the Gen4-5 or Gen8+ code.
This might fix SSO related bugs, but ALT mode is only used for ARB
programs, so if there's an actual problem, it's likely no one would
run into it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The "Pixel Shader Computed Depth Mode" value is entirely based on the
shader program, so we can easily do it at compile time. This avoids the
if+switch on every 3DSTATE_WM (Gen7)/3DSTATE_PS_EXTRA (Gen8+) upload,
and shares a bit more code.
This also simplifies the PMA stall code, making it match the formula
more closely, and drops a BRW_NEW_FRAGMENT_PROGRAM dependency. (Note
that the previous comment was wrong - the code and the documentation
have != PSCDEPTH_OFF, not ==.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We shouldn't receive variables with invalid locations set - adding these
assertions should help catch problems before they cause crashes later.
Inspired by similar code in st_glsl_to_tgsi.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Half gives you the second half of a SIMD16 register, but if the register
is a uniform it would incorrectly give you the next register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Nine code to match vertex declaration to vs inputs was limiting
the number of possible combinations.
Some sm3 games have issues with that, because arbitrary (usage/index)
can be used.
This patch does the following changes to fix the problem:
. Change the numbers given to (usage/index) combinations to uint16
. Do not put limits on the indices when it doesn't make sense
. Change the conversion rule (usage/index) -> number to fit all combinations
. Instead of having a table usage_map mapping a (usage/index) number to
an input index, usage_map maps input indices to their (usage/index)
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
With sm3, you can declare an input/output with a usage and a usage index.
Nine code hardcodes the translation of usage/index to a corresponding TGSI code.
The translation was limited to the few usage/index combinations that cover
most games' needs, but some games did not work.
This patch rewrites that Nine code to map all possible usage/index combinations
to TGSI code. The index associated with TGSI_SEMANTIC_GENERIC doesn't need to be low
for good performance, as the old code assumed, and is not particularly bounded
(it's UINT16). Given the index is BYTE, we can map all combinations.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Nine was allowing that behaviour, but was not filling the result.
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Issuing D3DISSUE_END should:
. reset previous queries if possible
. end the query
Previous behaviour wasn't calling end_query for
queries not needing D3DISSUE_BEGIN, nor resetting
previous queries.
This fixes several applications not launching properly.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Some queries need the driver to advertise a cap to be supported.
For example r300 doesn't support them.
v2 (David): check also for PIPE_CAP_QUERY_PIPELINE_STATISTICS, fix wine
tests on r300g
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
get_query_result flushes automatically, so we don't need to flush.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Applications are supposed to call CreateQuery with a NULL
ppQuery to know if the query is supported. We supported that.
However, when ppQuery was not NULL, we agreed to create the
query and created a dummy query even when the query is not
supported.
Wine has different behaviour. This patch drops the dummy queries
support and matches wine behaviour.
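The resulting control flow could be sketched like this. It is a simplified
model with made-up types and return codes, not the Nine implementation, and
the "supported" check is a placeholder:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes standing in for the D3D result values. */
#define SKETCH_OK            0
#define SKETCH_NOTAVAILABLE  (-1)

struct sketch_query { int type; };

/* Placeholder: pretend only query type 0 is supported. */
static bool
sketch_query_supported(int type)
{
   return type == 0;
}

/* Unsupported queries fail whether or not ppQuery is NULL; a NULL
 * ppQuery on a supported type is only a capability check. */
static int
sketch_create_query(int type, struct sketch_query *storage,
                    struct sketch_query **ppQuery)
{
   if (!sketch_query_supported(type))
      return SKETCH_NOTAVAILABLE;
   if (ppQuery == NULL)
      return SKETCH_OK;

   storage->type = type;
   *ppQuery = storage;
   return SKETCH_OK;
}
```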
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Note that some of the GLSL specifications explicitly state this as a
compile error, while others simply state that 'it is an error'.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The state upload code was incorrectly shifting the attribute swizzles. The
effect of this is we're likely to get the default swizzle values, which disables
the component.
This doesn't technically fix any bugs since Skylake support is still disabled by
default (no PCI IDs).
While here, since VARYING_SLOT_MAX can be greater than the number of attributes
we have available, add a warning to the code to make sure we never do the wrong
thing (and hopefully prevent further static analysis from finding this).
Admittedly I am a bit confused. It seems to me like the moment a user has
greater than 8 varyings we will hit this condition. CC Ken to clarify.
v2: Forgot to git add the warning message in v1
v3: Change the > 31 varyings to an assertion (Ken)
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu> (via Coverity)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Vertex color clamping is only relevant if the shader writes to
the built-in gl_[Secondary]{Front,Back}Color varyings. Otherwise,
brw_vs_prog_key::clamp_vertex_color is never used, so we can simply
leave it set to 0.
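As a sketch of the key computation, assuming a bitmask of written outputs
(the slot bits and function name below are invented for the example):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical output slot bits. */
#define SKETCH_SLOT_POS  (1u << 0)
#define SKETCH_SLOT_COL0 (1u << 1)
#define SKETCH_SLOT_COL1 (1u << 2)
#define SKETCH_SLOT_BFC0 (1u << 3)
#define SKETCH_SLOT_BFC1 (1u << 4)

/* The key field only follows the context's clamp state when the shader
 * writes one of the built-in colors; otherwise it stays 0, so the
 * precompile guess matches the draw-time key. */
static bool
sketch_key_clamp_vertex_color(uint32_t outputs_written, bool ctx_clamp)
{
   const uint32_t color_slots = SKETCH_SLOT_COL0 | SKETCH_SLOT_COL1 |
                                SKETCH_SLOT_BFC0 | SKETCH_SLOT_BFC1;

   return (outputs_written & color_slots) != 0 && ctx_clamp;
}
```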
This enables us to correctly predict the clamp_vertex_color key value
in the precompile for shaders which don't use those varyings.
Eliminates virtually all VS recompiles in Serious Sam 3's intro.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Vertex color clamping only applies to gl_[Secondary]{Front,Back}Color,
which are compatibility-only built-in varyings. We only support GS in
core profile, so they can't exist in geometry shaders.
We can drop several dirty bits from the GS program key - they're
unnecessary for a core profile implementation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Vertex color clamping only applies to a few specific built-ins: COL0/1
and BFC0/1 (aka gl_[Secondary]{Front,Back}Color). It seems weird to
handle special cases in a function called emit_generic_urb_slot().
emit_urb_slot() is all about handling special cases, so it makes more
sense to handle this there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With fs_visitor/fs_generator being reused for SIMD8 VS/GS programs,
we're running into weird #include patterns, where scalar code #includes
brw_vec4.h and such.
Program keys aren't really related to SIMD4X2/SIMD8 execution - they
mostly capture NOS for a particular shader stage. Consolidating them
all in one place that's vec4/scalar neutral should help avoid problems.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
It's been merged into brw_state_flags::brw for simplicity and
efficiency.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I put the BRW_NEW_*_PROG_DATA flags at the beginning so that
brw_state_cache.c can still continue using 1 << brw_cache_id.
I also added a comment explaining the difference between
BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time
to remember it.
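To illustrate why the ordering matters, here is a sketch with invented names
(the SKETCH_* enums are not the driver's real definitions). As long as the
PROG_DATA bits occupy the lowest positions in the same order as the cache IDs,
1 << id keeps producing the right dirty bit.

```c
/* Hypothetical cache IDs; the real enum has more entries. */
enum sketch_cache_id {
    SKETCH_CACHE_FS_PROG,
    SKETCH_CACHE_VS_PROG,
    SKETCH_CACHE_GS_PROG,
    SKETCH_CACHE_COUNT
};

/* Dirty bits: the PROG_DATA flags come first and mirror the cache IDs, and
 * every other flag starts after them. */
enum sketch_dirty_bit {
    SKETCH_NEW_FS_PROG_DATA = 1u << SKETCH_CACHE_FS_PROG,
    SKETCH_NEW_VS_PROG_DATA = 1u << SKETCH_CACHE_VS_PROG,
    SKETCH_NEW_GS_PROG_DATA = 1u << SKETCH_CACHE_GS_PROG,
    SKETCH_NEW_URB_FENCE    = 1u << SKETCH_CACHE_COUNT,
    SKETCH_NEW_CONTEXT      = 1u << (SKETCH_CACHE_COUNT + 1)
};

/* What the cache code can keep doing without a lookup table. */
static inline unsigned dirty_bit_for_cache_id(enum sketch_cache_id id)
{
    return 1u << id;
}
```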
Non-mechanical changes:
- brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache.
- brw_state_upload.c - INTEL_DEBUG=state changes.
- brw_context.h - bit definition merging.
v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention
state-based recompiles, and nix the "proper subset" claim,
as it's false. (Caught by Kristian Høgsberg).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only
ones that are left are legitimately related to the program cache. Yet,
it seems a bit wasteful to have an entire bitfield for only 7 bits.
State upload is one of the hottest paths in the driver. For each atom
in the list, we call check_state() to see if it needs to be emitted.
Currently, this involves comparing three separate bitfields (mesa, brw,
and cache). Consolidating the brw and cache bitfields would save a
small amount of CPU overhead per atom. Broadwell, for example, has
57 state atoms, so this small savings can add up.
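A simplified sketch of the per-atom check before and after consolidation. The
struct and function names are illustrative, and the 64-bit width of the merged
field is an assumption made so that both sets of bits fit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dirty-flag sets: three fields today, two after merging. */
struct flags3 { uint32_t mesa, brw, cache; };
struct flags2 { uint32_t mesa; uint64_t brw; };

/* Current shape: three comparisons per atom. */
static bool check_state3(const struct flags3 *dirty, const struct flags3 *atom)
{
    return (dirty->mesa & atom->mesa) != 0 ||
           (dirty->brw & atom->brw) != 0 ||
           (dirty->cache & atom->cache) != 0;
}

/* Consolidated shape: the cache bits live in .brw, so two comparisons. */
static bool check_state2(const struct flags2 *dirty, const struct flags2 *atom)
{
    return (dirty->mesa & atom->mesa) != 0 ||
           (dirty->brw & atom->brw) != 0;
}
```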
CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the
offset into the program cache BO (prog_offset). Since most uses refer
to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name.
Removing "cache" completely is a bit painful, so I decided to do it in
several patches for easier review, and to separate mechanical changes
from manual ones. This one simply renames things, and was made via:
$ for file in *.[ch]; do
sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \
-e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file
done
Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw!
The next patch will remedy this flaw. It will also fix the
alphabetization issues.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Matt Turner <mattst88@gmail.com>
This was added in September 2013 when we first implemented the fast
(but lower quality) derivatives. A quick Google search didn't turn
up anyone using or recommending the option, so I suspect no one does.
Applications that want to control the quality of their derivatives can
use the new GL_ARB_derivative_control extension, or use the glHint
mechanism. The driconf option seems superfluous.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Note from Ken:
"We used to use the return value to indicate whether software
fallbacks were necessary, but we haven't in years."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the code in _mesa_validate_DrawElements,
_mesa_validate_DrawRangeElements, and
_mesa_validate_DrawElementsInstanced was the same. Refactor this out to
common code.
As a side-effect, a bug in _mesa_validate_DrawElementsInstanced was
fixed. Previously this function would not generate an error when
check_valid_to_render failed if numInstances was 0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL 3-ish versions of the spec are less clear that an error should be
generated here, so Ken (and I during review) just missed it in 1afe335.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
height=0 is legal for 1D array textures (as depth=0 is legal for
2D arrays). Fixes new piglit ext_texture_array-errors test.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We've got two mostly-independent operations in each QPU instruction, so
try to pack two operations together. This is fairly naive (doesn't track
read and write separately in instructions, doesn't convert ADD-based MOVs
into MUL-based MOVs, doesn't reorder across uniform loads), but does show
a decent improvement on shader-db-2.
total instructions in shared programs: 59583 -> 57651 (-3.24%)
instructions in affected programs: 47361 -> 45429 (-4.08%)
Since 73dd50acf6
glsl: implement switch flow control using a loop
The SB backend was falling over in an assert or crashing.
I tracked this down to the loops having no repeats but requiring
a working break. The initial code just called the loop handler for
all non-if statements, but this caused a regression in
tests/shaders/dead-code-break-interaction.shader_test.
So I had to add further code to detect if all the departure
nodes are empty and avoid generating an empty loop for that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86089
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-By: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Improves 359 shaders by >=10%
114 shaders by >=20%
91 shaders by >=30%
82 shaders by >=40%
22 shaders by >=50%
4 shaders by >=60%
2 shaders by >=80%
total instructions in shared programs: 5845346 -> 5822422 (-0.39%)
instructions in affected programs: 364979 -> 342055 (-6.28%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most prominently helps Natural Selection 2, which has a surprising
number of shaders that do very complicated things before drawing black.
instructions in affected programs: 21052 -> 16978 (-19.35%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Pre-Haswell hardware couldn't actually predicate it, but it's easier to
pretend as if it's predicated in the visitor since it will generate a
MOV from f0.1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anonymous structures are only supported with newer versions of
GCC. They will not work with GCC 4.2.1 used by OpenBSD or
GCC 4.4.7 shipped with RHEL6, going by a commit to fix a similar
problem in radeonsi earlier in the year
(74388dd24b).
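For illustration only, with made-up struct names: the anonymous form below
needs C11 or a compiler that accepts it as an extension, while the named form
builds everywhere at the cost of one extra member name at each use.

```c
/* Anonymous inner struct: members are reached directly (v->x).
 * Older compilers such as the ones listed above reject this. */
struct vec_anon {
    struct { int x, y; };
    int id;
};

/* Portable form: the inner struct gets a member name (v->pos.x). */
struct vec_named {
    struct { int x, y; } pos;
    int id;
};

static int sum_anon(const struct vec_anon *v)
{
    return v->x + v->y;
}

static int sum_named(const struct vec_named *v)
{
    return v->pos.x + v->pos.y;
}
```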
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
The i965 backends pass something out of 'screen', which is allocated
per-process, making using this as a ralloc context not thread-safe.
All callers of ra_alloc_interference_graph() already ralloc_free() its
return value.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It was totally broken:
- p_atomic_dec_zero() was returning the negation of the expected value
- p_atomic_inc_return()/p_atomic_dec_return() was
post-incrementing/decrementing, hence returning the old value instead
of the new
- p_atomic_cmpxchg() was returning the new value on success, instead of
the old
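The expected semantics can be sketched with the GCC/Clang __sync builtins. The
sketch_* names are hypothetical and this is not the implementation that was
fixed; it only spells out what each operation should return.

```c
#include <stdbool.h>
#include <stdint.h>

/* True exactly when the decrement brought the counter to zero. */
static inline bool sketch_atomic_dec_zero(int32_t *v)
{
    return __sync_sub_and_fetch(v, 1) == 0;
}

/* Return the NEW value, i.e. pre-increment / pre-decrement semantics. */
static inline int32_t sketch_atomic_inc_return(int32_t *v)
{
    return __sync_add_and_fetch(v, 1);
}

static inline int32_t sketch_atomic_dec_return(int32_t *v)
{
    return __sync_sub_and_fetch(v, 1);
}

/* Return the OLD value whether or not the swap happened; the caller compares
 * it against 'expected' to learn if the exchange succeeded. */
static inline int32_t sketch_atomic_cmpxchg(int32_t *v, int32_t expected,
                                            int32_t desired)
{
    return __sync_val_compare_and_swap(v, expected, desired);
}
```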
It is clear this was never used in the past. I wonder if it wouldn't be better to
yank it altogether.
Reviewed-by: Matt Turner <mattst88@gmail.com>
It was much easier for me to verify things build and run as expected
with this simple test, than building and testing whole Mesa.
With scons the test can be built and run merely by doing:
scons u_atomic_test
Building the test with autotools is left as a future exercise.
Reviewed-by: Matt Turner <mattst88@gmail.com>
like how C11's stdatomic.h provides generic functions. GCC's __sync_*
builtins already take a variety of types, so that's simple.
MSVC and Sun Studio don't, but we can implement it with something that
looks a little crazy but is actually quite readable.
Thanks to Jose for some MSVC fixes!
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This doesn't reschedule much currently, just tries to fit things into the
regfile A/B write-versus-read slots (the cause of the improvements in
shader-db), and hide texture fetch latency by scheduling setup early and
results collection late (haven't performance tested it). This
infrastructure will be important for doing instruction pairing, though.
shader-db2 results:
total instructions in shared programs: 61874 -> 59583 (-3.70%)
instructions in affected programs: 50677 -> 48386 (-4.52%)
We're supposed to be checking that nothing else writes r4, which is done
by the TMU result collection signal, not the coordinate setup.
Avoids a regression when QPU instruction scheduling is introduced.
Otherwise vertex shader can see stale cache data. This in particular
happens when the same vbo is updated and reused. Not sure yet if vbo's
at differing addresses but bound to same vertex buffer slot could have
issues, but seems safest to flush whenever new vertex buffers are bound.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For drivers building up to GL(ES)3, only expose the actual extension if
the API will let it be used (e.g. via overrides/debug flags that enable
higher versions).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The mesa state tracker doesn't fall back on similar integer formats, so
they must all be provided. Remove the restriction against integer color
rendering.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We need to produce a u32 destination type on integer sampling
instructions, so keep that in a shader key set based on the
currently-bound textures.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Integer outputs end up getting mangled due to cov.f32f16, and float32
loses precision. Use full precision shaders in both of those cases.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Just pass the data through unmolested. This probably has no effect since
blending isn't actually enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The table contains all the relevant information about each format. The
helper functions now just do lookups in the table.
Note that this adds support for a lot of formats that were previously
unsupported. Additionally it adds disabled support for integer render
buffers, which will require more work to actually enable.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Switch both of them from independently inconsistent conventions to having
UINT/SINT/UNORM/SNORM/FLOAT/FIXED suffixes.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
BRW_CACHE_VS_PROG is more easily associated with program caches than
plain BRW_VS_PROG.
While we're at it, rename BRW_WM_PROG to BRW_CACHE_FS_PROG, to move away
from the outdated Windowizer/Masker name.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This flag signifies that we've emitted a new SAMPLER_STATE table.
Given that we haven't cached those in years, CACHE_NEW_SAMPLER isn't
a great name. Putting it in the BRW_NEW_* hierarchy would make more
sense; BRW_NEW_SAMPLER_STATE_TABLE better reflects its actual purpose.
When this flag is raised, the pointer to the SAMPLER_STATE table has
changed, so we need to re-issue any packets which point to it (unit
state on Gen4-5, 3DSTATE_SAMPLER_STATE_POINTERS on Gen6, and the
per-stage variants on Gen7+).
Saves 2 * sizeof(void *) bytes per context, as we remove useless
aux_compare/aux_free function pointers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Marking brw_stage_state::sampler_count as CACHE_NEW_SAMPLER is wrong.
The number of samplers used by each program is actually computed at
draw time (brw_try_draw_prims), based purely on the currently bound
shader programs (gl_program::SamplersUsed).
CACHE_NEW_SAMPLER means that we've emitted a new SAMPLER_STATE table.
Although this could indicate that the number of samplers has changed,
it could also simply mean that the contents of the table has changed
(i.e. we've bound different textures).
The real reason these atoms depend on CACHE_NEW_SAMPLER is because they
include a pointer to the SAMPLER_STATE table. This was not commented.
So, move the comments to the appropriate place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We've been streaming these out for ages, so they basically have nothing
to do with brw_state_cache.c.
Saves 6 * sizeof(void *) bytes per context, as we won't have useless
aux_compare/aux_free functions for them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These always happen together; the extra atom just means another item to
iterate through, flags to check, and a call through a function pointer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen4-5, unit state is specified as indirect state, rather than
commands. If any unit state changes, we upload it via brw_state_batch
and arrange for 3DSTATE_PIPELINED_POINTERS to be re-emitted, which
updates pointers to all unit state at once.
Since there's only one command and state atom (brw_psp_urb_cs) that
needs to know about this, there's no benefit to having six separate
flags. We can combine CACHE_NEW_*_UNIT into a single flag.
We also haven't cached these in a long time, so it doesn't make sense
to use the "CACHE_NEW_" prefix. Instead, use the "BRW_NEW_" prefix.
This also saves 12 * sizeof(void *) bytes of memory per context, as
we remove useless aux_compare/aux_free functions for each CACHE bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most of the dirty flags were listed in some arbitrary order. Some used
bonus parentheses. Some put multiple flags on one line, others put one
per line. Some used tabs instead of spaces...but only on some lines.
This patch settles on one flag per line, in alphabetical order, using
spaces instead of tabs, and sheds the unnecessary parentheses.
Sorting was mostly done with vim's visual block feature and !sort,
although I alphabetized short lists by hand; it was pretty manual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When using MRT on Gen4-5, we have to simulate GL's alpha test feature
by emitting discards in the fragment shader. In this case, it makes
sense to set prog_data->uses_kill, which means the fragment shader may
kill pixels via the discard mechanism.
This saves us from having to look at an extra key value in a couple of
places, including in the generator.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This means the generator doesn't have to look at the key, which is a
little nicer - we're pretty close to no key dependencies at all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kristian noted that there's very little use of brw_wm_prog_key in the
generator, and that it basically just generates what it's told, without
caring about what stage it's handling.
One exception to this is derivative handling. When handling dFdxCoarse
and dFdxFine, we packed an enum value in a second source register,
explicitly telling the generator what to do. For dFdx, we specified an
enum value of "please use the hint", then checked the program key in the
generator level code.
A natural method is to define separate FS_OPCODE_DD[XY]_{COARSE,FINE}
opcodes, and have the front-end (which already decides what IR to
generate based on the program key) decide which of them plain dFdx/dFdy
should correspond to. This consolidates the decision making in one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
prog_data->foo is a bit more readable than brw->wm.prog_data->foo.
The local variable definition is also a great location to put the
obligatory /* CACHE_NEW_WM_PROG */ comment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
brw->wm.prog_data is covered by CACHE_NEW_WM_PROG, not
BRW_NEW_FRAGMENT_PROGRAM. So, we should listen to it.
However, I believe that BRW_NEW_FRAGMENT_PROGRAM is sufficient to cover
all the necessary cases - CACHE_NEW_WM_PROG happens in a subset of
cases. So, the code being wrong shouldn't have triggered bugs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Right now, in mesamatrix.net, the footnote is set so that it seems to be
for all the features, while actually it only applies to MSAA.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Flex and lex have a special action ‘|’ which means to use the same action as
the next rule. We can use this to reduce a bit of code duplication in the
rules for the various float literal formats.
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to the GLSL spec float literals like ‘1f’ shouldn't be allowed
without adding a decimal point or an exponent. Apparently the AMD driver also
disallows this so it seems unlikely that anything would be relying on it.
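As a rough sketch of the rule (my reading of it, not the lexer's actual
grammar): a float literal needs a decimal point or an exponent, and the f
suffix is optional. The function name is hypothetical and the lf/LF double
suffixes are ignored here.

```c
#include <ctype.h>
#include <stdbool.h>

/* Accepts forms like "1.0", ".5", "1.", "1e5" and "1.0f".
 * Rejects "1f" and "1", which have neither a point nor an exponent. */
static bool is_glsl_float_literal(const char *s)
{
    bool int_digits = false, frac_digits = false;
    bool has_point = false, has_exp = false;

    while (isdigit((unsigned char)*s)) { int_digits = true; s++; }

    if (*s == '.') {
        has_point = true;
        s++;
        while (isdigit((unsigned char)*s)) { frac_digits = true; s++; }
    }

    /* A lone "." is not a number. */
    if (!int_digits && !frac_digits)
        return false;

    if (*s == 'e' || *s == 'E') {
        s++;
        if (*s == '+' || *s == '-')
            s++;
        if (!isdigit((unsigned char)*s))
            return false;
        while (isdigit((unsigned char)*s))
            s++;
        has_exp = true;
    }

    if (!has_point && !has_exp)
        return false;

    if (*s == 'f' || *s == 'F')
        s++;

    return *s == '\0';
}
```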
Reviewed-by: Matt Turner <mattst88@gmail.com>
We are using one more buffer than we have. In the future the driver
should probably end up using just one buffer in total, but this is a
good first step: it merges the txq cube array and buffer info
constants on r600 and evergreen.
This should in theory fix geom shader tests on r600.
v1.1: fix comments from Glenn.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just use the same entrypoints we use for st/wgl's opengl32.dll.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's not exported by the official opengl32.dll either. Applications are
supposed to get it via wglGetProcAddress(), not GetProcAddress().
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes several MSVC warnings like:
warning C4273: 'glClearColorx' : inconsistent dll linkage
In fact, we should avoid using `__declspec(dllexport)` altogether, and use
exclusively the .DEF instead, which gives more precise control of which
symbols must be exported, but all the public GL/GLES headers practically
force us to pick between `__declspec(dllexport)` and
`__declspec(dllimport)`.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Addresses MSVC warnings "result of 32-bit shift implicitly converted to
64 bits (was 64-bit shift intended?)", which can often be a symptom of
bugs, but in these cases were all benign.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
- Remove no-op if-clause.
- -mstackrealign has been enabled again on MinGW for quite some time and
appears to work alright nowadays.
- Drop -mmmx option as it is implied by -msse, and we don't use MMX
intrinsics anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
PIPE_CAP_VIDEO_MEMORY returns the amount of video memory in megabytes,
so we need to convert it to bytes.
Fixes Warframe memory detection.
v2: also prepare for cards with more than 4GB memory
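A minimal sketch of the conversion, using a hypothetical helper name. Widening
to 64 bits before the shift is what keeps values of 4 GB and above from
wrapping.

```c
#include <stdint.h>

/* Megabytes to bytes.  The cast comes first so the shift is done in 64 bits;
 * shifting a 32-bit value would wrap once the result reaches 4 GB. */
static inline uint64_t megabytes_to_bytes(uint32_t megabytes)
{
    return (uint64_t)megabytes << 20;
}
```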
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Yaroslav Andrusyak <pontostroy@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: David Heidelberg <david@ixit.cz>
D3DPOOL_SCRATCH is disallowed according to spec.
D3DPOOL_SYSTEMMEM should be allowed but we don't handle it right for now.
v2: Fixes segfault in SetTexture when unsetting the texture
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Fixes "Error : CONST[20]: Undeclared source register" when running
dx9_alpha_blending_material. Also artifacts on ilo.
v2: also remove unused MISC_CONST
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
This patch moves the data field from Resource9 to Surface9 and cleans
D3DPOOL_SYSTEMMEM handling in Texture9. This fixes HL2 lost coast.
It also removes in Texture9 some code written to support importing
and exporting non D3DPOOL_SYSTEMMEM shared buffers. This code lacked
the design required to support the feature and wasn't used.
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Rather than shoving all the VL code into non-VL targets, increasing
their size, just split it out and use it when needed. This gives us
the side effect of building vl_winsys_dri.c once, dropping a few
automake warnings, and reducing the size of the dri modules as below
   text    data     bss     dec     hex filename
5850573  187549 1977928 8016050  7a50b2 before/nouveau_dri.so
5508486  187100  391240 6086826  5ce0aa after/nouveau_dri.so
The above data is for a nouveau + swrast + kms_swrast 'megadriver'.
v2: Do not include the vl sources in the auxiliary library.
v3: Rebase. Add nine.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With follow up commit we'll split vl static lib from the auxiliary one,
and choose the appropriate vl (galliumvl or galliumvl_stub) for the
respective targets to link against.
v2: Rebase.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Will be used by the non-VL targets, to stub out the functions called
by the drivers. The entry point to those are within the VL
state-trackers, yet the compiler cannot determine that at link time.
Thus we'll need to stub them out to prevent unresolved symbols in the
dri, egl, gbm and pipe-loader targets.
v2: Rebase.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Set a single VL_{CFLAG,LIBS} for xcb and friends, and let each target
check for its relevant library alone. Required as with follow up
commits we'll build aux/vl into a separate module, which needs VL_CFLAGS.
As cleanup, add a couple of explicit LIBDRM_LIBS links, as aux/vl itself
requires libdrm, despite that LIBDRM_{RADEON,NOUVEAU...} may provide it
as well.
v2: Rebase. Make sure st/xvmc programs work.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Otherwise we might end up automatically enabling the build, only to error
out a couple of lines after that.
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This will remove the need for unnecessary runtime checks for CPU features if
already supported by target CPU, resulting in smaller and less branchy code.
V2:
- Removed the SSSE3 related part for the not yet merged patch.
- Avoiding redefinition of macros.
Tested-by: David Heidelberg <david@ixit.cz>
Since pack_bytes expands to two mov(4) align1 instructions, we can't use
swizzles directly. For an instruction like
pack_bytes m4.y:UD, vgrf13.xyzw:UD
we can write into the .y component by setting the offset based on the
swizzle.
Also while we're doing this, we can set the dependency control hints
properly, so that a series of pack_bytes writing into separate
components of a register can issue without blocking.
Fixes broken rendering in Windows-based QtQuick2 apps run through Wine.
This library sets all texture units' GL_COORD_REPLACE, leaves point
sprite mode enabled, and then draws a triangle fan.
Will need a slightly different fix for Gen4-5, but I don't have my old
machines in a usable state currently.
V2: - Simplify patch -- the real changes are no longer duplicated across
the Gen6 and Gen7 atoms.
- Also don't clobber attr overrides -- which matters on Haswell too,
and fixes the other half of the problem
- Fix newly-introduced warnings
V3: - Use BRW_NEW_GEOMETRY_PROGRAM and brw->geometry_program rather than
core flag and state; keep the state flags in order.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84651
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ilia noticed that my lowering pass was converting the constant array
used by textureGatherOffsets' offsets parameter to a uniform. This
broke textureGather for Nouveau, and is generally a horrible plan,
since it violates the GLSL constraint that offsets must be an
immediate constant.
When I wrote this pass, I neglected to consider whole array assignment.
I figured opt_array_splitting would handle constant indexing, so this
pass was really about fixing variable indexing.
textureGatherOffsets is an example of whole array access that we really
don't want to touch. Whole array copies don't appear to benefit from
this either - they're most likely initializers for temporary arrays
which are going to be mutated anyway. Since you're copying, you may
as well copy from immediates, not uniforms.
This patch makes the pass look for ir_dereference_arrays of
ir_constants, rather than looking for any ir_constant directly.
This way, it ignores whole array assignment.
No shader-db changes or Piglit regressions on Haswell. Some Piglit
tests generate different code (fixing textureGatherOffsets on Nouveau).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
We already precompile GLSL programs; it seems logical to precompile ARB
programs as well. We just never hooked it up.
This also makes the programs compile even if no drawing occurs, which is
useful for shader-db.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, the prototypes for brw_vs/gs/fs_precompile were scattered
between brw_vs.h (C), brw_gs.h (C), and brw_fs.h (C++ only). Also,
brw_fs_precompile had C++ linkage, while the others were C.
This patch moves all the prototypes to a central location (brw_shader.h)
and makes brw_fs_precompile have C linkage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We'd like to do precompiling for ARB vertex and fragment programs,
which only have gl_program structures - gl_shader_program is NULL.
This patch makes the various precompile functions take a gl_program
parameter directly, rather than accessing it via gl_shader_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_shader_precompile should just do a precompile; it makes more sense
for the caller to decide whether we should do one. Simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
llvmpipe disables denorms on purpose (on x86/sse only), because denorms are
generally neither required nor desired for graphic apis (and in case of d3d10,
they are forbidden).
However, this caused some arithmetic tests using denorms to fail on some
systems, because the reference did not generate the same results anymore.
(It did not fail on all systems - behavior of these math functions is sort
of undefined when called with non-standard floating point mode, hence the
result differing depending on implementation and in particular the sse
capabilities.)
So, for the reference, simply flush all (input/output) denorms manually
to zero in this case.
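A sketch of what flushing a single value could look like, under the assumption
that only single-precision floats are involved. The helper name is
hypothetical, and preserving the sign of the zero is my choice rather than
something the text specifies.

```c
#include <float.h>

/* Replace a denormal (subnormal) float with zero of the same sign.
 * Normal values, zeros, infinities and NaNs pass through unchanged. */
static inline float flush_denorm_to_zero(float x)
{
    float mag = x < 0.0f ? -x : x;

    /* FLT_MIN is the smallest normal float; anything non-zero below it
     * is a denormal. */
    if (mag > 0.0f && mag < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;

    return x;
}
```

Applying it to both the inputs and the outputs of the reference computation
would give the "input/output" flushing described above.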
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=67672.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The extension itself was deleted 2 years ago. There are still some
prog_instruction opcodes from NV_fp that exist because they're used by
ir_to_mesa.cpp, though.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
They're part of NV_vertex_program2, which I'm pretty sure we're never
going to support.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The nice thing about the good way of initializing arrays like this is that
you don't need to initialize everything in order, or even everything at
all. Taking advantage of that only needs a tiny fixup to deal with the
default NULL value of the pointers.
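A small sketch of the technique with invented opcode and handler names.
Designated initializers let the table be filled in any order, leave omitted
entries NULL, and need only a NULL check at the point of use.

```c
#include <stddef.h>

/* Hypothetical opcodes; OP_UNSUPPORTED deliberately has no handler. */
enum sketch_opcode { OP_MOV, OP_ADD, OP_MUL, OP_UNSUPPORTED, OP_COUNT };

typedef int (*op_handler)(int a, int b);

static int do_mov(int a, int b) { (void)b; return a; }
static int do_add(int a, int b) { return a + b; }
static int do_mul(int a, int b) { return a * b; }

/* Entries are listed out of order on purpose; anything not mentioned is
 * zero-initialized, which for a function pointer means NULL. */
static const op_handler handlers[OP_COUNT] = {
    [OP_MUL] = do_mul,
    [OP_MOV] = do_mov,
    [OP_ADD] = do_add,
};

/* The "tiny fixup": treat a NULL entry as an unsupported opcode.
 * Returns 1 and stores the result on success, 0 otherwise. */
static int run_op(enum sketch_opcode op, int a, int b, int *result)
{
    if ((unsigned)op >= OP_COUNT || handlers[op] == NULL)
        return 0;

    *result = handlers[op](a, b);
    return 1;
}
```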
I haven't dropped the initialization of opcodes that exist and are unsupported.
This was the only state tracker emitting it, and hardware was just having
to lower it anyway (or failing to lower it at all).
v2: Extracted from a larger patch by Jose (which also dropped DP2A), fixed
to actually not reference TGSI_OPCODE_CND. Change by anholt.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
The translation is lowering it to not using TGSI_OPCODE_NRM, anyway.
v2: Extracted from a larger patch by Jose that also dropped DP2A usage.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
Caught by clang.
warning: comparison of constant -1 with expression of type
'ir_texture_opcode' is always false
[-Wtautological-constant-out-of-range-compare]
if (op == -1)
~~ ^ ~~
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts a little more than 1k of .text size from i915g.
This was previously done in commit 5f66b340 and subsequently reverted in
commit 3661f757 after bug 30514 was filed. I believe the cause of bug
30514 wasn't anything related to cross compiling, but rather that the
toolchain used defaulted to -march=i386, and i386 doesn't have the
CMPXCHG or XADD instructions used to implement the intrinsics.
So we reverted a patch that improved things so that we didn't break
compilation for a platform that never could have worked anyway.
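For reference, a sketch of the kind of wrappers the intrinsics allow.
The wrapper names are made up for this example; the __sync_* builtins
are the GCC/Clang ones. On x86 the add/sub forms are typically
implemented with LOCK XADD and the compare-and-swap with LOCK CMPXCHG,
which is why a toolchain defaulting to -march=i386 cannot build them.

```c
/* Hypothetical wrappers around the GCC/Clang __sync builtins. */

/* Atomically increment *v and return the new value. */
static int atomic_inc_return(int *v)
{
    return __sync_add_and_fetch(v, 1);
}

/* Atomically decrement *v and return the new value. */
static int atomic_dec_return(int *v)
{
    return __sync_sub_and_fetch(v, 1);
}

/* If *v == expected, store desired. Returns the value *v held before
 * the operation either way. */
static int atomic_cmpxchg(int *v, int expected, int desired)
{
    return __sync_val_compare_and_swap(v, expected, desired);
}
```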
Ben was asking about the undocumented restriction that the math
instruction cannot use the dependency control hints. I went to reconfirm
and disabled the is_math() check in opt_set_dependency_control() and saw
that the disassembled math instructions with dependency hints had a
bogus math function. We were mistakenly overwriting it by setting an
empty conditional mod.
Unfortunately, this wasn't the cause of the aforementioned problem (I
reproduced it). This bug is benign, since we don't set dependency hints
on math instructions -- but maybe some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These were added in commits a760c738 and 43757135 to be used in
implementing C-style aggregate initializers (commit 1b0d6aef). Paul
rewrote that code in commit 0da1a2cc to use GLSL types, rather than
AST types, leaving these copy constructors unused.
Tested by making them private and providing no definition.
Uniform names (even for hidden uniforms) are required to be unique; some
parts of the compiler assume they can be looked up by name.
Fixes the piglit test: tests/spec/glsl-1.20/linker/array-initializers-1
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When converting a uniform array reference to a pull constant load, the
`reladdr` expression itself may have its own `reladdr`, arbitrarily
deeply. This arises from expressions like:
a[b[x]] where a, b are uniform arrays (or lowered const arrays),
and x is not a constant.
Just iterate the lowering to pull constants until we stop seeing these
nested. For most shaders, there will be only one pass through this loop.
Fixes the piglit test:
tests/spec/glsl-1.20/linker/double-indirect-1.shader_test
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
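The "iterate until we stop seeing these nested" idea can be sketched
with a toy model. Everything here (struct operand, lower_one_level,
lower_until_stable) is hypothetical and far simpler than the real
visitor: each pass resolves one level of indirection and reports
whether the address expression it exposed is itself indirect.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy operand: reladdr is NULL for a directly addressed value. */
struct operand {
    struct operand *reladdr;
};

/* One lowering pass over the operand currently being examined. Returns
 * true if the address expression it exposed still has a reladdr. */
static bool lower_one_level(struct operand **op)
{
    struct operand *addr = (*op)->reladdr;

    if (addr == NULL)
        return false;

    (*op)->reladdr = NULL;  /* pretend a pull constant load replaced it */
    *op = addr;             /* the address expression is examined next */
    return addr->reladdr != NULL;
}

/* Repeat the pass until nothing nested remains; returns the number of
 * passes made. */
static int lower_until_stable(struct operand *op)
{
    int passes = 0;
    bool nested;

    do {
        nested = lower_one_level(&op);
        passes++;
    } while (nested);

    return passes;
}
```

In this model a[x] takes one pass and a[b[x]] takes two, matching the
note that most shaders go through the loop only once.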
This moves all the CUBE section above the gradients section,
so that the gradient emission happens on one block which
is what sb/hardware expect.
v2: avoid changes to bytecode by using spare temps
v2.1: shame gcc, oh the shame. (uninit var warnings)
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The piglit tests were failing, and it appeared to be SB
optimising out things, but Glenn pointed out the gradients
are meant to be clause local, so we should emit the texture
instructions in the same clause. This moves things around
to always copy to a temp and then emit the texture clauses
for H/V.
v2: Glenn pointed out we could get another ALU fetch in
the wrong place, so load the src gpr earlier as well.
Fixes at least:
./bin/tex-miplevel-selection textureGrad 2D
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
res->bind is not an indicator of how the resource is currently bound.
buffers can be rebound across different binding points without changing
underlying storage.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
This creates a usable layout for all NPOT textures. Of course these
still have lots of limitations, but at least we can render to a
level.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
For NPOT texture layouts, we want to be able to access texture levels
other than 0 directly. Since the hw doesn't support that, we do it by
adding the offset directly.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
This happens with glsl-convolution-1, where we have 64 constants. This
doesn't make the test pass (we don't have 64 constants anyway, only
32) but this prevents it from crashing.
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
This patch removes a workaround related to an LLVM < 3.2 bug.
Original bug has been closed as fixed in 2011.
At this moment gallium requires LLVM 3.3 (2013).
LLVM has been tested without SSE2 support in commit
ca70de9bd2 and removed after requiring
LLVM 3.3 in commit 013ff2fae1
Original LLVM bug: http://llvm.org/bugs/show_bug.cgi?id=6960
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In commit 5e37a2a4a8, I made the pull constant code stop calling
_mesa_load_state_parameters() when there were no pull parameters.
This worked fine on Gen6+ because the push constant code also called
it if there were any push constants. However, the Gen4-5 push constant
code wasn't doing this. This patch makes it do so, like the Gen6+ code.
A better long term solution would be to make core Mesa just handle this
for us when necessary.
Fixes around 8766 Piglit tests on Ironlake, and probably Gen4 as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Fix one of the few cases where we can't reliably touch the destination hazard
bits. I am explicitly doing this patch individually so it is easy to backport. I
was tempted to do this patch before the previous patch which reorganized the
code, but I believe even doing that first, this is still easy to backport.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Move this to a separate function so that we can begin to add other little
caveats without making too big a mess.
NOTE: There is some desire to improve this function eventually, but we need to
fix a bug first.
v2:
Use const for the inst for the hazard check (Matt)
Invert safe logic to get rid of the double negative (Matt)
Add PRM reference for predicates (Matt)
Add note about empirical evidence for math (Matt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The visitor emits MOVs to temporary registers for immediates, so these
never trigger. For further proof, check case ir_triop_fma.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
texture_offset was only used by some texturing operations, and offset
was only used by spill/unspill and some URB operations. These fields are
never used at the same time.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Check that the target is GL_TEXTURE_CUBE_MAP before emitting
TEXCOORDTYPE_VECTOR texture coordinates.
I'm not sure if the hardware would like CARTESIAN coordinates
with cube maps, and as I'm too lazy to find out just emit the
VECTOR coordinates for cube maps always. For other targets use
CARTESIAN or HOMOGENOUS depending on the number of texture
coordinates provided.
Fixes rendering of the "electric" background texture in chromium-bsu
main menu. We appear to be provided with three texture coordinates
there (I'm guessing due to the funky texture matrix rotation it does).
So the code would decide to use TEXCOORDTYPE_VECTOR instead of
TEXCOORDTYPE_CARTESIAN even though we're dealing with a 2D texture.
The results weren't what one might expect.
demos/cubemap still works, which hopefully indicates that this doesn't
break things.
Also tested with:
bin/glean -o -v -v -v -t +texCube --quick
bin/cubemap -auto
from piglit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
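The selection logic described above can be sketched as follows. The
enums and the function name are hypothetical, and the threshold for
HOMOGENEOUS (four coordinates) is an assumption made for this example.
The message only states that cube maps always get VECTOR and that a 2D
texture with three coordinates must not.

```c
/* Hypothetical stand-ins for the GL target and TEXCOORDTYPE_* values. */
enum tex_target { TARGET_2D, TARGET_3D, TARGET_CUBE_MAP };
enum texcoord_type { COORD_CARTESIAN, COORD_HOMOGENEOUS, COORD_VECTOR };

static enum texcoord_type
choose_texcoord_type(enum tex_target target, int num_coords)
{
    /* Check the target first: only cube maps use VECTOR coordinates. */
    if (target == TARGET_CUBE_MAP)
        return COORD_VECTOR;

    /* Assumption: a fourth coordinate means a homogeneous (q) term. */
    return num_coords >= 4 ? COORD_HOMOGENEOUS : COORD_CARTESIAN;
}
```

With this ordering, the chromium-bsu case (2D texture, three
coordinates) resolves to CARTESIAN.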
Add support for decoding the new branch control bit. I saw two things wrong with
the existing code.
1. It didn't bother trying to decode the bit.
- While we do not *intentionally* emit this bit today, I think it's interesting
to see if we somehow ended up with the bit set. It may also be useful in the
future.
2. It seemed to be the wrong bit.
- The docs are pretty poor wrt which bit this actually occupies. To me, it
/looks/ like it should be bit 28. I am not sure where Ken got 30 from. I
verified it should be 28 by looking at the simulator code.
I also added the most basic support for GOTO simply so we don't need to remember
to change the function in the future.
v2:
Move the branch_ctrl check out of the if gen >= 6 check to make it more
readable. (Matt)
ENDIF doesn't have branch_ctrl (Matt + Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
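For illustration only, extracting a single control bit at position 28
from a 32-bit instruction word looks like this. Which dword of the
instruction carries the bit is not stated above, so the function takes
the word as a parameter. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Bit position taken from the message above. */
#define BRANCH_CTRL_BIT 28

/* Returns 1 if the branch control bit is set in the given word, else 0. */
static unsigned decode_branch_ctrl(uint32_t dword)
{
    return (dword >> BRANCH_CTRL_BIT) & 1u;
}
```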
This reverts f4dd099171.
The src/gallium/tests/unit/translate_test.c gives the same results on
MinGW 64-bits as on Linux 64-bits. And since MinGW is often used for
development/testing due to its convenience, it's better not to have this
sort of differences relative to MSVC.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
No changes required in the driver itself, all handled by draw.
piglit results in a quick run:
skip->pass 7
skip->fail 2
(The new failures in the ARB_fragment_layer_viewport group are expected,
we fail the same if gs doesn't write these outputs regardless of the vs.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Mostly add a couple cases so we don't just check gs for this.
There's only one gotcha, the built-in vp transform in the llvm vs can't
handle it (this would be fixable though non-trivial due to vp index being
non-constant for the SoA outputs, but we don't use it if there's a gs
either - the whole clip/vp transform integration there is suboptimal).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When calling vaCreateImage() an internal copy of VAImage is maintained
since the allocation of "image" may not be guaranteed to live long enough.
Signed-off-by: Michael Varga <Michael.Varga@amd.com>
For 1D and 2D arrays we don't want the other coordinates being
offset and affecting where we sample. I wrote this patch 6 months
ago but lost it.
Fixes:
./bin/tex-miplevel-selection textureLodOffset 1DArray
./bin/tex-miplevel-selection textureLodOffset 2DArray
./bin/tex-miplevel-selection textureOffset 1DArray
./bin/tex-miplevel-selection textureOffset 1DArrayShadow
./bin/tex-miplevel-selection textureOffset 2DArray
./bin/tex-miplevel-selection textureOffset(bias) 1DArray
./bin/tex-miplevel-selection textureOffset(bias) 2DArray
v2: rewrite to handle more cases and be consistent with code
above.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, the kernel would dispatch thread 0, wait, then dispatch thread
1. By insisting that the thread contents use semaphores in the right
place, the kernel can sleep for longer by dispatching both threads at
once.
When using the stand alone compiler, if we try to link a shader with vertex
attributes it will segfault on linking as the binding hash tables are not
included in the shader program. Obviously, we cannot make the linking stage
succeed without the bound attributes but we can prevent the crash and just
let the linker spit its own error.
Reviewed-by: Brian Paul <brianp@vmware.com>
We cannot guarantee that vertex buffers have the necessary alignment for
fetching all AoS members at once (for instance 4x32bit XYZW data). We can
however guarantee that for textures. This did not cause errors for older
llvm versions but it now matters and will cause segfaults if the data
happens to not be aligned. Thus we need to set alignment manually.
(Note that we can't actually really guarantee data to be even element aligned
due to offsets in vertex buffers being bytes and OpenGL allowing this, but
it does not matter for x86 as alignment is only required for sse vectors -
not sure what happens on other archs, however.)
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=85467.
Not all drivers can set gl_Layer from VS. Add a fallback that passes the
instance id from VS to GS, and then uses the GS to set the layer.
Tested by adding
quad_buffers |= clear_buffers;
clear_buffers = 0;
to the st_Clear logic, and forcing set_vertex_shader_layered in all
cases. No piglit regressions (on piglits with 'clear' in the name).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
DRI_PRIME setups have different issues due to the lack of dma-buf fence
support in the drivers. For DRI3 DRI_PRIME, a race can appear, making
tearing visible, or worse, showing older content than expected. Until
dma-buf fences are well supported (and by all drivers), an alternative
is to send the buffers to the server only when rendering has finished.
Since waiting for rendering to finish in the main thread has a
performance impact, this patch uses an additional thread to offload the
wait and the sending of the buffers to the server.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Implements vblank_mode and throttling, which allows us to change the default ratio
between framerate and input lag.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Work of Joakim Sindholt (zhasha) and Christoph Bumiller (chrisbmr).
DRI3 port done by Axel Davy (mannerov).
v2: - nine_debug.c: klass extended from 32 chars to 96 (for sure) by glennk
- Nine improvements by Axel Davy (which also fixed some wine tests)
- by Emil Velikov:
- convert to static/shared drivers
- Sort and cleanup the includes
- Use AM_CPPFLAGS for the defines
- Add the linker garbage collector
- Restrict the exported symbols (think llvm)
v3: - small nine fixes
- build system improvements by Emil Velikov
v4: [Emil Velikov]
- Do not link against libudev. No longer needed.
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: David Heidelberg <david@ixit.cz>
v3: thanks to Brian, improved coding style, also glennk helped spot few
things (unsigned -> int, two constify)
v4: thanks Ilia improved function, dropped u_box_clip_3d
v5: incorporated rest of Gregor proposed changes,clean ups
v6: u_box_clip_2d simplify proposed by Ilia Mirkin
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
At this moment we use only zero or positive values.
v2: Implement it also for Solaris, MSVC assembly
and enable for other combinations.
v3: Replace MSVC assembly by assert + warning during compilation
v4: remove inc and dec with return for MSVC assembly
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: David Heidelberg <david@ixit.cz>
Implement pipe_loader_sw_probe_wrapped, which allows using the wrapped
software renderer backend when using the pipe loader.
v2: - remove unneeded ifdef
- use GALLIUM_PIPE_LOADER_WINSYS_LIBS
- check for CALLOC_STRUCT
thanks to Emil Velikov
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Some of the geom shader tests produce an empty vertex shader,
on cayman we'd crash in the finaliser because last_cf was NULL.
cayman doesn't need the NOP workaround, so if the code arrives
here with no last_cf, just emit an END.
fixes crashes in a bunch of piglit geom shader tests.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The sampler_array_size field was added by "mesa/st: add support for
dynamic sampler offsets". But the field wasn't getting copied in
the get_pixel_transfer_visitor() or get_bitmap_visitor() functions.
The count_resources() function then didn't properly compute the
glsl_to_tgsi_visitor::samplers_used bitmask. Then, we didn't declare
all the sampler registers in st_translate_program(). Finally, we
asserted when we tried to emit a tgsi ureg src register with File =
TGSI_FILE_UNDEFINED.
Add the missing assignments and some new assertions to catch the
invalid register sooner.
Cc: "10.3, 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Using the asynchronous DMA engine for multi-dimensional operations seems
to cause random GPU lockups for various people. While the root cause for
this might need to be fixed in the kernel, let's disable it for now.
Before re-enabling this, please make sure you can hit all newly enabled
paths in your testing, preferably with both piglit and real world apps,
and get in touch with people on the bug reports below for stability
testing.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85647
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83500
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Unfortunately no LLVM type was generated for pipe_viewport_state -- it
was being treated as a single floating point array --, so llvmpipe (and
any driver that relies on draw/llvm) got totally busted.
We were missing a few files
- The version scripts
- Android & scons build scripts
- A few headers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
- Add all headers into Makefile.sources
- Don't forget the target-helpers
- Add the python scripts & the formats table/list (csv)
- Temporarily add vl/vl_winsys_dri.c to EXTRA_DIST until we rework the
way VL is built.
- Add the following to EXTRA_DIST - they are included via the
generated u_indices_gen.c thus we should not add them to *SOURCES.
indices/u_indices.c
indices/u_unfilled_indices.c
XXX: Should we nuke gallivm/f.cpp ? It seems that no-one is using it.
v2: Rebase
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The DRM_IOCTL_MODE_CREATE_DUMB (and others) IOCTL isn't very rigorously
specified, which has the effect that some kernel drivers do not consider
the .pitch and .size fields of struct drm_mode_create_dumb outputs only.
Instead they will use these as lower bounds and overwrite them only if
the values that they compute are larger than what userspace provided.
This works if and only if userspace initializes the fields explicitly to
either 0 or some meaningful value. However, if userspace just leaves the
values uninitialized and the struct drm_mode_create_dumb is allocated on
the stack for example, the driver may try to overallocate buffers.
Fortunately most userspace does zero out the structure before passing it
to the IOCTL, but there are rare exceptions. Mesa is one of them. In an
attempt to rectify this situation, kernel drivers are being updated to
not use the .pitch and .size fields as inputs. However in order to fix
the issue with older kernels, make sure that Mesa always zeros out the
structure as well.
Future IOCTLs should be more rigorously defined so that structures can
be validated and IOCTLs rejected if output fields aren't set to zero.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
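A sketch of the userspace side of the fix: clear the whole argument
structure before filling in the input fields, so that .pitch, .size and
the other outputs start at zero. The struct below is a local stand-in
written for this example and is assumed to mirror the layout of struct
drm_mode_create_dumb; real code would include the DRM headers and pass
the structure to the DRM_IOCTL_MODE_CREATE_DUMB ioctl.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for struct drm_mode_create_dumb (assumed layout). */
struct create_dumb_args {
    uint32_t height;
    uint32_t width;
    uint32_t bpp;
    uint32_t flags;
    uint32_t handle;    /* output */
    uint32_t pitch;     /* output */
    uint64_t size;      /* output */
};

/* Hypothetical helper: zero everything, then set only the inputs. */
static void init_create_dumb(struct create_dumb_args *arg,
                             uint32_t width, uint32_t height, uint32_t bpp)
{
    memset(arg, 0, sizeof *arg);
    arg->width = width;
    arg->height = height;
    arg->bpp = bpp;
}
```

Without the memset, a structure allocated on the stack would hand
whatever garbage it contains to drivers that treat .pitch and .size as
lower bounds.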
Addition of color fmt bitfield to this register (compared to a3xx) means
we need to re-emit if either prog or framebuffer state is dirty.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This reverts commit 8d3f739383.
In the last commit we've updated our check to determine if the actual
code is buildable, rather than if the compiler acknowledges the option.
I.e. did anyone provide -mno-sse4.1 vs is my compiler too old.
Now this code will never be attempted to be built, in both cases.
Confirmed by building mesa with
export CFLAGS='-march=native -mno-sse4.1'
./configure && make
Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
So when checking/building sse code we have three possibilities:
1 Old compiler, throws an error when using -msse*
2 New compiler, user disables sse* (-mno-sse*)
3 New compiler, user doesn't disable sse
The original code added handling for #1 but not #2. Later on we patched
around the lack of handling #2 by wrapping the code in __SSE4_1__.
Yet it led to a missing/undefined symbol in case of #1 or #2, which
might cause an issue for #2 when using the i965 driver.
A bit later we "fixed" the undefined symbol by using #1, rather than
updating it to handle #2. With this commit we set things straight :)
To top it all off, conventions state that in case of conflicting
(-enable-foo -disable-foo) options, the latter one takes precedence.
Thus we need to make sure to prepend -msse4.1 to CFLAGS in our test.
v2: Clean the #includes. Suggested by Ilia, Matt & Siavash.
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: David Heidelberg <david@ixit.cz>
Tested-by: Siavash Eliasi <siavashserver@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Very initial support. Basic stuff working (es2gears, es2tri, and maybe
about half of glmark2). Expect broken stuff. Still missing: mem->gmem
(restore), queries, mipmaps (blob segfaults!), hw binning, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=85529">Bug 85529</a> - Surfaces not drawn in Unvanquished</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=87619">Bug 87619</a> - Changes to state such as render targets change fragment shader without marking it dirty.</li>
</ul>
<h2>Changes</h2>
<p>Chad Versace (2):</p>
<ul>
<li>i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()</li>
<li>i965: Use safer pointer arithmetic in gather_oa_results()</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: Add sha256 sums for the 10.3.6 release</li>
<li>Update version to 10.3.7</li>
</ul>
<p>Ilia Mirkin (2):</p>
<ul>
<li>nv50,nvc0: set vertex id base to index_bias</li>
<li>nv50/ir: fix texture offsets in release builds</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.</li>
<li>i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.</li>
</ul>
<p>Marek Olšák (3):</p>
<ul>
<li>glsl_to_tgsi: fix a bug in copy propagation</li>
<li>vbo: ignore primitive restart if FixedIndex is enabled in DrawArrays</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=88658">Bug 88658</a> - (bisected) Slow video playback on Kabini</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89069">Bug 89069</a> - Lack of grass in The Talos Principle on radeonsi (native\wine\nine)</li>
</ul>
<h2>Changes</h2>
<p>Carl Worth (1):</p>
<ul>
<li>Revert use of Mesa IR optimizer for ARB_fragment_programs</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.4 release</li>
<li>get-pick-list.sh: Require explicit "10.4" for nominating stable patches</li>
<li>Update version to 10.4.5</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nvc0: bail out of 2d blits with non-A8_UNORM alpha formats</li>
<li>st/mesa: treat resource-less xfb buffers as if they weren't there</li>
<li>nvc0: allow holes in xfb target lists</li>
</ul>
<p>Jeremy Huddleston Sequoia (2):</p>
<ul>
<li>darwin: build fix</li>
<li>darwin: build fix</li>
</ul>
<p>Kenneth Graunke (4):</p>
<ul>
<li>i965: Override swizzles for integer luminance formats.</li>
<li>i965: Use a gl_color_union for sampler border color.</li>
<li>i965: Fix integer border color on Haswell.</li>
<li>glsl: Reduce memory consumption of copy propagation passes.</li>
</ul>
<p>Laura Ekstrand (1):</p>
<ul>
<li>main: Fixed _mesa_GetCompressedTexImage_sw to copy slices correctly.</li>
</ul>
<p>Marek Olšák (5):</p>
<ul>
<li>r600g,radeonsi: don't append to streamout buffers that haven't been used yet</li>
<li>radeonsi: fix instanced arrays with non-zero start instance</li>
<li>radeonsi: small fix in SPI state</li>
<li>mesa: fix AtomicBuffer typo in _mesa_DeleteBuffers</li>
<li>radeonsi: fix a crash if a stencil ref state is set before a DSA state</li>
</ul>
<p>Michel Dänzer (2):</p>
<ul>
<li>st/mesa: Don't use PIPE_USAGE_STREAM for GL_PIXEL_UNPACK_BUFFER_ARB</li>
<li>Revert "radeon/llvm: enable unsafe math for graphics shaders"</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=88885">Bug 88885</a> - Transform feedback uses incorrect interleaving if a previous draw did not write gl_Position</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89180">Bug 89180</a> - [IVB regression] Rendering issues in Mass Effect through VMware Workstation</li>
</ul>
<h2>Changes</h2>
<p>Abdiel Janulgue (2):</p>
<ul>
<li>glsl: Don't optimize min/max into saturate when EmitNoSat is set</li>
<li>st/mesa: For vertex shaders, don't emit saturate when SM 3.0 is unsupported</li>
</ul>
<p>Andreas Boll (1):</p>
<ul>
<li>glx: Fix returned values of GLX_RENDERER_PREFERRED_PROFILE_MESA</li>
</ul>
<p>Brian Paul (2):</p>
<ul>
<li>swrast: fix multiple color buffer writing</li>
<li>st/mesa: fix sampler view reference counting bug in glDraw/CopyPixels</li>
</ul>
<p>Chris Forbes (1):</p>
<ul>
<li>i965/gs: Check newly-generated GS-out VUE map against correct stage</li>
</ul>
<p>Eduardo Lima Mitev (1):</p>
<ul>
<li>mesa: Fix error validating args for TexSubImage3D</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.5 release</li>
<li>install-lib-links: remove the .install-lib-links file</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=79202">Bug 79202</a> - valgrind errors in glsl-fs-uniform-array-loop-unroll.shader_test; random code generation</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89156">Bug 89156</a> - r300g: GL_COMPRESSED_RED_RGTC1 / ATI1N support broken</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89224">Bug 89224</a> - Incorrect rendering of Unigine Valley running in VM on VMware Workstation</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=89530">Bug 89530</a> - FTBFS in loader: missing fstat</li>
</ul>
<h2>Changes</h2>
<p>Andrey Sudnik (1):</p>
<ul>
<li>i965/vec4: Don't lose the saturate modifier in copy propagation.</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>egl: Take alpha bits into account when selecting GBM formats</li>
</ul>
<p>Emil Velikov (6):</p>
<ul>
<li>docs: Add sha256 sums for the 10.4.6 release</li>
<li>cherry-ignore: add not applicable/rejected commits</li>
<li>mesa: rename format_info.c to format_info.h</li>
<li>loader: include &lt;sys/stat.h&gt; for non-sysfs builds</li>
<li>auxiliary/os: fix the android build - s/drm_munmap/os_munmap/</li>
<li>Update version to 10.4.7</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>i965: Fix out-of-bounds accesses into pull_constant_loc array</li>
</ul>
<p>Ilia Mirkin (4):</p>
<ul>
<li>freedreno: move fb state copy after checking for size change</li>
<li>freedreno/ir3: fix array count returned by TXQ</li>
<li>freedreno/ir3: get the # of miplevels from getinfo</li>