Silences many GCC warnings of the form:
drivers/common/meta.c: In function 'cleanup_temp_texture':
drivers/common/meta.c:1208:41: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'setup_ff_blit_framebuffer':
drivers/common/meta.c:1453:46: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_blit_cleanup':
drivers/common/meta.c:1998:43: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_clear_cleanup':
drivers/common/meta.c:2287:44: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'setup_ff_generate_mipmap':
drivers/common/meta.c:3365:45: warning: unused parameter 'ctx' [-Wunused-parameter]
drivers/common/meta.c: In function 'meta_glsl_generate_mipmap_cleanup':
drivers/common/meta.c:3556:54: warning: unused parameter 'ctx' [-Wunused-parameter]
There are a couple other similar warnings, but they are less trivial. I
want to investigate these further before axing them.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Array textures can't be used with fixed-function, so don't. Instead,
just drop the decompress request on the floor. This is no worse than
what was done previously because generating the GL error (in
_mesa_set_enable) broke everything anyway.
A later patch will get GL_TEXTURE_2D_ARRAY targets working.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no need to use pixel coordinates, and using NDC directly will
simplify the GLSL paths.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
For these objects, meta was already using the non-Apple function to
delete the objects. Everywhere else in the file uses
_mesa_GenVertexArrays and _mesa_BindVertexArrays.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
The hardware decompression path isn't even close to being able to handle
this. This converts the crash (assertion failure) in
"EXT_texture_compression_s3tc/getteximage-targets S3TC CUBE_ARRAY" to a
plain old failure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
_mesa_meta_DrawPixels creates a VAO and (potentially) two fragment
programs, but none of them are ever released. Leaking piles of memory
is generally frowned upon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
decompress_texture_image creates an FBO, an RBO, a VBO, a VAO, and a
sampler object, but none of them are ever released. Later patches will
add program objects, exacerbating the problem. Leaking piles of memory
is generally frowned upon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
TEXTURE_BUFFER_INDEX has to be specially called out because it is not
allowed in any of the glTexParameter or glGetTexParameter functions.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The next patch will use this function in another file.
v2: Rename _mesa_target_enum_to_index to _mesa_tex_target_to_index.
Suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glTexSubImage(), glCopyTexSubImage() and glCompressedTexSubImage()
only change the texel data, not other state like texture size or format.
If a driver really needs do something special it can hook into the
corresponding driver functions or Map/UnmapTextureImage().
This should avoid some needless state validation effort.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The _mesa_get_current_tex_object() function is now used everywhere that
_mesa_select_tex_object() was formerly used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As far as I know, this should be safe. If not, we have to decide whether
to have variable lookup of the functions, or just drop support for .so.0
(which is a year and a half old it looks like)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74127
Reviewed-by: Matt Turner <mattst88@gmail.com>
Updates to non-banked registers, CP_LOAD_STATE, etc, need a WFI if there
is potentially pending rendering. Track this better, and add fd_wfi()
calls everywhere that might potentially need CP_WAIT_FOR_IDLE.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Gallium can leave const buffers bound above what is used by the current
shader. Which can have a couple bad effects:
1) write beyond const space assigned, which can trigger HLSQ lockup
2) double emit of immed consts, first with bound const buffer vals
followed by with actual immed vals. This seems to be a sort of
undefined condition.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Currently lowers the following instructions:
DST, XPD, SCS, LRP, FRC, POW, LIT, EXP, LOG, DP4,
DP3, DPH, DP2
translating these into equivalent simpler TGSI instructions.
This probably should be moved to util so other drivers can use
it, but just adding under freedreno for now so that I can clear
out a lot of the lowering code in a3xx compiler before beginning
to add new compiler.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The ctx should hold ref to dev to avoid problems if screen is destroyed
before ctx. Doesn't really fix the egl/glx issues, but at least it
prevents things from getting much worse.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In the final revision of my gen8_generator patch, I updated the MATH
instruction's assertion from (dst.hstride == 1) to check that source and
destination hstride matched. Unfortunately, I didn't test this enough,
and many Piglit tests fail this test.
The documentation indicates that "scalar source is also supported",
which we believe means <0,1,0> access mode (hstride == 0). If hstride
is non-zero, then it must match the destination register.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This puts the PCI IDs in place so it's easy to enable support. However,
it doesn't actually enable support since it's very preliminary still,
and a few crucial pieces (such as BLORP) are still missing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
Eric believes this to be wrong and unnecessary, as the command is
supposed to emit an implicit rectangle primitive. However, empirically
the pixel pipeline is completely unreliable without it. So for now, it
stays until someone comes up with a better solution.
We'll need to do better than this when we implement multisampling, HiZ,
or fast clears...but for now, this will do.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This is quite similar to the Gen7 code. The main changes:
- 48-bit relocations
- Thread count is specified as U/2-1 instead of U-1.
- An extra DWord (DW9) with clip planes, URB entry output length/offsets
- We need to program the "Expected Vertex Count" (VerticesIn)
v2: Set the number of binding table entries so they can be prefetched
(requested by Eric Anholt).
v3: Add a WARN_ONCE for a missing workaround.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On previous platforms, 3DSTATE_MULTISAMPLE contained the number of
samples, pixel location, and the positions of each sample within a pixel
for each multisampling mode (4x and 8x). It was also a non-pipelined
command, presumably since changing the sample positions is fairly
drastic.
Broadwell improves upon this by splitting the sample positions out into
a separate non-pipelined state packet, 3DSTATE_SAMPLE_PATTERN. With
that removed, 3DSTATE_MULTISAMPLE becomes a pipelined state packet.
Broadwell also supports 2x and 16x multisampling, in addition to the 4x
and 8x supported by Gen7. This patch, however, does not implement 2x
and 16x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The amount of cut and paste from Gen7 is rather ugly, and should
probably be cleaned up in the future. Even the Gen7 code is in need of
some tidying though; many of the function parameters aren't used on
platforms that use level/layer rather than tile offsets. Tidying both
can be left to a future patch series. This at least gets things going.
v2: Rebase on Paul's rename of NumLayers -> MaxNumLayers.
v3: Shift QPitch by 2 when storing it in the packet. Bits 14:0 store
bits 16:2 of the actual value. Fixes tests.
v4: Add missing stencil buffer QPitch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
v2: Allow logic ops on all surface types. The UNORM restriction was
lifted with Haswell and I simply hadn't noticed. Also, add missing
BRW_NEW_STATE_BASE_ADDRESS dirty bit. Both caught by Eric Anholt.
v3: Fix swapped per-RT DWord pairs. Eliminates bizarre hacks.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It has additional fields to support clipping to the viewport even if
guardband clipping is enabled.
v2: Update for viewport array changes.
v3: No, seriously, update for viewport array changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
v2: Add missing SCS setting in gen8_emit_buffer_surface_state (caught by
Eric Anholt).
v3: Use stored QPitch rather than recomputing it.
v4: Shift QPitch by 2 when setting it in the packet; bits 14:0 store
bits 16:2 of the actual value (fixes myriads of cube and array
texturing tests). Also, only enable cube face bits for cubemaps
(matches Chris Forbes' commit on master). Port to use offset64.
v5: s/gl_format/mesa_format/g
v6: Fix DW5 of renderbuffer state, which neglected to subtract
irb->mt->first_level. Use vertical_alignment() rather than
hardcoding 4. Use ffs for multisample counts rather than a
large switch statement (all caught/suggested by Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unlike on Gen7, we can directly set the offset via the state packet.
We also -have- to: the kernel SOL reset code won't work anymore.
v2: Fix copy and paste mistake in buffer stride setup; drop stale
comment (caught by Eric Anholt). Add a perf_debug for missing
MOCS setup.
v3: Rebase on Paul Berry's changes to CurrentVertexProgram.
v4: Fix SO Write Offset handling. We need to set bits 20 and 21 so the
hardware both loads and saves the offset. There's also a
restriction that 3DSTATE_SO_BUFFER can only be programmed once per
buffer between primitives, so the "reset to zero" code needed
reworking. Fixes most of the transform feedback Piglit tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
v2: Also disable 3DSTATE_WM_CHROMAKEY for safety.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Broadwell's winding order, polygon fill, and viewport Z test fields have
moved to DWord 1 of 3DSTATE_RASTER.
v2: Add a perf_debug for a future optimization and improve commit
message (both suggested by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Emit a dummy 3DSTATE_VF_SGVS packet when not needed.
v3: Add WARN_ONCE and perf_debugs requested by Eric Anholt.
v4: Program 3DSTATE_SGVS even in the no-elements case so gl_VertexID
continues working. Fix 3DSTATE_VF_INSTANCING to not use an
element index to access the buffers array. Some ARB_draw_indirect
prep work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Fix missing "change" bit on instruction state base address
(caught by Haihao Xiang).
v3: Add a perf_debug for missing MOCS setup, requested by Eric.
v4: Fix buffer sizes. The value, specified at bit 12 and up, is
actually measured in 4k pages. We need to round up to the
next multiple of 4k.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v4]
v2: Fix setting of GEN8_PSX_ATTRIBUTE_ENABLE after rebases.
v3: Add missing binding table entry counts. Don't worry about alpha
testing or alpha to coverage when setting the "Kill Pixel" bit;
those are specified in 3DSTATE_PS_BLEND (caught by Eric Anholt).
Drop unused _NEW_BUFFERS. Tidy comments.
v4: Rebase on Paul Berry's changes to CurrentFragmentProgram.
v5: Re-enable line stippling. It doesn't crash or anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v3]
v2: Remove incorrect MOCS shifts; rename urb_entry_write_offset to
urb_entry_output_offset to closer match the documentation.
v3: Only emit a non-zero constant buffer read length when active.
v4: Add missing binding table counts (caught by Eric).
v5: Rebase on Paul Berry's changes to CurrentVertexProgram.
v6: Drop bogus SBE read length/offset field code. We were programming
the wrong values, and our 3DSTATE_SBE code overrides any value we
put here anyway with the correct one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v4]
v2: Only set GEN8_PS_BLEND_HAS_WRITEABLE_RT if color buffer writes are
enabled (caught by Eric Anholt).
v3: Set non-blending flags (writeable RT, alpha test, alpha to coverage)
for integer formats too. +14 Piglits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
v2: Use stencil->_WriteEnabled instead of setting
GEN8_WM_DS_STENCIL_BUFFER_WRITE_ENABLE twice (suggested by Eric).
v3: Mask stencil->WriteMask and stencil->ValueMask with 0xff. The field
is only 8-bits, so we'd trip the new SET_FIELD assertion when core
Mesa gave us a value like 0xFFFFFFFF. The Gen7 code uses structure
field widths to implicitly do this truncation. Fixes Piglit tests.
v4: Use uint32_t for dw1/dw2, not uint8_t. Worst. Typo. Ever.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v2]
The attribute override portion of 3DSTATE_SBE was split out into
3DSTATE_SBE_SWIZ; various bits of 3DSTATE_SF were split out into
3DSTATE_RASTER.
v2: Set Force URB Read Offset bit. Eventually the URB read offset
should be set in 3DSTATE_VS, but that will require some refactoring.
v3: Rebase on viewport array changes.
v4: Improve comments about URB read length/offset overrides.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I haven't investigated whether these are necessary on Broadwell or not,
but for paranoia's sake, we may as well continue doing them for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
It's going to diverge significantly. Starting out with a copy allows
future patches to change atoms one by one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The code re-enabling denorms for small float formats did not recognize
this format due to format handling hacks (mainly, the lp_type doesn't have
the floating bit set).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The -p option we now use when calling bison means that this variable will be
named glcpp_parser_debug not yydebug. This was not caught when the -p option
was added because this variable isn't used in the code as committed. (I prefer
the declaration to remain since it allows a developer to easily find this
variable name to enable debugging.)
This is the innocent-looking but killer test case to verify the bug fixed in
the preceding commit.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In commit 6005e9cb28 a new start state of NEWLINE_CATCHUP was added to the
lexer. This start state is used whenever the lexer is emitting a NEWLINE token
to emit additional NEWLINE tokens for any newline characters that were skipped
by an immediately preceding multi-line comment.
However, that commit erroneously entered the NEWLINE_CATCHUP state for
single-line comments. This is not desired since in the case of a single-line
comment, the lexer is not emitting any NEWLINE token. The result is that the
lexer will remain in the NEWLINE_CATCHUP state and proceed to fail to emit a
NEWLINE token for the subsequent newline character, (since the case to match \n expects only the INITIAL start state).
The fix is quite simple, remove the "BEGIN NEWLINE_CATCHUP" code from the
single-line comment case, (preserving it only in exactly the cases where the
lexer is actually emitting a NEWLINE token).
Many thanks to Petri Latvala for reporting this bug and for providing the
minimal test case to exercise it. The bug showed up only with a multi-line
comment which was followed immediately by a single-line comment (without any
intervening newline), such as:
/*
*/ // Kablam!
Since 6005e9cb28, and before this commit, that very innocent-looking
combination of comments would yield a parse failure in the compiler.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This automatically adjusts the number of buffers that we want based on
what swapping mode the X server is using and the current swap interval:
swap mode interval buffers
copy > 0 1
copy 0 2
flip > 0 2
flip 0 3
Note that flip with swap interval 0 is currently limited to twice the
underlying refresh rate because of how the kernel manages flipping. Moving
from 3 to 4 buffers would help, but that seems ridiculous.
v2: Just update num_back at the point that the values that change num_back
change. This means we'll have the updated value at the point that the
freeing of old going-to-be-unused backbuffers happens, which might not
have been the case before (change by anholt, acked by keithp).
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
The __DRIimage createImageFromFds function takes a fourcc code, but there was
no fourcc code that match __DRI_IMAGE_FORMAT_SARGB8. This adds a define for
that format, adds a translation in DRI3 from __DRI_IMAGE_FORMAT_SARGB8 to
__DRI_IMAGE_FOURCC_SARGB8888 and then adds translations *back* to
__IMAGE_FORMAT_SARGB8 in both the i915 and i965 drivers.
I'll refrain from comments on whether I think having two separate sets of
format defines in dri_interface.h is a good idea or not...
Fixes piglit glx-tfp and glx-visuals-depth
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
XCB doesn't flush the output buffer automatically, so we have to call
xcb_flush ourselves before waiting.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we're tracking SBC values correctly, and the X server has the
ability to send the GLX swap events from a PresentPixmap request, enable
this extension.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric figured out that glXWaitForSbcOML wanted to block until the requested
SBC had been completed, which means to wait until the
PresentCompleteNotify event for that SBC had been received.
This replaces the simple sleep(1) loop (which was bogus) with a loop that
just checks to see if we've seen the specified SBC value come back in a
PresentCompleteNotify event yet.
The change is a bit larger than that as I've broken out a piece of common
code to wait for and process a single Present event for the target
drawable.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tracking the full 64-bit SBC values makes it clearer how those values are
being used, and simplifies the wait_msc code. The only trick is in
re-constructing the full 64-bit value from Present's 32-bit serial number
that we use to pass the SBC value from request to event.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This consolidates how we link the libraries into the build directory.
It works for lib_LTLIBRARIES but not custom shared libraries like DRI
drivers or gallium state trackers which needs special casing (cf dri
mega drivers, for example)
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Spamming the pci id is not beneficial. Make sure it's printed
only when needed.
v2: Change severity to _LOADER_DEBUG, rather than removing
the message.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The former symbol is never defined within mesa. Based on the code
it seems that the original intent was to use NDEBUG.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Makefiles need hard tabs, let's not make that harder than it needs to be.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
v2: Making GL_ARB_map_buffer_alignment a desktop OpenGL extension only.
v3: Squash two commits together.
v4 (idr): MIN_MAP_BUFFER_ALIGNMENT queries don't have any dependencies.
In previous versions of the patch it depended on EXTRA_API_GL which
would prevent the query from working in core profile contexts.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This driver does not support GL_ARB_map_buffer_range, so no special
treatment is needed for unaligned offsets in the mapping.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These drivers do not support GL_ARB_map_buffer_range, so no special
treatment is needed for unaligned offsets in the mapping.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Ian manually ran the map_buffer_range* tests and the
arb_map_buffer_alignment-* tests, but he did not do a full piglit run.
v2 (idr): Use 64 instead of 4096
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Move parameter loads out of loops, and use the instruction offset
instead of a VGPR for the vertex attribute offset when writing to the
ESGS ring buffer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Just always bind the current states before drawing.
Besides the simplification, as a bonus this makes sure the VS hardware
shader stage always uses the GS copy shader when a geometry shader is
active, fixing a number of GS related piglit tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It needs to increment at shader runtime, not at shader compile time, as
the geometry shader can emit vertices in loops. LLVM automagically
converts the ID back to an immediate value if its value can be
determined at compile time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Transforms, for example,
mul vgrf3, vgrf2, vgrf1
mov.sat vgrf4, vgrf3
into
mul.sat vgrf3, vgrf2, vgrf1
mov vgrf4, vgrf3
which gives register_coalescing an opportunity to remove the MOV
instruction.
total instructions in shared programs: 1515039 -> 1504634 (-0.69%)
instructions in affected programs: 798586 -> 788181 (-1.30%)
GAINED: 0
LOST: 4
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
OpenGL 3.3 spec expects GL_INVALID_OPERATION:
"For both the default framebuffer and framebuffer objects, the
constants FRONT, BACK, LEFT, RIGHT, and FRONT AND BACK are not
valid in the bufs array passed to DrawBuffers, and will result
in the error INVALID OPERATION."
But OpenGL 4.0 spec changed the error code to GL_INVALID_ENUM:
"For both the default framebuffer and framebuffer objects, the
constants FRONT, BACK, LEFT, RIGHT, and FRONT_AND_BACK are not
valid in the bufs array passed to DrawBuffers, and will result
in the error INVALID_ENUM."
This patch changes the behaviour to match OpenGL 4.0 spec
Fixes Khronos OpenGL CTS draw_buffers_api.test.
V2: Update the comment in code.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I sometimes build without EGL just for speed purposes, however
it no longer finds my drivers when I do due to the HAVE_LIBUDEV
defines being wrong.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the tests for the depth parameter for TexImage3D calls when the
target type is GL_TEXTURE_2D_ARRAY or GL_TEXTURE_CUBE_MAP_ARRAY
so that a depth value of 0 is accepted. Previously, the check
incorrectly required the depth argument to be atleast 1.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code depending on the definitions is already wrapped
in the same conditional so go ahead and wrap the include.
Otherwise we'll brake compilation on platforms that are
missing the header. Add assert.h in there as well, as it
is introduced and used in the same fashon.
Cc: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74122
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
We have code generation paths that carry out swizzles of AoS vectors via
bitwise shifts, as these tend to generate more efficient code than
straightforward byte shuffles. But when the input is a constant the
additional bitwise arithmetic operations somehow don't really get
constant propagated properly, evenutally causing assertion failure in
InstCombine pass.
Therefore avoid the bug by using the trivial shuffles for constant
inputs.
Although the sample LLVM IR can cause a crash with any LLVM version,
this was only seen in practice with LLVM 3.2.
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The check was in the wrong place, such that if a shader incorrectly put
a preprocessor token before the #version declaration, the version would
be resolved twice, leading to a segmentation fault when attempting to
redefine the __VERSION__ macro.
#extension GL_ARB_sample_shading: require
#version 130
void main() {}
Also, rename glcpp_parser_resolve_version to
glcpp_parser_resolve_implicit_version to avoid confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
radeonsi now reports PIPE_VIDEO_CAP_SUPPORTS_PROGRESSIVE = true if UVD support
isn't available. It's what all the other drivers do.
Also, some #include directives were missing in radeon_uvd.h.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Implemented by the common code. You can now visualize the statistics
with the HUD, see GALLIUM_HUD=help for all available queries. For example:
GALLIUM_HUD=clipper-primitives-generated
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Replace Type A _INT formats names with _SINT to match naming spec,
and update type C formats as follows:
s/MESA_FORMAT_R_INT8\b/MESA_FORMAT_R_SINT8/g
s/MESA_FORMAT_R_INT16\b/MESA_FORMAT_R_SINT16/g
s/MESA_FORMAT_R_INT32\b/MESA_FORMAT_R_SINT32/g
s/MESA_FORMAT_RG_INT8\b/MESA_FORMAT_RG_SINT8/g
s/MESA_FORMAT_RG_INT16\b/MESA_FORMAT_RG_SINT16/g
s/MESA_FORMAT_RG_INT32\b/MESA_FORMAT_RG_SINT32/g
s/MESA_FORMAT_RGB_INT8\b/MESA_FORMAT_RGB_SINT8/g
s/MESA_FORMAT_RGB_INT16\b/MESA_FORMAT_RGB_SINT16/g
s/MESA_FORMAT_RGB_INT32\b/MESA_FORMAT_RGB_SINT32/g
s/MESA_FORMAT_RGBA_INT8\b/MESA_FORMAT_RGBA_SINT8/g
s/MESA_FORMAT_RGBA_INT16\b/MESA_FORMAT_RGBA_SINT16/g
s/MESA_FORMAT_RGBA_INT32\b/MESA_FORMAT_RGBA_SINT32/g
s/\bMESA_FORMAT_RED_RGTC1\b/MESA_FORMAT_R_RGTC1_UNORM/g
s/\bMESA_FORMAT_SIGNED_RED_RGTC1\b/MESA_FORMAT_R_RGTC1_SNORM/g
s/\bMESA_FORMAT_RG_RGTC2\b/MESA_FORMAT_RG_RGTC2_UNORM/g
s/\bMESA_FORMAT_SIGNED_RG_RGTC2\b/MESA_FORMAT_RG_RGTC2_SNORM/g
s/\bMESA_FORMAT_L_LATC1\b/MESA_FORMAT_L_LATC1_UNORM/g
s/\bMESA_FORMAT_SIGNED_L_LATC1\b/MESA_FORMAT_L_LATC1_SNORM/g
s/\bMESA_FORMAT_LA_LATC2\b/MESA_FORMAT_LA_LATC2_UNORM/g
s/\bMESA_FORMAT_SIGNED_LA_LATC2\b/MESA_FORMAT_LA_LATC2_SNORM/g
Update comments. Replace format names containing SIGNED with
SNORM appended w/decoration per the format name spec:
s/MESA_FORMAT_SIGNED_R8\b/MESA_FORMAT_R_SNORM8/g
s/MESA_FORMAT_SIGNED_RG88_REV\b/MESA_FORMAT_R8G8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBX8888\b/MESA_FORMAT_X8B8G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBA8888\b/MESA_FORMAT_A8B8G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RGBA8888_REV\b/MESA_FORMAT_R8G8B8A8_SNORM/g
s/MESA_FORMAT_SIGNED_R16\b/MESA_FORMAT_R_SNORM16/g
s/MESA_FORMAT_SIGNED_GR1616\b/MESA_FORMAT_R16G16_SNORM/g
s/MESA_FORMAT_SIGNED_RGB_16\b/MESA_FORMAT_RGB_SNORM16/g
s/MESA_FORMAT_SIGNED_RGBA_16\b/MESA_FORMAT_RGBA_SNORM16/g
s/MESA_FORMAT_SIGNED_A8\b/MESA_FORMAT_A_SNORM8/g
s/MESA_FORMAT_SIGNED_I8\b/MESA_FORMAT_I_SNORM8/g
s/MESA_FORMAT_SIGNED_L8\b/MESA_FORMAT_L_SNORM8/g
s/MESA_FORMAT_SIGNED_A16\b/MESA_FORMAT_A_SNORM16/g
s/MESA_FORMAT_SIGNED_I16\b/MESA_FORMAT_I_SNORM16/g
s/MESA_FORMAT_SIGNED_L16\b/MESA_FORMAT_L_SNORM16/g
s/MESA_FORMAT_SIGNED_AL88\b/MESA_FORMAT_L8A8_SNORM/g
s/MESA_FORMAT_SIGNED_RG88\b/MESA_FORMAT_G8R8_SNORM/g
s/MESA_FORMAT_SIGNED_RG1616\b/MESA_FORMAT_G16R16_SNORM/g
Change all 4 color component unsigned byte formats to meet spec for P
Type formats:
s/MESA_FORMAT_RGBA8888\b/MESA_FORMAT_A8B8G8R8_UNORM/g
s/MESA_FORMAT_RGBA8888_REV\b/MESA_FORMAT_R8G8B8A8_UNORM/g
s/MESA_FORMAT_ARGB8888\b/MESA_FORMAT_B8G8R8A8_UNORM/g
s/MESA_FORMAT_ARGB8888_REV\b/MESA_FORMAT_A8R8G8B8_UNORM/g
s/MESA_FORMAT_RGBX8888\b/MESA_FORMAT_X8B8G8R8_UNORM/g
s/MESA_FORMAT_RGBX8888_REV\b/MESA_FORMAT_R8G8B8X8_UNORM/g
s/MESA_FORMAT_XRGB8888\b/MESA_FORMAT_B8G8R8X8_UNORM/g
s/MESA_FORMAT_XRGB8888_REV\b/MESA_FORMAT_X8R8G8B8_UNORM/g
v2: Note that Fredrik Höglund is working on GL_ARB_multi_bind, not
Maxence Le Doré. Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The define was only available if
gl_extensions::AMD_shader_trinary_minmax was set, but no driver set it.
Since the extension is advertised by default, remove that field too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Maxence Le Doré <maxence.ledore@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dd_function_table::BlitFramebuffer is already initialized to
_mesa_meta_BlitFramebuffer, so it should just work.
Tested on a Radeon 7500 (OpenGL renderer string: Mesa DRI R100 (RV200
5157) TCL DRI2). I couldn't do a full piglit run because it would tank
the system with or without this patch. I just ran all the blit tests
(-t blit to piglit-run.py). Only fbo-sys-sub-blit failed. All of the
other tests that weren't skipped (i.e., all the multisample and sRGB
tests skip) passed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dd_function_table::BlitFramebuffer is already initialized to
_mesa_meta_BlitFramebuffer, so it should just work.
Tested on a FireGL 8800 (OpenGL renderer string: Mesa DRI R200 (R200
5148) TCL DRI).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the glTexStorage3D failure in
ext_packed_depth_stencil-depth-stencil-texture and
oes_packed_depth_stencil-depth-stencil-texture_gles2.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need almost identical code in the glTexStorage path.
v2: Fix typo in a comment noticed by Topi.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All versions of the OpenGL spec are quite clear that
GL_INVALID_OPERATION should be generated. I added a quotation from the
3.3 core profile spec.
Fixes the glTexImage3D subcases of
ext_packed_depth_stencil-depth-stencil-texture and
oes_packed_depth_stencil-depth-stencil-texture_gles2. The same subtests
of oes_packed_depth_stencil-depth-stencil-texture_gles1 fail, but they
fail with a different wrong error code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cover both loader and glx/dri_glx
Drop \n from the default loader logger
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the loader changes, there has been a compiler warning that the
prototype didn't match. It turns out that if a loader error message was
ever thrown, you'd segfault because of trying to use the warning level as
a format string.
Reviewed-by: Keith Packard <keithp@keithp.com>
Tested-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This allows Mesa to choose to rename driver .sos (or split drivers),
without needing a flag day with the corresponding 2D driver.
v2: Undo the loader-only-for-dri3 change.
Reviewed-by: Keith Packard <keithp@keithp.com> [v1]
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
I want to stop trusting the server for the driver name, and instead decide
on our own based on the fd, so I needed this code motion.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Steam links against libudev.so.0, while we're linking against
libudev.so.1. The result is that the symbol names (which are the same in
the two libraries) end up conflicting, and some of the usage of .so.1
calls the .so.0 bits, which have different internal structures, and
segfaults happen.
By using a dlopen() with RTLD_LOCAL, we can explicitly look for the
symbols we want, while they get the symbols they want.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Some of the hardware support is missing. The NVIDIA-provided driver,
which claims seamless cube map support fails the relevant tests as well.
As this is the last extension before we can have OpenGL 3.2, doing this
allows us to expose geometry shaders without doing the additional
work involved in supporting ARB_geometry_shader4.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Creates two areas in the AUX constbuf:
- Sample offsets for MS textures
- Per-texture MS settings
When executing a texelFetch with a MS sampler, looks up that texture's
settings and adjusts the parameters given to the texfetch instruction.
With this change, all the ARB_texture_multisample piglits pass, so turn
on PIPE_CAP_TEXTURE_MULTISAMPLE.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Each code BO is a heap that allocates at the end first, and so GPs are
allocated at the very end of the allocated space. When executing, we see
PAGE_NOT_PRESENT errors for the next page. Just over-allocate to make
sure that there's something there.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Make sure that we never try to use a 0-sized map. This can happen when
using a gp, so add a dummy mapping when computing vp_gp_mapping in that
case.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marks gl_Layer as only having one component, and makes sure to keep
track of where it is and emit it in the output map, since it is not an
input to the FP.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Note that the primitive id is stored in a[0x18], while usually the
geometry instructions are of the form a[$a1 + 0x4] which gets mapped to
p[] space. We need to avoid the change from a[] to p[] here, so it's
keyed on whether the access is indirect or not.
Note that there's also a use-case for accessing e.g. a[$r1], however
that's not supported for now. (Could be added by checking the register
file of the indirect parameter.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Layer output probably doesn't work yet, but other than that everything seems
to be working.
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
[calim: fix up minor bugs, code formatting]
Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of emitting an SHL 4 io an address register on the TGSI ARL and UARL
instructions, emit the shift when the loaded address is actually used. This
is necessary because input vertex and attribute indices in geometry shaders on
nv50 need to be shifted left by 2 instead of 4.
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
[calim: various updates to the indirect address logic]
Signed-off-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
[imirkin: remove OP_MAD change that calim made, add OP_RESTART handling
same as OP_EMIT for code flow analysis]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is needed since commit 9baa45f78b (st/mesa: bind NULL colorbuffers
as specified by glDrawBuffers).
This implementation is highly based on a larger commit by
Christoph Bumiller <e0425955@student.tuwien.ac.at> in his gallium-nine
branch.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
It doesn't make sense to do an OP_NEG from U32 to U32. This was
manifested on nv50 in glsl-fs-atan-3 which was generating a
UMAD TEMP[0].x, TEMP[0].xxxx, -TEMP[5].xxxx, TEMP[0].xxxx
instruction. (For some reason, nvc0 causes a different shader to be
generated.) This led to a
cvt neg u32 $r1 u32 $r1
Which did not yield the desired result. This changes the final output to
cvt neg s32 $r1 u32 $r1
which produces the desired output and the piglit tests passes. My
assumption is that this is also what we want on nvc0, but could not test
as there was no suitable shader that generated the problem instruction.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
When the min_index is very large (or very negative), the multipliation
can overflow 32 bits and result in an incorrect map pointer
modification.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This was discovered as a result of the draw-elements-base-vertex-neg
piglit test, which passes very negative offsets in, followed up by large
indices. The nouveau code correctly adjusts the pointer, but the
translate code needs to do the proper inverse correction. Similarly fix
up the SSE code to do a 64-bit multiply to compute the proper offset.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Broadwell requires software to specify QPitch in a bunch of packets,
so we decided to store it in the miptree. However, when I did that
refactoring, I missed a subtlety: the hardware expects QPitch to be
"in units of rows in the uncompressed surface".
This is the value we originally compute. However, for compressed
surfaces, we then divided it by 4 (the block height), to obtain the
physical layout. This is no longer the QPitch Broadwell expects.
So, store the original undivided value in mt->qpitch, but continue to
use the divided value in brw_miptree_layout_texture_array(). For
non-Broadwell platforms, this should have no impact at all.
Helps fix Piglit's "getteximage-targets S3TC CUBE" test on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fixes the NetBSD build.
NetBSD does not have pthread_mutex_timedlock.
CC glapi_dispatch.lo
threads_posix.h: In function 'mtx_timedlock':
threads_posix.h:216:5: error: implicit declaration of function 'pthread_mutex_timedlock'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
The type of all three parameters are identical, so we don't need to
specify it three times. The predicate is always identical too, so we
don't need to make it a parameter, either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Simple shaders such as:
void splat(vec2 v, float f) {
v[0] = v[1] = f;
}
failed to compile with the following error:
error: value of type vec2 cannot be assigned to variable of type float
First, we would process v[1] = f, and transform:
LHS: (expression float vector_extract (var_ref v) (constant int (1)))
RHS: (var_ref f)
into:
LHS: (var_ref v)
RHS: (expression vec2 vector_insert (var_ref v) (constant int (1))
(var_ref f))
Note that the LHS type is now vec2, not a float. This is surprising,
but not the real problem.
After emitting assignments, this ultimately becomes:
(declare (temporary) vec2 assignment_tmp)
(assign (xy)
(var_ref assignment_tmp)
(expression vec2 vector_insert (var_ref v) (constant int (1))
(var_ref f)))
(assign (xy) (var_ref v) (var_ref assignment_tmp))
We would then return (var_ref assignment_tmp) as the rvalue, which has
the wrong type---it should be float, but is instead a vec2.
To fix this, we simply return (vector_extract (var_ref assignment_temp)
<the appropriate channel>) to pull out the desired float value.
Fixes Piglit's chained-assignment-with-vector-constant-index.vert and
chained-assignment-with-vector-dynamic-index.vert tests.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74026
Reported-by: Dan Ginsburg <dang@valvesoftware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Transform feedback may come from either the geometry shader or the
vertex shader, so we can't use
ctx->Shader.CurrentProgram[MESA_SHADER_VERTEX] to find the current
post-link transform feedback information. Fortunately we can use
ctx->TransformFeedback.CurrentObject->shader_program.
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previous to this patch, the _mesa_{Begin,Resume}TransformFeedback
functions were using ctx->Shader.CurrentProgram[MESA_SHADER_VERTEX] to
find the program that would be the source of transform feedback data.
This isn't correct--if there's a geometry shader present it should be
ctx->Shader.CurrentProgram[MESA_SHADER_GEOMETRY]. (These might be
different if separate shader objects are in use).
This patch creates a function get_xfb_source(), which figures out the
correct program to use based on GL state, and updates
_mesa_{Begin,Resume}TransformFeedback to call it. get_xfb_source() is
written in terms of the gl_shader_stage enum, so it should not need
modification when we add tessellation shaders in the future. It also
creates a new driver flag, NewTransformFeedbackProg, which is flagged
whenever this program changes.
To reduce future confusion, this patch also rewords some comments and
error message text to avoid referring to vertex shaders.
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
v2: make the for loop in get_xfb_source() clearer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The "shader" field in fs_generator, vec4_generator, and gen8_generator
was only used for one purpose; to figure out if we were compiling an
assembly program or a GLSL shader (shader is NULL for assembly
programs). And it wasn't being used properly: in vec4 shaders we were
always initializing it based on
prog->_LinkedShaders[MESA_SHADER_FRAGMENT], regardless of whether we
were compiling a geometry shader or a vertex shader.
This patch simplifies things by using the "prog" field instead; this
is also NULL for assembly programs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of defining preprocessor macros in glcpp_parser_create based on
the GL API, wait until the shader version has been resolved. Doing this
allows us to correctly set (and not set) preprocessor macros for
extensions allowed by the API but not the shader, as in the case of
ARB_ES3_compatibility.
The shader version has been resolved when the preprocessor encounters
the first preprocessor token, since the GLSL spec says
"The #version directive must occur in a shader before anything else,
except for comments and white space."
Specifically, if a #version token is found the version is known
explicitly, and if any other preprocessor token is found then the GLSL
version is implicitly 1.10.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71630
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes glean fragProg1 regression caused by commit b9f68d927e
(implement TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS). This bug
only appears when the fragment shader emits fragment.Z before
color outputs. The bug was caused by confusion between register
indexes and semantic indexes.
Also added some comments to better explain register indexing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Otherwise we pull libudev as a dependency and crash
games/programs that ship their own version of libudev.
Either way we should link the loader lib only when needed.
This fixes a regression caused by
commit eac776cf77
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date: Sat Jan 11 02:24:43 2014 +0000
glx: use the loader util lib
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73854
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Leaving it set to zero isn't really correct since every allocation has
at least an alignment of 1 byte. It also caused a problem in the i965
driver after I removed the MAX(64, ...) from the alignment calculation.
That's what I get for changing a patch without retesting it. :(
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73907
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Lu Hua <huax.lu@intel.com>
Sed job:
grep -lr BEGIN_BATCH_NO_AUTOSTATE src/mesa/drivers/dri/ | while read f
do
cat $f | sed 's/BEGIN_BATCH_NO_AUTOSTATE/BEGIN_BATCH/g' > x
mv x $f
done
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
This parameter hasn't been used since January 2010 (commit 29e02c7).
Fixes the following warning in both radeon and r200:
radeon_common.c: In function 'r200_rcommonBeginBatch':
radeon_common.c:762:14: warning: unused parameter 'dostate' [-Wunused-parameter]
Note that now BEGIN_BATCH and BEGIN_PATCH_NO_AUTOSTATE are identical.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
radeon_common.c: In function 'radeon_draw_buffer':
radeon_common.c:237:3: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
When parameters were removed from dd_function_table::Viewport (commit
065bd6ff), radeon_viewport (in both radeon and r200) started generating
a warning.
radeon_common.c: In function 'r200_radeon_viewport':
radeon_common.c:415:15: warning: assignment from incompatible pointer type [enabled by default]
radeon_common.c:419:23: warning: assignment from incompatible pointer type [enabled by default]
I didn't notice this initially, and it's harmless because the function is
never called through the incorrectly typed pointer.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Otherwise they will be NULL when stage destroy is invoked prematurely,
(i.e, on out of memory).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Use some new helper functions to make the code much more readable.
And fix wrong value for XPD's w result.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In some cases we were converting generic formats to sized formats
and vice versa. The point is to simply convert sRGB formats to
corresponding linear formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Became apparent with the C11 thread changes. Unfortunately I didn't
have all dependencies to build the driver, and only noticed
this issue on build server.
Implementation is based of https://gist.github.com/2223710 with the
following modifications:
- inline implementatation
- retain XP compatability
- add temporary hack for static mutex initializers (as they are not part
of the stack but still widely used internally)
- make TIME_UTC a conditional macro (some system headers already define
it, so this prevents conflict)
- respect HAVE_PTHREAD macro
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Previously the reason we needed is_array was because we used array_size == NULL to
represent both non-arrays and unsized arrays. Now that we use a non-NULL
array_specifier to represent an unsized array, is_array is redundant.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We need to insert outermost dimensions in the correct spot otherwise
the dimension order will be backwards
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
array
This change does not help fix or prevent any bugs
it just seems reasonable to do
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
No regressions on IVB (piglit quick + unit tests).
v2 (Paul):
- no need to patch the unit tests anymore. Original logic
was altered and unit tests updated to match the
fs-generator
- lrp emission moves from the blorp compiler core into the
emitter here (previously there was a separate refactoring
patch which is not really needed anymore as the lrp logic
got refactored when the original lrp logic got fixed).
- pass 'BRW_BLORP_RENDERBUFFER_BINDING_TABLE_INDEX' to the
generator in fs_inst::target instead of hardcoding it
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The compiler for blorp programs likes to emit instructions for
the message construction itself meaning that the generator needs
to skip any such when blorp programs are translated for the hw.
In addition, the binding table control is special for blorp
programs and the generator does not need to update the binding
tables associated with the compiler bookkeeping (this in fact
gets thrown away as the blorp compiler sets the program data
in its own way).
v2 (Paul): do not hardcode the binding table index but use
fs_inst::target instead.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Unit tests comparing generated blorp programs to known good need
to have the dump in designated file instead of in default
standard output. The comparison also expects the jump counters
of if-else-instructions to be correctly set and hence the dump
needs to be taken _after_ 'patch_IF_ELSE()' is run (the default
dump of the fs_generator does this before).
v2 (Paul): dropped the redundant 'dump_enabled' argument
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In addition, the special case requiring explicit execution size
control is wrapped manually.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
In addition, the two special cases requiring explicit execution
size control are wrapped manually.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2 (Paul): pass the combining opcode as an argument to emit_combine().
This keeps manual_blend_average() selfcontained
documentation wise.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Resolving of the hardware message type is moved into the
emitter also in preparation for switching to use fs_generator.
The generator wants to translate the high level op-code into
the message type and hence the emitter needs to know the
original op-code.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Prepares for the introduction of non-compressed multi-sampled
lookup used in the blorp programs.
v2: now also taking into account gen8
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The combination of four separate comparison operations and
and the masked "and" require special treatment when moving
to FS LIR.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Prepares for presenting blorp blit programs using FS IR that
allows EU-assembly generation using i965 glsl-compiler
backend (fs_generator).
v2: rebased on top of endif-jump counter fix (moving the
added brw_set_uip_jip() into the emitter)
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The Intel closed source OpenGL driver recently began supporting 32
texture image units on Haswell. This makes the open source driver
support 32 as well.
Earlier generations don't have the message header field required to
support more than 16 sampler states, so we continue to advertise 16
there.
On Haswell, this causes us to advertise:
- GL_MAX_TEXTURE_IMAGE_UNITS = 32
- GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS = 32
- GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
instead of the old values of 16, 16, and 48.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
BRW_MAX_TEX_UNIT is about to grow, but only Gen7+ will be able to
support the new larger value. On older platforms, we don't want to
allocate the extra space - it would just be a waste.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Like the scalar backend, we add an offset to the "Sampler State Pointer"
field to select a group of 16 samplers, then use the "Sampler Index"
field to select within that group.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The next patch adds an additional case where the message header is
necessary. So we want to do the g0 copy if inst->header_present is set,
rather than inst->texture_offset.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
In theory, a shader might use textureOffset() but set all the texel
offsets to zero. In that case, we don't actually need to set up the
message header - zero is the implicit default.
By moving the texture_offset setup before the header_present setup, we
can easily only set header_present when there are non-zero texel offset
values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The message descriptor's "Sampler Index" field is only 4 bits (on all
generations of hardware), so it can only represent indices 0 through 15.
Haswell introduced a new field in the message header - "Sampler State
Pointer". Normally, this is copied straight from g0, but we can also
add a byte offset (as long as it's a multiple of 32).
This patch uses a "Sampler State Pointer" offset to select a group of
16 sampler states, and then uses the "Sampler Index" field to select
the state within that group.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, the code to copy g0 to the message header existed in two
places - one for the texture offset case, and one for any other case.
By treating texture_offset as a special case of header_present, we can
remove this duplication and shorten the code. Future patches which add
new header fields also won't have to add additional duplication.
This also clarifies a confusing construct. The old code contained:
} else if (inst->header_present) {
if (brw->gen >= 7) {
...explicit copy from g0 to the message header...
} else {
/* Set up an implied move from g0 to the MRF. */
}
}
This looks like it might set up an implied move on Sandybridge, which
doesn't support those. However, Sandybridge only uses a message header
for texture offsets, so it would never hit this code path. The new code
avoids this implicit knowledge by only setting up an implied move on
Gen4-5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Use the WINDOW and VPORT scissors for the framebuffer and scissor test,
respectively. The other two scissors are disabled (they cover the max fb size).
We actually have 16 VPORT scissors, which will map well to ARB_viewport_array.
Also, we don't need to write SC_WINDOW_OFFSET with this commit, because it's
disabled everywhere.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Also set the unsynchronized flag if the whole resource was discarded
to avoid doing buffer-busy checks again.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
It's unused and shouldn't be used at all in my opinion.
If some driver doesn't support the unsynchronized flag, u_upload_mgr should
avoid the synchronization by other means, e.g. by using the DONTBLOCK flag.
This is the recommended way for streaming vertices. Always use this if you
need to upload vertices every frame.
Reviewed-by: Christian König <christian.koenig@amd.com>
Commit 05da4a7a5e attempts to eliminate the
call to intel_update_renderbuffer() in the case where we already have a
drawbuffer for the drawable. Unfortunately this only checks the
back left renderbuffer, which breaks in case of single buffer drawables.
This means that the initial viewport will not be set in that case. Instead,
we now check whether the initial viewport has not been set, in which case
we call out to intel_update_renderbuffer().
https://bugs.freedesktop.org/show_bug.cgi?id=73862
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Most of the time it is not necessary to perform type inference to
compile GLSL; the type of every expression can be inferred from the
contents of the expression itself (and previous type declarations).
The exception is aggregate initializers: their type is determined by
the LHS of the variable being assigned to. For example, in the
statement:
mat2 foo = { { 1, 2 }, { 3, 4 } };
the type of { 1, 2 } is only known to be vec2 (as opposed to, say,
ivec2, uvec2, int[2], or a struct) because of the fact that the result
is being assigned to a mat2.
Previous to this patch, we handled this situation by doing some type
inference during parsing: when parsing a declaration like the one
above, we would call _mesa_set_aggregate_type(), which would infer the
type of each aggregate initializer and store it in the corresponding
ast_aggregate_initializer::constructor_type field. Since this
happened at parse time, we couldn't do the type inference using
glsl_type objects; we had to use ast_type_specifiers, which are much
more awkward to work with. Things are about to get more complicated
when we add support for ARB_arrays_of_arrays.
This patch simplifies things by postponing the call to
_mesa_set_aggregate_type() until ast-to-hir time, when we have access
to glsl_type objects. As a side benefit, we only need to have one
call to _mesa_set_aggregate_type() now, instead of six.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Specs say "If the argument is a buffer object, the arg_value
pointer can be NULL or point to a NULL value in which case a NULL
value will be used as the value for the argument declared as a
pointer to __global or __constant memory in the kernel."
So don't crash when somebody does that.
v2: Insert NULL into input buffer instead of buffer handle pair
Fix constant_argument too
Drop r600 driver changes
v3: Fix inserting NULL pointer
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
No known bugs fixed but this is now in line with fs-generator.
No regresssions on IVB.
Eric further explained that:
"The endif jump, since it's forward, is just an optimization to
have set right -- otherwise, the GPU will just step forward
instruction by instruction until it hits something else that
updates the per-channel PC."
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is possible now that ctx->Shader.CurrentProgram is an array.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is possible now that ctx->Shader.CurrentProgram is an array.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that we have a ctx->Shader.CurrentProgram array, we can just use
it directly.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since ctx->Shader.Current{Vertex,Geometry,Fragment}Program is an
array, this allows some meta code to be rolled up into loops.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are replaced with
ctx->Shader.CurrentProgram[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}].
In patches to follow, this will allow us to replace a lot of ad-hoc
logic with a variable index into the array.
With the exception of the changes to mtypes.h, this patch was
generated entirely by the command:
find src -type f '(' -iname '*.c' -o -iname '*.cpp' ')' \
-print0 | xargs -0 sed -i \
-e 's/\.CurrentVertexProgram/.CurrentProgram[MESA_SHADER_VERTEX]/g' \
-e 's/\.CurrentGeometryProgram/.CurrentProgram[MESA_SHADER_GEOMETRY]/g' \
-e 's/\.CurrentFragmentProgram/.CurrentProgram[MESA_SHADER_FRAGMENT]/g'
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Rather than maintain separately named arrays and counts for vertex,
geometry, and fragment shaders, just maintain these as arrays indexed
by the gl_shader_type enum.
v2: When there is neither a vertex nor a geometry shader, set
prog->LastClipDistanceArraySize = 0, and clarify that the values is
not used.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch replaces code in _mesa_new_shader() and delete_shader_cb()
that checks the type of a shader with calls to
_mesa_validate_shader_target(). This has two advantages: it allows
for a more thorough check (since _mesa_validate_shader_target()
doesn't permit shader targets that aren't supported by the back-end),
and it reduces the amount of code that will need to be modified when
adding new shader stages.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow this function to be used in circumstances where there
is no context available, such as when building built-in GLSL
functions.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
In my recent zeal to refactor Mesa's handling of the gl_shader_stage
enum, I accidentally wound up with two functions that do the same
thing: _mesa_program_index_to_target(), and
_mesa_shader_stage_to_program().
This patch keeps _mesa_shader_stage_to_program(), since its name is
more consistent with other related functions. However, it changes the
signature so that it accepts an unsigned integer instead of a
gl_shader_stage--this avoids awkward casts when the function is called
from C++ code.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
We have to make the functions available to work around a GLEW bug (see
comments already in the code), but if an application calls one of these
functions we should still generate GL_INVALID_OPERATION.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch handles the use of 'centroid' qualifier with 'in' variables
in a fragment shader when persample shading is enabled. Per sample
shading for the whole fragment shader can be enabled by:
glEnable(GL_SAMPLE_SHADING) or using {gl_SamplePosition, gl_SampleID}
builtin variables in fragment shader. Explaining it below in more
detail.
/* Enable sample shading using OpenGL API */
glEnable(GL_SAMPLE_SHADING);
glMinSampleShading(1.0);
Example fragment shader:
in vec4 a;
centroid in vec4 b;
main()
{
...
}
Variable 'a' will be interpolated at sample location. But, what
interpolation should we use for variable 'b' ?
ARB_sample_shading recommends interpolation at sample position for
all the variables. GLSL 400 (and earlier) spec says that:
"When an interpolation qualifier is used, it overrides settings
established through the OpenGL API."
But, this text got deleted in later versions of GLSL.
NVIDIA's and AMD's proprietary linux drivers (at OpenGL 4.3)
interpolates at sample position. This convinces me to use
the similar approach on intel hardware.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Current implementation of arb_sample_shading doesn't set 'Barycentric
Interpolation Mode' correctly. We use pixel barycentric coordinates
for per sample shading. Instead we should select perspective sample
or non-perspective sample barycentric coordinates.
It also enables using sample barycentric coordinates in case of a
fragment shader variable declared with 'sample' qualifier.
e.g. sample in vec4 pos;
A piglit test to verify the implementation has been posted on piglit
mailing list for review.
V2: Do not interpolate all the 'in' variables at sample position
if fragment shader uses 'sample' qualifier with one of them.
For example we have a fragment shader:
#version 330
#extension ARB_gpu_shader5: require
sample in vec4 a;
in vec4 b;
main()
{
...
}
Only 'a' should be sampled at sample location, not 'b'.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be useful in my next patch which depends on a functionality
of _mesa_get_min_invocations_per_fragment() to ignore the sample
qualifier (prog->IsSample) based on a flag passed to it.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reduces vertex shader instruction counts in DOTA2 by 6.42%, L4D2 by
4.61%, and CS:GO by 5.71%.
total instructions in shared programs: 1500153 -> 1498191 (-0.13%)
instructions in affected programs: 59919 -> 57957 (-3.27%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Only implemented for ir_swizzles currently, but perhaps will be useful
for other IR types in the future.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This flag was really just a proxy for determining whether the backend
was vector (AOS) or scalar (SOA). It will be used to apply a future
optimization only for vector backends.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Dumping the number of live registers at each IP allows us to see
register pressure and identify any local maxima. This should
aid in debugging passes designed to reduce register pressure, as
well as optimizations that suddenly trigger spilling.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Calling it after value numbering (added in the next commit) prevents
some instruction count regressions.
total instructions in shared programs: 1524387 -> 1523905 (-0.03%)
instructions in affected programs: 13112 -> 12630 (-3.68%)
GAINED: 0
LOST: 3
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously we simply considered two registers whose live ranges
overlapped to interfere. Cases such as
set A ------
... |
mov B, A -- |
... | B | A
use B -- |
... |
use A ------
would be considered to interfere, even though B is an unmodified copy of
A whose live range fit wholly inside that of A.
If no writes to A or B occur between the mov B, A and the use of B then
we can safely coalesce them.
Instead of removing MOV instructions, we make them NOPs and remove them
at once after the main pass is finished in order to avoid recomputing
live intervals (which are needed to perform the previous step).
total instructions in shared programs: 1543768 -> 1513077 (-1.99%)
instructions in affected programs: 951563 -> 920872 (-3.23%)
GAINED: 46
LOST: 22
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
mov takes only a single source argument. Example instruction
inexplicably changed from add to mov in commit f10f5e49.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Unnamed record types are assigned to separate types per stage, e.g. if
uniform struct { ... } a;
is defined in both vertex and fragment shader, two separate types will
result with different names. When linking the shader, this results in a
type conflict. However, there is no reason why this should not be
allowed according to GLSL specifications. Compare and match record types
when linking shader stages to avoid this conflict.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes several colorbuffer tests, including piglit "fbo-drawbuffers-none"
for "gl_FragColor" and "glDrawPixels" cases.
v2: rework patch to only avoid creating extra shader variants when
TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS is not specified. Per Jose.
Use a write_color0_to_n_cbufs key field to replicate color0 to N
color buffers only when N > 0 and WRITES_ALL_CBUFS is set.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The new TYPE_DOUBLEN_2 type was added in 0e60d850 but the code to
return values of that type wasn't completed.
Fixes conform's default state test. glGetFloatv(GL_DEPTH_RANGE)
wasn't returning anything.
v2: remove stray 'break' statements.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These messages are in code that is shared between the VS and GS
back-ends, so use the terminology "vec4" to avoid confusion.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even with depth clipping disabled, vertices which have negative w coords
must be discarded. And since we don't have a proper guardband implementation
yet (relying on driver to handle all values except infs/nans in rasterization
for such points) we need to kill them off manually (as they can end up with
coordinates inside viewport otherwise).
v2: use 0.0f instead of 0 (spotted by Brian).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Also increment ir->offset in the GS visitor, rather than at the
final assembly generation stage (requested by Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
On Broadwell, PIPE_CONTROL needs an extra DWord to accomodate the
48-bit addressing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Now that we have a helper function that handles the PIPE_CONTROL
variations between the various platforms, these are basically the same.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There are a lot of places that use PIPE_CONTROL to write a value to a
buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT).
Creating a single function to do this seems convenient.
As part of this refactor, we now set the PPGTT/GTT selection bit
correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms.
This is correct for Sandybridge, but actually part of the address on
Ivybridge and later!
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to adjust that in substantially fewer
places, giving us confidence that we've hit them all.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I believe that PIPE_CONTROL uses the length field to decide whether to
do 32-bit or 64-bit writes. A length of 4 would do a 32-bit write,
while a length of 5 would do a 64-bit write. (I haven't verified this,
though.)
For workaround writes, we don't care what value gets written, or how
much data. We're only writing something because hardware bugs mandate
that do so. So using a 64-bit write should be fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The PIPE_CONTROL packet actually has 5 DWords on Gen6+:
1. Header
2. Flags
3. Address
4. Immediate Data: Lower DWord
5. Immediate Data: Upper DWord
We just never emitted the last one. While it appears to work, it's
probably safer to emit the entire thing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These days, we need to emit PIPE_CONTROL flushes all over the place.
Being able to do that via a single function call seems convenient.
Broadwell will also increase the length of these packets by 1; with the
refactoring, we should have to do this in substantially fewer places.
v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by
Eric Anholt). Drop unlikely() from BLT_RING check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell uses 48-bit addresses. The first DWord is the low 32 bits,
and the second DWord is the high 16 bits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to
replace the old 'unsigned long offset' field. To preserve ABI, libdrm
continues to store the presumed offset in both locations.
On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses.
However, with a 32-bit userspace, the 'unsigned long offset' field will
only be 32-bit, which is not large enough to hold this value. We need
to use a proper uint64_t (like the kernel does).
Technically, a lot of this code doesn't affect Broadwell, so we could
leave it using the old field. But it makes sense to just switch to the
new, properly typed field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
intel_buffer_objects.c: In function 'old_intel_bufferobj_buffer':
intel_buffer_objects.c:471:17: warning: unused parameter 'flag' [-Wunused-parameter]
The parameter hasn't been used since the i915 and i965 drivers had their
breakup. i965 got the flags, and i915 got to cry itself to sleep.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Not actually tested, but the changes are identical to the i965 changes
that are tested.
v2: Remove MAX2(64, ...). Suggested by Ken (in the i965 version of this
patch).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Siavash Eliasi <siavashserver@gmail.com>
No piglit regressions on IVB.
With minor tweaks to the arb_map_buffer_alignment-map-invalidate-range
test (disable the extension check, set alignment to 64 instead of
querying), the i965 driver would fail the test without this patch (as
predicted by Eric). With this patch, it passes.
v2: Remove MAX2(64, ...). Suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Siavash Eliasi <siavashserver@gmail.com>
v2 (idr): Only enable the extension on GEN7+ w/core profile because it
requires geometry shaders.
v3 (idr): Add some casting to fix setting of ViewportBounds.Min.
Negating an unsigned value, then casting to float doesn't do what you
might think it does.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
noop_scissor (correctly) only examines the scissor rectangle for
viewport 0. Therefore, it should only be called when that scissor
rectangle is enabled.
v2: Remove spurious change to radeon code. Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently MaxViewports is still 1, so this won't affect any change.
v2: Minor code reformatting suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers that currently use _Xmin and friends to set their scissor
rectangle will need to use this code directly once they are updated for
GL_ARB_viewport_array.
v2: Use different bit-test idiom and fix mixed tabs and spaces. Both
were suggested by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At various stages the hardware clamps the gl_ViewportIndex to these
values. Setting them to zero effectively makes gl_ViewportIndex be
ignored. This is acutally useful in blorp (so that we don't have to
modify all of the viewport / scissor state).
v2: Use INTEL_MASK to create GEN6_CLIP_MAX_VP_INDEX_MASK. Suggested by
Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Define API connections to extension entry points added in previous
commits. Update entry points to use floating point arguments as
required by the extension.
Add get tokens for ARB_viewport_array state.
v2: Include review feedback.
v3 (idr): Fix 'make check'. Add missing Get infrastructure (some was
culled from other pathces).
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Use set_viewport_no_notify / set_depth_range_no_notify (and
manually notify the driver) instead of calling _mesa_set_viewporti /
_mesa_set_depthrangei. Refactor bodies of _mesa_ViewportIndexed and
_mesa_ViewportIndexedv into a shared function. Remove spurious CLAMP
calls in _mesa_DepthRangeArrayv and _mesa_DepthRangeIndexed.
v3 (idr): Add some missing return-statements after calls to _mesa_error.
v4 (idr): Only perform the ViewportBounds.Min / ViewportBounds.Max
clamping in set_viewport_no_notify if GL_ARB_viewport_array is enabled.
Otherwise the driver may not have set ViewportBounds, and the clamping
will do bad things.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Use set_scissor_no_notify (and manually notify the driver)
instead of calling _mesa_set_scissori. Refactory bodies of
_mesa_ScissorIndexed and _mesa_ScissorIndexedv into a shared function.
Perform parameter validation in the same order in all three functions.
Pull MaxViewports comparison fix (in _mesa_ScissorArrayv) from the next
patch to this patch.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that the scissor enable state is a bitfield need a custom function
to extract the correct value from gl_context. Modeled
Scissor.EnableFlags after Color.BlendEnabled.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Fix several "comparison between signed and unsigned integer
expressions" warnings.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This matches the expectations of GL_ARB_viewport_array and the storage
type where the values will land.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the restore code would enable all scissor rectangles if any
scissor rectangles were enabled on entry to meta. When there is only
one scissor rectangle, this is fine. As soon as a driver supports
multiple viewports, this will be a problem.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_Scissor, make sure that ctx->Driver.Scissor is only called once
instead of once per scissor rectangle.
v2: Use MAX_VIEWPORTS instead of ctx->Const.MaxViewports because the
driver may not set ctx->Const.MaxViewports yet.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_Viewport and _mesa_DepthRange, make sure that
ctx->Driver.Viewport is only called once instead of once per viewport or
depth range.
v2: Make _mesa_DepthRange actually set all of the depth ranges (instead
of just index 0). Noticed by Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Use MAX_VIEWPORTS instead of ctx->Const.MaxViewports because the
driver may not set ctx->Const.MaxViewports yet.
v3: Handle all viewport entries in update_viewport_matrix and
_mesa_copy_context too. This was previously in an earlier patch.
Having the code in the earlier patch could cause _mesa_copy_context to
access a matrix that hadn't been constructed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v2]
Create an internal function that just writes data into the scissor
rectangle. In future patches this will see more use because we only
want to call dd_function_table::Scissor once after setting all of the
scissor rectangles instead of once per scissor rectangle.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create an internal function that just writes data into the viewport. In
future patches this will see more use because we only want to call
dd_function_table::Viewport once after setting all of the viewport
instead of once per viewport.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create an internal function that just writes data into the depth range.
In future patches this will see more use because we only want to call
dd_function_table::DepthRange once after setting all of the depth ranges
instead of once per depth range.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only element 0 of the array is used anywhere at this time, so there
should be no changes.
v4: Split out from a single megapatch. Suggested by Ken.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v4: Split out from a single megapatch. Suggested by Ken. Also make
meta's save_state::ViewportX, ::ViewportY, ::ViewportW, and ::ViewportH
to match gl_viewport_attrib.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used when the viewport near and far plane are stored as
doubles instead of as floats.
v4 (idr): Split out from a single megapatch. Suggested by Ken. Also
drop value_double_4. It's never used anywhere in the patch series.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Update Mesa and drivers to access updated gl_scissor_attrib.
Now have an enable bitfield and array of gl_scissor_rects.
Drivers have been updated to the new scissor enable state
attribute (gl_context.scissor.EnableFlags) but still treat it
as a single boolean which is okay as mesa will only use
bit 0 when communicating with a driver that does not support
ARB_viewport_array.
v2 (idr): Rebase fixes.
v3 (idr): Small code formatting fix suggsted by Ken.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These limits will be queryable by GL_MAX_VIEWPORTS,
GL_VIEWPORT_SUBPIXEL_BITS, and GL_VIEWPORT_BOUNDS_RANGE. Drivers that
actually implement the extension must set values for these constants
that comply with the minimum-maximums from the spec.
Most of these changes were part of other patches. They were separated out
because it make reordering of later patches easier. Also, MaxViewports wasn't
set by that patch, and I completely overlooked it in review. It's now obvious
that it's set. :)
v2 (idr): Split these changes out from the original patches. Keep
MaxViewportWidth and MaxViewportHeight as GLuint.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Split these changes out from the original patch. Only
advertise GL_ARB_viewport_array in a core profile because it requires
geometry shaders.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were calling draw_total_vs_outputs() too early. The call to
draw_pt_emit_prepare() could result in the vertex size changing.
So call draw_total_vs_outputs() after draw_pt_emit_prepare().
This fix would seem to be needed for the non-LLVM code as well,
but it's not obvious. Instead, I added an assertion there to
try to catch this problem if it were to occur there.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72926
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of skipping x/y clipping completely if there's point_tri_clip points
use guard band clipping. This should be easier (previously we could not disable
generating the x/y bits in the clip mask for llvm path, hence requiring custom
clip path), and it also allows us to enable this for tris-as-points more easily
too (this would require custom tri clip filtering too otherwise). Moreover,
some unexpected things could have happen if there's a NaN or just a huge number
in some tri-turned-point, as the driver's rasterizer would need to deal with it
and that might well lead to undefined behavior in typical rasterizers (which
need to convert these numbers to fixed point). Using a guardband should hence
be more robust, while "usually" guaranteeing the same results. (Only "usually"
because unlike hw guardbands draw guardband is always just twice the vp size,
hence small vp but large points could still lead to different results.)
Unfortunately because the clipmask generated is completely unaffected by guard
band clipping, we still need a custom clip stage for points (but not for tris,
as the actual clipping there takes guard band into account).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
These are components which were originally developed by Tungsten Graphics,
which was in turn acquired by VMware, but are de facto now being maintained
by third-party contributors of the Mesa open-source community.
This matches what's reported by swrast driver and a few other components.
Suggested by Ian Romanick.
Currently the MSVC build is broken because of conflicting definitions of
'log' function. I didn't investigate thoroughly, but I suspect the
it is conflicting standard math.h's log.
log_ is admittedly not a great name, but it is better than a broken build.
A better one can be used in a follow-on build.
By highlighting these special cases makes it clearer to switch
to the fs-generator as the wider scoped compression control
settings used in the current implementation can be simply
dropped.
No regressions on IVB (piglit quick + unit tests).
v2 (Ian): typo in a comment
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Effectively only the mask control bit gets altered for the single
addition in question and hence there is no real need to use a
fresh state control level for it -- that is more useful when
multiple intructions share the same mask and compression settings.
This is a preparation step for removing the explicit compression
control modifiers in the blit compiler. After this patch there
are no nested state control levels making the constant nature of
the compression settings more apparent.
No regressions on IVB (piglit quick + unit tests).
v2 (Matt, Ian): use temporary variable instead of assigning
directly on the same line with a function call.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We call intel_prepare_render() in intelMakeCurrent() to make sure we have
renderbuffers before calling _mesa_make_current(). The only reason we
do this is so that we can have valid defaults for width and height.
If we already have buffers for the drawable we're making current, we
don't need this step.
In itself, this is a small optimization, but it also avoids a round trip
that could block on the display server in a unexpected place.
https://bugs.freedesktop.org/show_bug.cgi?id=72540https://bugs.freedesktop.org/show_bug.cgi?id=72612
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
NV3x cards don't support NPOT textures. Technically this restriction
could be worked around, but since it also doesn't expose any video
decoding hw, just turn it off entirely.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
pipe_loader_drm.c: In function 'pipe_loader_drm_probe_fd':
pipe_loader_drm.c:120:4: error: implicit declaration of function 'loader_get_pci_id_for_fd' [-Werror=implicit-function-declaration]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Mesa provides the flexibility of building without the
need to have libdrm present on the system. The situation
has regressed with the recent commit
commit 8c2e7fd846
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date: Fri Jan 10 23:36:16 2014 +0000
loader: introduce the loader util lib
By isolating libdrm code by #ifndef __NOT_HAVE_DRM_H we
can have libdrm-less builds on across all build systems.
This patch converts Android's _EGL_NO_DRM to __NOT_HAVE_DRM_H
to provide consistency with the other cases within mesa, allows
compilation of libloader on libdrm-less scons and conditionally
links against libdrm if present under automake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73776
BUgzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73777
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only difference is that STATE_SIP takes a 48-bit address, so we need
to output two zeroes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This replaces the old fs_generator backend.
v2: Port to the C-based representation of assembly instructions.
Fix texturing after the texture-grf merge.
v3: Add high quality derivative support. Fix SET_SIMD4X2_OFFSET.
v4: Pass brw_context to gen8_instruction functions as required.
v5: Fixes for MRT, as well as zero render targets (alpha test only).
v6: Replace n-wide with SIMDn in comments and messages; port over
Topi's blorp-generator changes; add missing TXF_MCS opcode,
fix missing high quality derivatives for DDX; fix typo (all caught
by Eric). Simplify ADDC/SUBB handling; drop "Used only on Gen6+"
comment (caught by Matt). Emit SIMD16 versions of three source
instructions (caught by both Eric and Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the old vec4_generator backend.
v2: Port to use the C-based instruction representation. Also, remove
Geometry Shader offset hacks - the visitor will handle those instead
of this code.
v3: Texturing fixes (including adding textureGather support).
v4: Pass brw_context to gen8_instruction functions as required.
v5: Add SHADER_OPCODE_TXF_MCS support; port DUAL_INSTANCED gs fixes
(caught by Eric). Simplify ADDC/SUBB handling; add comments to
gen8_set_dp_message calls (suggested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the brw_eu_emit.c layer for Broadwell. It will be
used by both the vector and scalar shader backends.
v2: Port to use the C-based instruction representation.
v3: Fix destination register type for CMP.
v4: Pass brw to gen8_instruction functions (required by rebase).
v5: Remove bogus assertion on math instructions (caught by Piglit).
v6: Remove more restrictions on math instructions (caught by Eric).
Make ADDC and SUBB helpers set accumulator writes, like MAC and
MACH (caught by Matt).
v7: Don't implicitly force ALU3 operations to SIMD8 (we've been able
to do SIMD16 versions since Haswell, but didn't when I originally
wrote this code).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Heavily based on Keith Packard's existing brw_disasm.c code. I've tried
to go through most of the pieces (like SFIDs) and update the lists to
include features added in recent generations.
v2: Port to use the C-based instruction emitters. This allows us to use
C99 array initializers, which tidies up some of the code.
v3: Improve decoding of render target write messages.
v4: Update for BRW_REGISTER_TYPE becoming an abstraction.
v5: Rebase on Chris Forbes' SFID message defines.
v6: Fix disassembly of UV immediates; remove silly casts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell significantly changes the EU instruction encoding. Many of
the fields got moved to different bit positions; some even got split
in two.
With so many changes, it was infeasible to continue using struct
brw_instruction. We needed a new representation.
This new approach is a bit different: rather than a struct, I created a
class that has four DWords, and helper functions that read/write various
bits. This has several advantages:
1. We can create several different names for the same bits. For
example, conditional modifiers, SFID for SEND instructions, and the
MATH instruction's function opcode are all stored in bits 27:24.
In each situation, we can use the appropriate setter function:
set_sfid(), set_math_function(), or set_cond_modifier(). This
is much easier to follow.
2. Since the fields are expressed using the original 128-bit numbers,
the code to create the getter/setter functions follows the table in
the documentation very closely.
To aid in debugging, I've enabled -fkeep-inline-functions when building
gen8_instruction.c. Otherwise, these functions cannot be called by
gdb, making it insanely difficult to print out anything.
Kenneth Graunke wrote most of this code. Damien Lespiau ported it to
C99. Xiang Haihao added media fields. Zhao Yakui added indirect
addressing support. Eric Anholt added an assertion to make sure that
values fit in the alloted number of bits.
v2: Update for brw_reg_type_to_hw_type(), which necessitates passing
brw_context pointers around everywhere.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Matt Turner <mattst88@gmail.com>
While we probably won't ever use these, having them makes it easy to
share disassembler code between intel-gpu-tools and Mesa.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is not observed to actually fix anything, but the PRM says this
field must be zero for other surface types.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously this was missing many interesting fields. Having them decoded
makes debugging views much easier.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The index passed to the function is already unsigned, and internally
we threat it as unsigned.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The textures array is defined as a number of NV50_MAX_PIPE_CONSTBUFS
per shader stage. Currently the nv50 driver handles only 3 shader
stages, thus we wreck chaos when accessing array-out-of-bounds.
Cc: 9.1 9.2 10.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The textures array is defined as a number of PIPE_MAX_SAMPLERS per shader stage.
Currently nv50 driver handles only 3 shader stages, thus we wreck chaos when
accessing array-out-of-bounds.
Fixes a segfault in piglit/bin/arb_texture_buffer_object-data-sync -fbo -auto
Cc: 9.1 9.2 10.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Use the kernel driver name are returned by drmGetVersion() for
non-pci(platform) devices.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
v2 (Emil): Rebased and weaked commit message.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Culled out of the "loader: refactor duplicated code into loader util lib"
patch by Rob Clark.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
As per original approach by Rob, each user of the loader lib should include
loader.h and the pci_id_driver_map.h header will be used exclusively by the
loader.
Add back the include guard __IS_LOADER and remove no longer needed include
folder in the scons build.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Additionally this commit removes the following exported functions
_gbm_udev_device_new_from_fd()
_gbm_fd_get_device_name()
_gbm_log()
All three were erroneously marked as exported since their inception.
Neither of them has ever been a part of the API thus there should be
no users of them.
Cc: Chad Versace <chad.versace@linux.intel.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All the various window system integration layers duplicate roughly the
same code for figuring out device and driver name, pci-id's, etc. Which
is sad. So extract it out into a loader util lib.
v2 (Emil)
* Separate the introduction of libloader from the code de-duplication.
* Strip out non-pci devices support.
* Add scons + Android build system support.
* Add VISIBILITY_CFLAGS to avoid exporting the loader funcs.
v3 (Emil)
* PIPE_OS_ANDROID is undefined at this scope, use ANDROID
* Make sure we define _EGL_NO_DRM when building only swrast
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Using an unoptimized variant of glamor spending 50% of its CPU time in
brw_draw_prims() (and hitting the cache *very* frequently):
N Min Max Median Avg Stddev
x 200 29200 40500 34900 34750 958.43256
+ 200 31000 40300 34700 34622 916.35941
No difference proven at 95.0% confidence
Similarly, no difference on GLB2.7:
N Min Max Median Avg Stddev
x 63 64.1 71.36 70.69 70.113175 1.6782026
+ 63 63.6 71.18 70.75 70.223651 1.6044186
No difference proven at 95.0% confidence
v2: Rebase on master (by anholt)
v3: Add a missing BEGIN_BATCH(3) to aa_line_parameters -- CACHED_BATCH
didn't have the asserts about batchbuffer usage that ADVANCE_BATCH
does, so we started assertion failing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Those are the terms used in the docs, and think "n-wide" was something I
just happened to say. Note that shader-db needs updating for the
INTEL_DEBUG=fs parsing.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The original intent was that we'd keep a driver-private copy, and there
would be the normal copy for swrast to make use of without the tuning (or
anything more invasive we might do) specific to i965. Only, we don't
generate swrast code any more, because swrast can't render current shaders
anyway. Thus, our private copy is rather a waste, and we can just do our
backend-specific operations on the linked shader.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the
old copyright name is creating unnecessary confusion, hence this change.
This was the sed script I used:
$ cat tg2vmw.sed
# Run as:
#
# git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed
#
# Rename copyrights
s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g
/Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./
s/TUNGSTEN GRAPHICS/VMWARE/g
# Rename emails
s/alanh@tungstengraphics.com/alanh@vmware.com/
s/jens@tungstengraphics.com/jowen@vmware.com/g
s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/
s/jrfonseca\?@tungstengraphics.com/jfonseca@vmware.com/g
s/keithw\?@tungstengraphics.com/keithw@vmware.com/g
s/michel@tungstengraphics.com/daenzer@vmware.com/g
s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/
s/zack@tungstengraphics.com/zackr@vmware.com/
# Remove dead links
s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g
# C string src/gallium/state_trackers/vega/api_misc.c
s/"Tungsten Graphics, Inc"/"VMware, Inc"/
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes regression since 9baa45f78b
but some of the piglit fbo-drawbuffers-none tests still don't
pass.
v2: use the right pointer type for 'h'
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes regression from 9baa45f78b
v2: incorporate a few small changes suggested by Roland.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The whole round-pointsize-to-int stuff must only be done with GL legacy
rules (no point_quad_rasterization) or all the wrong edges are lit up.
This was previously in a private branch (d3d pointsprite test complains
loudly otherwise) and got lost in a merge. However, it should certainly
apply to GL point sprite rasterization as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
OpenGL does whole-point clipping, that is a large point is either fully
clipped or fully unclipped (the latter means it may extend beyond the
viewport as long as the center is inside the viewport). d3d9 (d3d10 has
no large points) however requires points to be clipped after they are
expanded to a rectangle. (Note some IHVs are known to ignore GL rules at
least with some hw/drivers.)
Hence add a rasterizer bit indicating which way points should be clipped
(some drivers probably will always ignore this), and add the draw interaction
this requires. Drivers wanting to support this and using draw must support
large points on their own as draw doesn't implement vp clipping on the
expanded points (it potentially could but the complexity doesn't seem
warranted), and the driver needs to do viewport scissoring on such points.
Conflicts:
src/gallium/drivers/llvmpipe/lp_context.c
src/gallium/drivers/llvmpipe/lp_state_derived.c
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit c13970808 (mesa: GL_EXT_secondary_color is not optional) changed
CHECK_EXTENSION2(EXT_secondary_color, ARB_vetex_program, cap)
to
CHECK_EXTENSION(ARB_vertex_program, cap)
However CHECK_EXTENSION2 checks that either extension is available, not
both. Remove the extension check entirely since the intent was for it to
always be enabled.
v2: Fix glGet*(GL_COLOR_SUM) too. Suggested by Ian.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 9.2 10.0 <mesa-stable@lists.freedesktop.org>
It's possible to bind a smaller buffer as a constant buffer, than
what the shader actually uses/requires. This could cause nasty
crashes. This patch adds the architecture to pass the maximum
allowable constant buffer index to the jit to let it make
sure that the constant buffer indices are always within bounds.
The behavior follows the d3d10 spec, which says the overflow
should always return all zeros, and overflow is only defined
as access beyond the size of the currently bound buffer. Accesses
beyond the declared shader constant register size are not
considered an overflow and expected to return garbage but consistent
garbage (we follow the behavior which some wlk tests expect which
is to return the actual values from the bound buffer).
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 95bf222603 (cso_context: Fix cso_context::sample_mask initial
value.) fixed the cso sample mask to be initialized to ~0. The cso code
is also careful not to needlessly call set_sample_mask, so we ended up
with the ctx->sample_mask never being set. This broke a number of
EXT_framebuffer_multisample piglit tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
ctx->DrawIndirectBuffer wasn't being free'd in _mesa_free_buffer_objects
With this patch, "valgrind --leak-check=full glxgears" on evergreen (CEDAR)
now shows:
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 70,228 bytes in 651 blocks
suppressed: 0 bytes in 0 blocks
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The radeonsi code was not cleaning up either of these items leading to
leaked memory.
v2: Move cleanup to r600_common_context_cleanup instead of duplicating
the logic for SI
CC: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The i830 and i915 drivers used them, but they didn't really need to.
They will just be annoying in future patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A future patch will rename some of the fields of gl_viewport_attrib, and
I don't want to update dead code that I can't test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Dave Airlie <airlied@redhat.com>
The ES and desktop GL specs diverge here. Yay!
In desktop OpenGL, the driver can perform online compression of
uncompressed texture data. GL_NUM_COMPRESSED_TEXTURE_FORMATS and
GL_COMPRESSED_TEXTURE_FORMATS give the application a list of formats
that it could ask the driver to compress with some expectation of
quality. The GL_ARB_texture_compression spec calls this "suitable for
general-purpose usage." As noted above, this means
GL_COMPRESSED_RGBA_S3TC_DXT1_EXT is not included in the list.
In OpenGL ES, the driver never performs compression.
GL_NUM_COMPRESSED_TEXTURE_FORMATS and GL_COMPRESSED_TEXTURE_FORMATS give
the application a list of formats that the driver can receive from the
application. It is the *complete* list of formats. The
GL_EXT_texture_compression_s3tc spec says:
"New State for OpenGL ES 2.0.25 and 3.0.2 Specifications
The queries for NUM_COMPRESSED_TEXTURE_FORMATS and
COMPRESSED_TEXTURE_FORMATS include COMPRESSED_RGB_S3TC_DXT1_EXT,
COMPRESSED_RGBA_S3TC_DXT1_EXT, COMPRESSED_RGBA_S3TC_DXT3_EXT,
and COMPRESSED_RGBA_S3TC_DXT5_EXT."
Note that the addition is only to the OpenGL ES specification!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
See-also: http://lists.freedesktop.org/archives/mesa-dev/2013-October/047439.html
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Returning a reference is incorrect if the specified pair was a
temporary -- Instead of that, use decltype() to deduce the correct
return type qualifiers. Fixes a crash in clCreateProgramWithBinary().
Reported-and-tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
According to the spec it's allowed to call clBuildProgram() on a
program created from a user-specified binary. We don't need to do
anything to build the program in that case.
Reported-and-tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
This avoids the inefficient multiple evaluation of the map result in
the code below. It should cause no functional changes.
Tested-by: "Dorrington, Albert" <albert.dorrington@lmco.com>
From ARB_shader_image_load_store:
If a texture object bound to one or more image units is deleted by
DeleteTextures, it is detached from each such image unit, as though
BindImageTexture were called with <unit> identifying the image unit
and <texture> set to zero.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And uncomment the relevant lines of the dispatch sanity test.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Name image format classes consistently, fix array and 3D teximage
selection with layered = GL_FALSE, make sure that the
user-specified layer is less than the number of texture layers,
add some asserts.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Including pack/unpack and texstore code. ARB_shader_image_load_store
requires support for the GL_RG8_SNORM and GL_RG16_SNORM formats, which
map to MESA_FORMAT_SIGNED_GR88 and MESA_FORMAT_SIGNED_GR1616 on
little-endian hosts, and MESA_FORMAT_SIGNED_RG88 and
MESA_FORMAT_SIGNED_RG1616 respectively on big-endian hosts -- only the
former were already present, add support for the latter.
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Including pack/unpack and texstore code. This texture format is a
requirement for ARB_shader_image_load_store.
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
v2: Increase MAX_IMAGE_UNITS to what i965 wants and add a separate
MAX_IMAGE_UNIFORMS define, clarify a couple of comments.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
And to check if it can have layers at all. This will be used by the
implementation of ARB_shader_image_load_store.
v2: Fix constness of texobj argument, use assert and return reasonable
default rather than calling unreachable() in default switch case.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The temporary variable used to store _ColorDrawBufferIndexes must be
signed (GLint), otherwise the following conditional will be incorrectly
evaluated. Leading to crashes in the driver/mesa or accessing/writing
to arbitrary memory location. The bug dates back to 2009.
Cc: 10.0 9.2 9.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
_ColorDrawBufferIndexes is defined as GLint* and using a GLuint*
will result in the first part of the conditional to be evaluated to
true always.
Unintentionally introduced by the following commit, this will result
in a driver segfault if one is using an old version of the piglit test
bin/clearbuffer-mixed-format -auto -fbo
commit 03d848ea10
Author: Marek Olšák <marek.olsak@amd.com>
Date: Wed Dec 4 00:27:20 2013 +0100
mesa: fix interpretation of glClearBuffer(drawbuffer)
This corresponding piglit tests supported this incorrect behavior instead of
pointing at it.
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: 10.0 9.2 9.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If it wasn't necessary for Haswell, it's likely not to be necessary for
Broadwell either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to disable HiZ for non-8x4 aligned levels, except for level 0, layer
0. For the very first layer we can adjust Width and Height fields of
3DSTATE_DEPTH_BUFFER to make it aligned.
Specifically, add ILO_TEXTURE_HIZ and set the flag only for properly aligned
levels. ilo_texture_can_enable_hiz() is updated to check for the flag.
In tex_layout_validate(), align the depth bo to 8x4 so that we can adjust
Width/Height of 3DSTATE_DEPTH_BUFFER without introducing out-of-bound access.
Finally in rectlist blitter, add the ability to adjust 3DSTATE_DEPTH_BUFFER.
Add tex_layout_init_hiz() before tex_layout_init_format() to decide whether
HiZ should be enabled.
On GEN6, because of layer offsetting, HiZ is enabled only when the texture is
non-mipmapped and non-array. PIPE_USAGE_STAGING is also taken as a hint to
disable HiZ.
These can't use foreach_list since they want to skip over the first few
list elements. Just doing the ad-hoc list walking isn't too bad.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When handling function calls, we often want to walk through the list of
formal parameters and list of actual parameters at the same time.
(Both are guaranteed to be the same length.)
Previously, we used a pattern of:
exec_list_iterator 1st_iter = <1st list>.iterator();
foreach_iter(exec_list_iterator, 2nd_iter, <2nd list>) {
...
1st_iter.next();
}
This was awkward, since you had to manually iterate through one of
the two lists.
This patch introduces a foreach_two_lists macro which safely walks
through two lists at the same time, so you can simply do:
foreach_two_lists(1st_node, <1st list>, 2nd_node, <2nd list>) {
...
}
v2: Rename macro from foreach_list2 to foreach_two_lists, as suggested
by Ian Romanick.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Formal function parameters are always ir_variable objects, not an
arbitrary ir_instruction. So there's no need to dynamically cast here.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A function call's parameters are always rvalues. ir_rvalue may not
always be a subclass of ir_instruction in the future, so we should use
the right one.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
foreach_iter and exec_list_iterators have been deprecated for some time now;
we just hadn't ever bothered to convert code to the newer foreach_list
and foreach_list_safe macros.
In these cases, we aren't editing the list, so we can use foreach_list
rather than foreach_list_safe.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to this patch, if we ran out of aperture space during
brw_try_draw_prims(), we would rewind the batch buffer pointer
(potentially throwing some state that may have been emitted by
brw_upload_state()), flush the batch, and then try again. However, we
wouldn't reset the dirty bits to the state they had before the call to
brw_upload_state(). As a result, when we tried again, there was a
danger that we wouldn't re-emit all the necessary state. (Note: prior
to the introduction of hardware contexts, this wasn't a problem
because flushing the batch forced all state to be re-emitted).
This patch fixes the problem by leaving the dirty bits set at the end
of brw_upload_state(); we only clear them after we have determined
that we don't need to rewind the batch buffer.
Cc: 10.0 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
An example why it is required:
Let's say there's a fragment shader writing to gl_FragData[0..1].
The user calls: glDrawBuffers(2, {GL_NONE, GL_COLOR_ATTACHMENT0});
That means gl_FragData[0] is unused and gl_FragData[1] is written
to GL_COLOR_ATTACHMENT0.
st/mesa was skipping the GL_NONE draw buffer, therefore gl_FragData[0]
was written to GL_COLOR_ATTACHMENT0, which was wrong.
This commit fixes it, but drivers must also be fixed not to crash when
binding NULL colorbuffers. There is also a new set of piglit tests for this.
The MSAA state also had to be fixed not to crash when reading fb->cbufs[0].
Reviewed-by: Brian Paul <brianp@vmware.com>
yInverted is used by EGL_NOK_texture_from_pixmap to indicate that
window system rendering is y-inverted compared to OpenGL texture
representation. This extension is only known to be used with X11
window system where sane default is GL_TRUE.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73371
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
dri2_add_config makes decisions based on NOK_texture_from_pixmap so
it needs to be enabled before calling dri2_add_configs_for_visuals.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Piglit was recently changed to expect the correct error code (piglit
commit 271b998), so it started failing on Mesa. This corrects that
failing and adds some spec quotations to justify the errrors set.
The code was rearranged a little bit to match the order listed in the
spec.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
brw_queryobj.c needs a version of write_timestamp that works on all
generations for the QueryCounter() driver hook. So there's no point in
duplicating it in gen6_queryobj.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, Mesa enforced the following rule (from
ARB_geometry_shader4's list of criteria for framebuffer completeness):
* If any framebuffer attachment is layered, all attachments must have
the same layer count. For three-dimensional textures, the layer count
is the depth of the attached volume. For cube map textures, the layer
count is always six. For one- and two-dimensional array textures, the
layer count is simply the number of layers in the array texture.
{ FRAMEBUFFER_INCOMPLETE_LAYER_COUNT_ARB }
However, when ARB_geometry_shader4 was adopted into GL 3.2, this rule
was dropped; GL 3.2 permits different attachments to have different
layer counts. This patch brings Mesa in line with GL 3.2.
In order to ensure that layered clears properly clear all layers, we
now have to keep track of the maximum number of layers in a layered
framebuffer.
Fixes the following piglit tests in spec/!OpenGL 3.2/layered-rendering:
- clear-color-all-types 1d_array mipmapped
- clear-color-all-types 1d_array single_level
- clear-color-mismatched-layer-count
- framebuffer-layer-count-mismatch
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
From section 4.4.4 (Framebuffer Completeness) of the GL 3.2 spec:
If any framebuffer attachment is layered, all populated
attachments must be layered. Additionally, all populated color
attachments must be from textures of the same target.
We weren't checking that the attachments were from textures of the
same target.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Commit 1a92881 added extra flushes to fix a HiZ hang in
WebGL Google Maps. With the extra flushes emitted by the previous two
patches, the flushes added by 1a92881 are redundant.
Tested with the same criteria as in 1a92881: by zooming in and out
continuously for 2 hours on Sandybridge Chrome OS (codename
Stumpy) without a hang.
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Unconditionally set brw->need_workaround_flush at the top of gen6 blorp
state emission.
The art of emitting workaround flushes on Sandybridge is mysterious and
not fully understood. Ken and I believe that
intel_emit_post_sync_nonzero_flush() may be required when switching from
regular drawing to blorp. This is an extra safety measure to prevent
undiscovered difficult-to-diagnose gpu hangs.
I verified that on ChromeOS, pre-patch, need_workaround_flush was not
set at the top of blorp, as Paul expected. To verify, I inserted the
following debug code at the top of gen6_blorp_exec(), restarted the ui,
and inspected the logs in /var/log/ui. The abort gets triggered so early
that the browser never appears on the display.
static void
gen6_blorp_exec(...)
{
if (!brw->need_workaround_flush) {
fprintf(stderr, "chadv: %s:%d\n", __FILE__, __LINE__);
abort();
}
...
}
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This patch makes the workaround code in gen6 blorp follow the pattern
established in the regular draw path. It shouldn't result in any
behavioral change.
On gen6, there are two places where we emit 3D_CMD_PRIM: brw_emit_prim()
and gen6_blorp_emit_primitive(). brw_emit_prim() sets
need_workaround_flush immediately after emitting the primitive, but
blorp does not. Blorp sets need_workaround_flush at the bottom of
brw_blorp_exec().
This patch moves the need_workaround_flush from brw_blorp_exec() to
gen6_blorp_emit_primitive(). There is no need to set
need_workaround_flush in gen7_blorp_emit_primitive() because the
workaround applies only to gen6.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
We weren't handling the LUMINANCE_SNORM, LUMINANCE_ALPHA_SNORM and
INTENSITY_SNORM cases. Note that adding these cases here does not
require a driver to support rendering to these surface types. If
the driver can't do it we'll report an incomplete framebuffer.
NVIDIA doesn't support GL_EXT_texture_snorm but their driver
accepts these formats in glRenderBufferStorage().
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This allows the caller to execute it in a loop rather than
hand-rolling a separate call for each stage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are replaced with
ctx->Const.Program[MESA_SHADER_{VERTEX,FRAGMENT,GEOMETRY}]. In
patches to follow, this will allow us to replace a lot of ad-hoc logic
with a variable index into the array.
With the exception of the changes to mtypes.h, this patch was
generated entirely by the command:
find src -type f '(' -iname '*.c' -o -iname '*.cpp' -o -iname '*.py' \
-o -iname '*.y' ')' -print0 | xargs -0 sed -i \
-e 's/Const\.VertexProgram/Const.Program[MESA_SHADER_VERTEX]/g' \
-e 's/Const\.GeometryProgram/Const.Program[MESA_SHADER_GEOMETRY]/g' \
-e 's/Const\.FragmentProgram/Const.Program[MESA_SHADER_FRAGMENT]/g'
Suggested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit eda21d2a30 fixed the rasterization
of points for Direct3D but ended up breaking the rasterization of OpenGL
non-sprite points, in particular conform's pntrast.c test.
The only way to get both working is to properly honour
pipe_rasterizer::point_quad_rasterization, and follow the weird OpenGL
rule when it is false.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We definitely want to fall through to the unsynchronized map case, instead
of wasting bandwidth on a copy. Prevents a -43.2407% +/- 1.06113% (n=49)
performance regression on aa10perf when teaching glamor to provide the
GL_INVALIDATE_RANGE_BIT information.
This is a performance fix, which I usually wouldn't cherry-pick to stable.
But this was really was just a bug in the code, its presence would
discourage developers from giving us the best information they can, and I
think we've got fairly high confidence in the unsynchronized map path
already.
Cc: 10.0 9.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While incorrect, it probably wouldn't affect anyone ever: You'd have to do
an appropriately-formatted readpixels into a PBO, then overwrite the tail
end of the updated area of the PBO with glBufferSubData(), and you
wouldn't get appropriate synchronization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The earlier assert made sure that our math didn't exceed our bounds, but
this makes sure that we don't overflow from the high bits X into the low
bits of Y. We've already put checks in intel_miptree_blit(), but I've
wanted to expand the type in our protoype from short to uint32_t, and we
could get in trouble with intel_emit_linear_blit() if we did.
v2: Add Ken's comment about the funny language extension used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
This was one of the things we always wanted to do to this, to make it more
useful than just (value << FIELD_MASK).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
With all of the flipping and pitch twiddling and miptree layout involved
in our blits, there are lots of ways for us to scribble outside of a
buffer. Put in a check that we're not about to do so.
This catches a bug that glamor was running into.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This fixes the following compile error:
src\glsl\ir_constant_expression.cpp(1405) : error C2666: 'copysign' : 3
overloads have similar conversions
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add for now some simple/basic query support (ie. things not actually
requiring the GPU). Might change around a bit when I actually add
GPU queries, but for now this enables some useful performance info
in the GALLIUM_HUD. For example:
GALLIUM_HUD=fps+batches+batches-sysmem+batches-gmem+restores,draw-calls
The driver specific specific queries are:
+ draw-calls
+ batches - number of batches per second, sum of batches-sysmem
plus batches-gmem
+ batches-gmem - render a set of tiles in GMEM, for each tile
(optionally) system mem -> gmem (restore), plus N draws,
plus gmem -> system mem (resolve) per second
+ batches-sysmem - N draws to system memory (GMEM bypass) per
second
+ restores - number of GMEM batches that required restore per
second
Ideally for GMEM rendering, you want batches-gmem to equal fps. If
the app is doing something that triggers multiple passes (ie. requires
extra round trip gmem <-> system memory) then the # of batches per
second will go up relative to fps.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since we now have the cmdstream patch mechanism needed for hw binning,
might as well also use it for RB_RENDER_CONTROL updates. This avoids
the need to use RMW (and associated WFI) to update RB_RENDER_CONTROL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The binning pass sorts vertices into which bins/tiles they apply to.
The visibility information generated during the binning pass can be
used to speed up the rendering pass by filtering out vertices which
do not apply to the current tile. See:
https://github.com/freedreno/freedreno/wiki/Adreno-tiling#optimized-approach
This brings a significant fps boost. A rough assortment of tests
(supertuxkart, etracer, tremulous, glmark2 'build' test, etc) seems
to yield a ~35-45% fps improvement.
For now, to be conservative, the binning pass is not enabled yet by
default. To enable it use:
FD_MESA_DEBUG=binning
So far I haven't found anything that breaks with binning enabled,
but I'd like a bit more testing before I enable it as default.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The hardware is broken with nonzero texel offsets and unnormalized
coordinates; instead of doing correct offsetting, we get garbage.
This just extends the existing workaround for ir_txf and
ir_tg4+gsampler2DRect to also consider ir_tex+gsampler2DRect.
Fixes broken rendering in 'tesseract' when 'mesa_texrectoffset_bug' is
not enabled; also fixes the new piglit test
'tests/spec/glsl-1.30/execution/fs-textureOffset-Rect'.
Has been broken ~forever; suggesting including this in only 10.0 because
the lowering pass doesn't exist in 9.2 or earlier so would require quite
a different patch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Lee Salzman <lsalzman@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Also rename "shaderType" param of is_varying_var() to "stage".
Reviewed-by: Brian Paul <brianp@vmware.com>
This reduces confusion since gl_shader::Type is sometimes
GL_SHADER_PROGRAM_MESA but is more frequently
GL_SHADER_{VERTEX,GEOMETRY,FRAGMENT}. It also has the advantage that
when switching on gl_shader::Stage, the compiler will alert if one of
the possible enum types is unhandled. Finally, many functions in
src/glsl (especially those dealing with linking) already use
gl_shader_stage to represent pipeline stages; using gl_shader::Stage
in those functions avoids the need for a conversion.
Note: in the process I changed _mesa_write_shader_to_file() so that if
it encounters an unexpected shader stage, it will use a file suffix of
"????" rather than "geom".
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also move the related #define MESA_SHADER_STAGES. This will allow
gl_shader_stage to be used in struct gl_shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Split from patch "mesa: Store gl_shader_stage enum in gl_shader objects."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we had an enum called gl_shader_type which represented
pipeline stages in the order they occur in the pipeline
(i.e. MESA_SHADER_VERTEX=0, MESA_SHADER_GEOMETRY=1, etc), and several
inconsistently named functions for converting between it and other
representations:
- _mesa_shader_type_to_string: gl_shader_type -> string
- _mesa_shader_type_to_index: GLenum (GL_*_SHADER) -> gl_shader_type
- _mesa_program_target_to_index: GLenum (GL_*_PROGRAM) -> gl_shader_type
- _mesa_shader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string
This patch tries to clean things up so that we use more consistent
terminology: the enum is now called gl_shader_stage (to emphasize that
it is in the order of pipeline stages), and the conversion functions are:
- _mesa_shader_stage_to_string: gl_shader_stage -> string
- _mesa_shader_enum_to_shader_stage: GLenum (GL_*_SHADER) -> gl_shader_stage
- _mesa_program_enum_to_shader_stage: GLenum (GL_*_PROGRAM) -> gl_shader_stage
- _mesa_progshader_enum_to_string: GLenum (GL_*_{SHADER,PROGRAM}) -> string
In addition, MESA_SHADER_TYPES has been renamed to MESA_SHADER_STAGES,
for consistency with the new name for the enum.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Also rename the "target" field of _mesa_glsl_parse_state and the
"target" parameter of _mesa_shader_stage_to_string to "stage".
Reviewed-by: Brian Paul <brianp@vmware.com>
The adjustment needs to be applied to the y coordinates and not the x
coordinates, just like the equivalent code for lines and triangles in
lp_setup_line.c and lp_setup_tri.c.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
This was inadvertently forgotten when replacing gl_rasterization_rules
with lower_left_origin and half_pixel_center (commit
2737abb44e).
This makes a difference when lower_left_origin != half_pixel_center, e.g,
D3D10.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
When the depth buffer is to be read, perform a Depth Buffer Resolve if it has
been rendered. When the depth buffer is to be rendered, perform a HiZ Buffer
Resolve when the depth buffer is modified externally.
Add blitter functions to perform Depth Buffer Clear, Depth Buffer Resolve, and
Hierarchical Depth Buffer Resolve. Those functions set ilo_blitter up and
pass it to the pipelines to emit the commands.
Allow HiZ op to be specified in 3DSTATE_WM. Pass depth format directly in
gen7_emit_3DSTATE_SF. Use tex->hiz.bo to determine if HiZ exists. Fix
3DSTATE_SF for the case when there is no ilo_rasterizer_state. Fix
3DSTATE_PS for the case when there is no ilo_shader_state.
GEN6 has several requirements regarding the LOD/Depth/Width/Height of the
render targets and the depth buffer. We used to offset to the layers in
question unconditionally to meet the requirements. With this commit,
offseting is done only when the requirements are not met.
On Haswell, POW takes 24 cycles, while EXP2 only takes 14. Plus, using
POW requires putting 2.0 in a register, while EXP2 doesn't.
I believe that EXP2 will be faster than POW on basically all GPUs, so
it makes sense to optimize it.
Looking at the savage2 subset of shader-db:
total instructions in shared programs: 113225 -> 113179 (-0.04%)
instructions in affected programs: 2139 -> 2093 (-2.15%)
instances of 'math pow': 795 -> 749 (-6.14%)
instances of 'math exp': 389 -> 435 (11.8%)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch creates a new generic is_value() method, which checks if an
ir_constant has a particular value. (For vectors, it must have the
single value repeated across all components.)
It then rewrites the is_zero/is_one/is_negative_one methods to use this
generic helper. All three were basically identical except for the value
they checked for. The other difference is that is_negative_one rejects
boolean types. The new is_value function maintains this behavior, only
allowing boolean types when checking for 0 or 1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using the get_local_param_pointer helper ensures that the LocalParams
arrays have actually been allocated before attempting to use them.
glProgramLocalParameters4fvEXT needs to do a bit of extra checking,
but it can be simplified since the helper has already validated the
target.
Fixes crashes in programs that use Cg (for example, Awesomenauts,
Rocketbirds: Hardboiled Chicken, and Tiny and Big: Grandpa's Leftovers)
since commit e5885c119d
(mesa: Dynamically allocate the storage for program local parameters.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73136
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Laurent Carlier <lordheavym@gmail.com>
We don't support MSAA (ie, number of samples is always one) therefore
sample_mask boils down to a synonym of the rasterizer_discard flag.
Also, this change makes setup actually use the value received in
lp_setup_set_rasterizer_discard instead of reaching out to llvmpipe
upper layers to re-fetch it.
Based on Si Chen's draft.
With this patch `wgf11multisample Coverage passes 100%` on the UMD
D3D10 state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Si Chen <sichen@vmware.com>
The initial value of cso_context::sample_mask_saved is irrelevant as it
will be overwritten with cso_context::sample_mask in
cso_save_sample_mask. Therefore it is cso_context::sample_mask that
needs to be properly initialized.
This fixes regressions in blits and mipmap generation after adding
support for sample_mask to llvmpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Implement Alpha to Coverage by discarding a fragment alpha component is
less than 0.5. This is a joint work of Jose and Si.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Commit 9119269ca1 moved the texel
buffer allocation to _swrast_texture_span(), however, when compiled
with OpenMP support this code already runs multi-threaded so a
critical section is required to prevent multiple allocations and
rendering errors.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Evidently, there's some other definition of "min" and "max" that
causes MSVC to choke on these function names. Renaming to min2()
and max2() fixes things.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both brw_defines.h and intel_reg.h defined PIPE_CONTROL fields, which
had similar names, but couldn't be used in the same way. (One had
built-in shifts, and the other didn't...)
Delete the unused set to preserve sanity.
(Eric wrote an almost identical patch back in August, so I believe he
approves.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes this build error with gcc <= 4.5 and clang <= 3.1.
CC clientattrib.lo
In file included from ../../include/GL/glx.h:333:0,
from glxclient.h:45,
from clientattrib.c:32:
../../include/GL/glxext.h:275:13: error: redefinition of typedef 'GLXContextID'
../../include/GL/glx.h:171:13: note: previous declaration of 'GLXContextID' was here
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70591
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* The Haiku renderers need to link to libGL to function properly
in all usage contexts. As mesa drivers build before gallium
targets, we couldn't properly link the mesa swrast driver to
the gallium libGL target for Haiku.
* This is likely better as it mimics how glx is laid out ensuring
the Haiku libGL is better understood.
* All renderers properly link in libGL now.
Acked-by: Brian Paul <brianp@vmware.com>
These are just software flag values (not hardware specific values), and
aren't used anywhere. Delete them to avoid confusion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Add extra null check in auto generated indirect_init.c via
src/mapi/glapi/gen/glX_proto_send.py
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously we left the size of this vgrf as 1, which caused register
allocation to be subtly broken. If we were lucky we would explode in
the post-alloc instruction scheduler; if we were unlucky we'd just stomp
on someone else and get broken rendering.
Fixes crash when running `tesseract` with the following settings:
msaa 4
glineardepth 0
Also fixes the piglit test:
arb_sample_shading-builtin-gl-sample-id
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72859
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The preprocessor currently accepts multiple else/elif-groups
per if-section. The GLSL-preprocessor is defined by the C++
specification, which defines the following parse-rule:
if-section:
if-group elif-groups(opt) else-group(opt) endif-line
This clearly only allows a single else-group, that has to come
after any elif-groups.
So let's modify the code to follow the specification. Add test
to prevent regressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Carl Worth <cworth@cworth.org>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
The preprocessor has always replaced multi-line comments with a single space
character, (as required by the specification), but as of commit
bd55ba568b the lexer also emitted a NEWLINE
token for each newline within the comment, (in order to preserve line
numbers).
The emitting of NEWLINE tokens within the comment broke the rule of "replace a
multi-line comment with a single space" as could be exposed by code like the
following:
#define FOO a/*
*/b
FOO
Prior to commit bd55ba568b, this code defined
the macro FOO as "a b" as desired. Since that commit, this code instead
defines FOO as "a" and leaves a stray "b" in the output.
In this commit, we fix this by not emitting the NEWLINE tokens while lexing
the comment, but instead merely counting them in the commented_newlines
variable. Then, when the lexer next encounters a non-commented newline it
switches to a NEWLINE_CATCHUP state to emit as many NEWLINE tokens as
necessary (so that subsequent parsing stages still generate correct line
numbers).
Of course, it would have been more clear if we could have written a loop to
emit all the newlines, but flex conventions prevent that, (we must use
"return" for each token we emit).
It similarly would have been clear to have a new rule restricted to the
<NEWLINE_CATCHUP> state with an action much like the body of this if
condition. The problem with that is that this rule must not consume any
characters. It might be possible to write a rule that matches a single
lookahead of any character, but then we would also need an additional rule to
ensure for the <EOF> case where there are no additional characters available
for the lookahead to match.
Given those considerations, and given that the SKIP-state manipulation already
involves a code block at the top of the lexer function, before any rules, it
seems best to me to go with the implementation here which adds a similar
pre-rule code block for the NEWLINE_CATCHUP.
Finally, this commit also changes the expected output of a few, existing glcpp
tests. The change here is that the space character resulting from the
multi-line comment is now emitted before the newlines corresponding to that
comment. (Previously, the newlines were emitted first, and the space character
afterward.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72686
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Two things make this code confusing:
1. The uncharacteristic manipulation of lexer start state outside of
flex rules.
2. The confusing semantics of the skip_stack (including the
"lexing_if" override and the SKIP_NO_SKIP state).
This new comment is intended to bring a bit more clarity for any readers.
There is no intended beahvioral change to the code here. The actual code
changes include better indentation to avoid an excessively-long line, and
using the more descriptive INITIAL rather than 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Support all levels of a supported texture format.
Using 1024x1024, RGBA 8888 source, mipmap
internal-format Before (MB/sec) mipmap (MB/sec)
GL_RGBA 627.15 615.90
GL_RGB 456.35 611.53
512x512
GL_RGBA 597.00 619.95
GL_RGB 440.62 611.28
256x256
GL_RGBA 487.80 587.42
GL_RGB 376.63 585.00
Benchmark has been sent to mesa-dev list: teximage_enh
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
MESA_FORMAT_XRGB8888 is equivalent to MESA_FORMAT_ARGB8888 in terms
of storage on the device, so okay to use this optimized copy routine.
This series builds on work from Frank Henigman to optimize the
process of uploading a texture to the GPU. This series adds support for
MESA_XRGB_8888 and full miptrees where were found to be common activities
in the Smokin' Guns game. The issue was found while profiling the app
but that part is not benchmarked. Smokin-Guns uses mipmap textures with
an internal format of GL_RGB (MESA_XRGB_8888 in the driver).
These changes need a performance tool to run against to show how they
improve execution performance for specific texture formats. Using this
benchmark I've measured the following improvement on my Ivybridge
Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz.
1024x1024 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 628.15 627.15
GL_RGB 265.95 456.35
512x512 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 600.23 597.00
GL_RGB 255.50 440.62
256x256 texture size
internal-format Before (MB/sec) XRGB (MB/sec)
GL_RGBA 489.08 487.80
GL_RGB 229.03 376.63
Benchmark has been sent to mesa-dev list: teximage
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Only a Mesa bug could cause this function to be called with an
out-of-range index, so raise an assertion if that ever happens.
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch replaces the following pattern:
foo bar[MESA_SHADER_TYPES] = {
...
};
With:
foo bar[] = {
...
};
STATIC_ASSERT(Elements(bar) == MESA_SHADER_TYPES);
This way, when a new shader type is added in a future version of Mesa,
we will get a compile error to remind us that the array needs to be
updated.
Reviewed-by: Brian Paul <brianp@vmware.com>
This argument was carrying the name of the shader target (as a
string). We can get this just as easily by calling
_mesa_shader_enum_to_string().
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously, _mesa_glsl_shader_target_name() had an overload for GLenum
and an overload for the gl_shader_type enum, each of which behaved
differently. However, since GLenum is a synonym for unsigned int, and
unsigned ints are often used in place of gl_shader_type (e.g. in loop
indices), there was a big risk of calling the wrong overload by
mistake. This patch gives the two overloads different names so that
it's always clear which one we mean to call.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit 136a12ac98.
According to belak51 on IRC, this commit broke Allegro, which would no
longer compile. Applications apparently expect the GLXContextID typedef
to exist in glx.h; removing it breaks them. A bit of searching around
the internet revealed other complaints since upgrading to Mesa 10.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
According to git blame, this hasn't been used in over two years:
commit d2235b0f46
Author: Eric Anholt <eric@anholt.net>
Date: Thu Nov 17 17:01:58 2011 -0800
i965: Always handle GL_DEPTH_TEXTURE_MODE through the shader.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
instead of ignoring the argument and always dumping to
standard output.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Using RMW on banked context registers is not safe. The value read
could be the wrong one. So if there has been a DRAW_IDX launched,
the RMW must be preceded by a WAIT_FOR_IDLE to ensure the read part
of RMW sees the correct value.
To avoid unnecessary WFI's, keep track if there is a need for WFI,
and only emit one if needed. Furthermore, keep track if we even
need to update the register in the first place.
And to cut down on the amount of RMW to avoid excessive WFI's, at the
tiling/GMEM level we can always overwrite RB_RENDER_CONTROL, as the
state at beginning of draw/clear cmds (which we IB to) is always
undefined. In the draw/clear commands, we always still use RMW (with
WFI if needed), but only if the register value actually changes. (At
points where the current value cannot be known, the saved value is
reset to ~0, which includes bits outside of RBRC_DRAW_STATE, so there
never is chance for confusion.)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Actually assign VSC_PIPE's properly, which will be needed for tiling.
And introduce fd_tile for per-tile state (including the assignment of
tile to VSC_PIPE). This gives us the proper pipe setup that we'll
need for hw binning pass, and also cleans things up a bit by not having
to pass so many parameters around. And will also make it easier to
introduce different tiling patterns (since we may no longer render
tiles in a simple left-to-right top-to-bottom pattern).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Previously we were creating a new LLVMContext every time that we called
radeon_llvm_parse_bitcode, which caused us to leak the context every time
that we compiled a CL program.
Sadly, we can't dispose of the LLVMContext at the point that it was being
created because evergreen_launch_grid (and possibly the SI equivalent) was
assuming that the context used to compile the kernels was still available.
Now, we'll create a new LLVMContext when creating EG/SI compute state, store
it there, and pass it to all of the places that need it.
The LLVM Context gets destroyed when we delete the EG/SI compute state.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: "10.0" <mesa-stable@lists.freedesktop.org>
This fixes a crash where old_view->context was already freed in the
pipe_sampler_view_reference function contained in
src/gallium/auxiliary/utils/u_inlines.h. As a result, the
sampler_view_destroy function pointer contained 0xfeeefeee indicating
freed heap memory.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jonathan Liu <net147@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver supports it. All current and future Gallium drivers always
support it, and all existing classic drivers support it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Useful in its own right, but also needed for adaptive vsync.
No regressions in the piglit glx-oml-sync-control-getmscrate test.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
It is the maximum number of back buffers, but the name is confusing and is
easily read as the maximum back buffer index. Chage to DRI3_NUM_BACK to make
the intended usage a bit clearer.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just copying code from the dri2 path to set up the fast color clear state.
This also removes a couple of bogus intel_region_reference calls.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The buffer-object is the persistent thing passed through the loader, so when
updating an image buffer, check to see if it is already bound to the provided
bo. The region, on the other hand, is allocated separately for the miptree,
and so will never be the same as that passed back from the loader.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Move the depth field up with width and height.
Remove unused previous_time and frames fields.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
libxshmfence v1.0 foolishly used 'int32_t *' for the fence type, which
works when the fence is a linux futex. However, version 1.1
changes the exported datatype to 'struct xshmfence *'
Require libxshmfence version 1.1 and switch the API around.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While looking through the documentation, I found this in the Sandybridge
PRM (Volume 4, Part 1, Page 140):
"Use of sample_c with SURFTYPE_CUBE surfaces is undefined with the
following surface formats: I24X8_UNORM, L24X8_UNORM, A24X8_UNORM,
I32_FLOAT, L32_FLOAT, A32_FLOAT."
I haven't observed this to be true, but it suggests that we may want to
use other formats.
We already perform DEPTH_TEXTURE_MODE swizzling in the shaders, and
don't rely on the surface format to splat things appropriately. So
using RED should work just as well as INTENSITY.
A few notes about the formats:
- R24_UNORM_X8_TYPELESS has the exact same properties as I24X8_UNORM.
- R16_UNORM and R32_FLOAT are additionally supported as a render target,
while the old I16_UNORM/I32_FLOAT formats are not.
- R32_FLOAT_X8X24_TYPELESS is not supported as a render target, while
the old format (R32G32_FLOAT) was. However, it shares the same
properties as the formats we use for Z24, so it should suffice.
This makes translate_tex_format and brw_blorp_surface_info::set
a bit more similar.
No Piglit changes on Sandybridge or Ivybridge. No oglconform changes on
Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Emitting flushes before depth and hiz resolves at the top of blorp's
state emission fixes the hang. Marchesin and I found the fix
experimentally, as opposed to adhering to a documented hardware
workaround. A more minimal fix likely exists, but this gets the job
done.
Fixes HiZ hangs in the new WebGL Google maps on Sandybridge Chrome OS.
Tested by zooming in and out continuously for 2 hours.
This patch is based on
8bc07bb701
CC: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70740
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Broadwell allows us to specify an arbitrary value for QPitch, rather
than baking a specific formula into the hardware and requiring software
to lay things out to match. The only restriction is that the software
provided QPitch needs to be large enough so successive array slices do
not overlap.
In order to support this flexibility, software needs to specify QPitch
in a bunch of packets. Storing QPitch makes that easy, and allows us to
adjust it in a single place should we wish to change it in the future.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Broadwell introduces support for Q, UQ, and HF types. It also extends
DF support to allow immediate values.
Irritatingly, although HF and DF both support immediates, they're
represented by a different value depending on the register file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge, Baytrail, and Haswell support double float register types,
but do not support them as immediate values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
On released hardware, values 4-6 are overloaded. For normal registers,
they mean UB/B/DF. But for immediates, they mean UV/VF/V.
Previously, we just created #defines for each name, reusing the same
value. This meant we could directly splat the brw_reg::type field into
the assembly encoding, which was fairly nice, and worked well.
Unfortunately, Broadwell makes this infeasible: the HF and DF types are
represented as different numeric values depending on whether the
source register is an immediate or not.
To preserve sanity, I decided to simply convert BRW_REGISTER_TYPE_* to
an abstract enum that has a unique value for each register type, and
write translation functions. One nice benefit is that we can add
assertions about register files and generations.
I've chosen not to convert brw_reg::type to the enum, since converting
it caused a lot of trouble due to C++ enum rules (even though it's
defined in an extern "C" block...).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Three-source instructions use a different encoding for register types
(and have a much more limited set to choose from).
Previously, we translated those into BRW_REGISTER_TYPE_* values, then
reused the existing reg_encoding mapping.
Doing it directly is more straightforward and actually less code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
UB types have never been supported as immediates. On Gen4-5, register
encoding 4 is "Reserved." On Gen6+, it means UV.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Calling the local variables flat_enable and point_sprite_enable is
clearer than dw16 and such. It also matches the names used in
calculate_attr_overrides, which computes them.
v2: Add /* dw16 */ and /* dw10 */ comments, requested by Jordan.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
calculate_attr_overrides is responsible for computing the point sprite
and flat-shading enable bitfields. It does so by OR'ing in a bunch of
bits. However, it relied on the caller to set the initial value to
zero. This is pretty fragile - if the caller neglects to zero out those
variables, then the enable bitfields end up full of garbage, which shows
up as random things being flat-shaded.
This patch moves the zero-initialization into calculate_attr_overrides,
so that the computation is completely in one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
git blame ascribes this to the initial commit of the driver.
No released hardware has ever supported half float, according to the
documentation for SrcType in the ISA reference.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If no function signature is found for a function name, report that the
function is not found instead of printing an empty list of candidates.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch changes the error reporting behavior for incorrect function
invocation (triggered by match_function_by_name() unable to find a
matching function call) from using the line number information
associated to the function name term to using the line number
information of the entire function expression. Fixes bug #72264.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72264
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
For GLSL 1.50 we can get frag shaders with primitive id as an
input, add support to the translator for this.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In load_texunit_bumpmap tc_array is asserted so lets assert
rot_mat_0 and rot_mat_1 also which are coming from same path.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Check for malloc() returning null to fix Klocwork warnings.
Minor clean-ups by BrianP.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RGB32F, GL_RGB32UI and GL_RGB32I texture buffer formats are
only supposed to be allowed if the GL_ARB_texture_buffer_object_rgb32
extension is supported. Note that the texture buffer extensions
require a core profile. This patch adds those checks.
Fixes the soon-to-be-added
arb_clear_buffer_object-negative-bad-internalformat piglit test.
Add function to test if the buffer is already mapped and if so,
if the mapped range overlaps the given range.
Modify the _mesa_InvalidateBufferSubData function to use
the new function.
Enable buffer_object_subdata_range_good() to use bufferobj_range_mapped
Reviewed-by: Brian Paul <brianp@vmware.com>
alpha, lumincance and intensity formats are illegal in a core context.
Add a check to return MESA_FORMAT_NONE if one of those is requested within
a core context.
Reviewed-by: Brian Paul <brianp@vmware.com>
- change storage class from static to extern
- rename validate_texbuffer_format to _mesa_validate_texbuffer_format
Reviewed-by: Brian Paul <brianp@vmware.com>
- add xml file for extension
- add reference in gl_API.xml
- add pointer to device driver function table (dd.h)
- update dispatch_sanity.cpp
Reviewed-by: Brian Paul <brianp@vmware.com>
Specs say it's legal for implementations to use internal copies, and
the write synchronization seems to work. Fixes clCreateBuffer
(together with previous patches) and buffer-flags piglits.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Francisco Jerez <currojerez@riseup.net>
v2: Use fewer if statements and functional tricks instead of single-use method,
suggested by Francisco Jerez.
Squash two small patches into one.
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When LLVM is build with Clang, "llvm-config --cxxflags" contains the
-fcolor-diagnostics flag. It is not recognized by gcc and the build
fails. Fix by removing the flag.
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
The dri2 state tracker is checking for driver support before enabling
dri2ImageExtension version 7. This commit adds a check that also the
kernel driver supports fd sharing through prime.
Note that this adds a libdrm dependency on dri2.c.
v2: Removed unnecessary clamping of bool expression
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
This will avoid compiler warnings in the patch that follows. There
should be no user-visible effect because the change only affects the
behaviour when an invalid enum is passed to
_mesa_shader_type_to_index(), and that can only happen if there is a
bug elsewhere in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
All this cruft was ported from r600g and isn't needed on SI and later
according to hw docs. If we implemented HiS, we would set it to 0.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Replicate some of the gallium pipe transfer functionality.
Also bump minor to signal availability of this feature.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
* These make up the base of what C++ GL Haiku applications
use for 3D rendering.
* Not placed in includes/GL to prevent Haiku headers from
getting installed on non-Haiku systems.
Acked-by: Brian Paul <brianp@vmware.com>
This fixes MSAA resolving for 32-bit integer colorbuffers, which isn't
implemented by the hardware.
It also fixes VM protection faults when resolving MSAA 2D array textures.
This may be a CB bug, because shader-based resolving works fine.
It may also be faster for upside-down and scaled blits.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
For scaled resolve. The filter is only good for magnification.
If somebody has an idea how to implement a good filter for minification,
I'm all ears. I'd have to use derivatives probably.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need this for integer formats and upside-down blits, which Radeons don't
support for MSAA resolving.
It can be used by calling util_blitter_blit.
Reviewed-by: Brian Paul <brianp@vmware.com>
The argument is a i8 pointer not a i32 pointer (even though the value actually
stored/loaded IS i32). Older llvm versions didn't care but 3.2 and newer do
leading to crashes.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Didn't really work as well as hoped (in particular it was not generally
more accurate), will solve this differently.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This code was always problematic, and with 64bit rasterization we no longer
need it at all.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The target parameter to _mesa_get_tex_image() is a target enum, not an index.
When we're setting up faces for a cubemap, it should be
CUBE_MAP_POSITIVE_X .. CUBE_MAP_NEGATIVE_Z; for all other targets it
should be the same as the texobj's target.
Fixes broken cubemaps [had only +X face but claimed to have all] produced by
glTextureView, which then caused various crashes in the driver when we
tried to use them.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: - add assert so we don't run into trouble on Gen6.
- adjust for Tapani's rearrangement of ir_variable
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The GL_ARB_shadow spec says the shadow compare mode should have no
effect when sampling a color texture. As it was, it was up to
drivers to check for that (softpipe, llvmpipe, svga and probably
the rest don't do that). Note: it looks like DX10 allows shadow
compare with some non-depth formats, so this case really should be
handled in the state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Depending on the depth texture format, we may or may not have to
emit explicit fs code to do the shadow comparison. Before, we
were emitting it more often than needed.
v2: check the actual texture format rather than the screen->depth.z16
field. The screen->depth.z16, x8z24, s8z24 fields may not all be set
to a consistent set of depth formats.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Call TextureView helper function to set TextureView state
appropriately for the TexStorage calls.
Misc updates from review feedback.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add helper function to set texture_view state from TexStorage calls.
Include review feedback.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add Mesa TextureView logic.
Incorporate feedback on ARB_texture_view:
- Add S3TC VIEW_CLASSes to compatibility table
- Use existing _mesa_get_tex_image
- Clean up error strings
- Use bool instead of GLboolean for internal functions
- Split compound level & layer test into individual tests
- eliminate helper macro for VIEW_CLASS table
- do not call driver if ptr null.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Refactor to make next_mipmap_level_size defined in mipmap.c a
_mesa_ helper function that can then be used by texture_view
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add support for ARB_texture_view get parameters:
GL_TEXTURE_VIEW_MIN_LEVEL
GL_TEXTURE_VIEW_NUM_LEVELS
GL_TEXTURE_VIEW_MIN_LAYER
GL_TEXTURE_VIEW_NUM_LAYERS
Incorporate feedback regarding when to allow query of
GL_TEXTURE_IMMUTABLE_LEVELS.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Add state needed by glTextureView to the gl_texture_object.
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Stub in glTextureView API call to go with the
glTextureView API xml definition.
Includes dispatch test for glTextureView
Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This patch changes the error condition to satisfy below statement
from OpenGL 4.3 core specification:
"An INVALID_OPERATION error is generated if id is the name of a query
object with a target other SAMPLES_PASSED, ANY_SAMPLES_PASSED, or
ANY_SAMPLES_PASSED_CONSERVATIVE, or if id is the name of a query
currently in progress."
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm not sure why this change is necessary. When I've built previous tar files
(such as 9.2.4) with the "make tarballs" target, they include the
bin/test-driver file. But at my first attempt to build the tar files for the
10.0.1 release this file was not being included and the build failed.
(cherry picked from commit d573899b93)
[The cherry pick is because I original applied this on the 10.0 branch while
working on the 10.0.1 release. But if we don't have this on master as well,
this issue will trip us up again the next time we make a new major-release
branch off of master.]
The driverPrivate pointer is opaque to the driver and we can't assume
it's a struct gl_context in dri_util.c. Instead provide a helper function
to set the struct gl_context flags from the incoming DRI context flags.
v2 (idr): Modify the other classic drivers to also use
driContextSetFlags. I ran all the piglit GLX_ARB_create_context tests
with i965 and classic swrast without regressions.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu> [v1 on Gallium nouveau]
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patches add MESA_copy_sub_buffer support to the dri sw loader and
then to gallium state tracker, llvmpipe, softpipe and other bits.
It reuses the dri1 driver extension interface, and it updates the swrast
loader interface for a new putimage which can take a stride.
I've tested this with gnome-shell with a cogl hacked to reenable sub copies
for llvmpipe and the one piglit test.
I could probably split this patch up as well.
v2: pass a pipe_box, to reduce the entrypoints, as per Jose's review,
add to p_screen doc comments.
v3: finish off winsys interfaces, add swrast classic support as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
swrast: add support for copy_sub_buffer
Required for glClearBuffer, which only clears one colorbuffer attachment.
Example:
If the first colorbuffer is float and the second one is int:
pipe->clear(pipe, PIPE_CLEAR_COLOR0, float_clear_color, ...);
pipe->clear(pipe, PIPE_CLEAR_COLOR1, int_clear_color, ...);
This doesn't need any driver changes yet, because all drivers just use:
if (flags & PIPE_CLEAR_COLOR) ..
The drivers which support GL 3.0 will have to implement it properly though.
This adds 2 optimizations for radeonsi:
- handling of DISCARD_RANGE
- mapping an uninitialized buffer range is automatically UNSYNCHRONIZED
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Christoph Brill <egore911@gmail.com>
v2: Renamed r600_buffer.c to r600_buffer_common.c. The stupid build system
doesn't allow 2 files of the same name in different directories.
If we assume that all buffers allocated by the DDX are scanout, a new flag
that says "this is not scanout" has to be added to support the non-scanout
buffers and maintain backward compatibility.
This fixes bad rendering on Wayland.
The flag is defined as:
#define RADEON_TILING_R600_NO_SCANOUT RADEON_TILING_SWAP_16BIT
AFAIK, RADEON_TILING_SWAP_16BIT is not used on SI.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Patch copies the whole data structure at once instead of
assigning individual variables.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields and variables to the data
structure:
explicit_location, explicit_index, explicit_binding, has_initializer,
is_unmatched_generic_inout, location_frac, from_named_ifc_block_nonarray,
from_named_ifc_block_array, depth_layout, location, index, binding,
max_array_access, atomic
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This patch moves following bitfields in to the data structure:
used, assigned, how_declared, mode, interpolation,
origin_upper_left, pixel_center_integer
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Data section helps serialization and cloning of a ir_variable. This
patch includes the helper bits used for read only ir_variables.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Newer virtual HW versions support smooth/stipple/wide lines.
Use that instead of 'draw' fallbacks when possible.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
With this patch llvmpipe will adhere to the ARB_depth_clamp enabled state when
clamping the fragment's zw value. To support this, the variant key now includes
the depth_clamp state. key->depth_clamp is derived from pipe_rasterizer_state's
(depth_clip == 0), thus depth clamp is only enabled when depth clip is disabled.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
On evergreen we have to reserve 1 stack element in some additional cases
besides the ones mentioned in the docs, but stack size computation was
recently reimplemented exactly as described in the docs by the patch that
added workarounds for stack issues on EG/CM, resulting in regressions
with some apps (Serious Sam 3).
This patch fixes it by restoring previous behavior.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=72369
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Andre Heider <a.heider@gmail.com>
Caching in the vbuf module meant that once a vertex has been
emitted it was cached, but it's possible for a vertex at the
same location to be emitted again, but this time with a different
front-face semantic. Caching was causing the first version of the
vertex to be emitted, which resulted in the renderer getting
incorrect front-face attributes. By reseting the vertex_id (which
is used for caching) we make sure that once a front-face info
has been injected the vertex will endup getting emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The fact that we flush denorms to zero breaks our half-float
conversion and blending. This patches enables denorms for
blending. It's a little tricky due to the llvm bug that makes
it incorrectly reorder the mxcsr intrinsics:
http://llvm.org/bugs/show_bug.cgi?id=6393
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Zack Rusin <zackr@vmware.com>
v2: Fix up queryImage return for ATTRIB_FD
Use driver_descriptor.configuration to determine whether the driver
supports DMA-BUF import/export.
v3: Really, truly, fix up queryImage return for ATTRIB_FD
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
First off, nv50_program only has 16 in/out varyings. However reporting
16 makes 'm' become 68 in nv50_fp_linkage_validate with the
varying-packing-simple piglit test. (Subverting the assert makes it
compile but fail.) With this patch, varying-packing-simple passes.
See: https://bugs.freedesktop.org/show_bug.cgi?id=69155
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
To help the transition period when DRI loaders are being updated
to support the newer __driDriverExtensions_foo mechanism,
we populate __driDriverExtensions with the extensions returned
by __driDriverExtensions_foo during a library contructor
function.
We find the driver foo's name by using the dladdr function
which gives the path of the dynamic library's name that
was being loaded.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Keith Packard <keithp@keithp.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
The _EGLSurface struct which is embedded into dri2_egl_surface also contains a
swap interval member so the other member is redundant. Nothing was using it as
far as I can tell.
On Gen4+, OUT_RELOC_FENCED is equivalent to OUT_RELOC; libdrm silently
ignores the fenced flag:
/* We never use HW fences for rendering on 965+ */
if (bufmgr_gem->gen >= 4)
need_fence = false;
Thanks to Eric for noticing this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that loop_controls no longer creates normatively bound loops,
there is no need for ir_loop::normative_bound or the
lower_bounded_loops pass.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when loop_controls analyzed a loop and found that it had a
fixed bound (known at compile time), it would remove all of the loop
terminators and instead set the loop's normative_bound field to force
the loop to execute the correct number of times.
This made loop unrolling easy, but it had a serious disadvantage.
Since most GPU's don't have a native mechanism for executing a loop a
fixed number of times, in order to implement the normative bound, the
back-ends would have to synthesize a new loop induction variable. As
a result, many loops wound up having two induction variables instead
of one. This caused extra register pressure and unnecessary
instructions.
This patch modifies loop_controls so that it doesn't set the loop's
normative_bound anymore. Instead it leaves one of the terminators in
the loop (the limiting terminator), so the back-end doesn't have to go
to any extra work to ensure the loop terminates at the right time.
This complicates loop unrolling slightly: when deciding whether a loop
can be unrolled, we have to account for the presence of the limiting
terminator. And when we do unroll the loop, we have to remove the
limiting terminator first.
For an example of how this results in more efficient back end code,
consider the loop:
for (int i = 0; i < 100; i++) {
total += i;
}
Previous to this patch, on i965, this loop would compile down to this
(vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) if(8)
break(8)
endif(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
(notice that both g8 and g4 are loop induction variables; one is used
to terminate the loop, and the other is used to accumulate the total).
After this patch, the same loop compiles to:
mov(8) g4<1>.xD 0D
loop:
cmp.ge.f0(8) null g4<4;4,1>.xD 100D
(+f0) if(8)
break(8)
endif(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This value is now redundant with
loop_variable_state::limiting_terminator->iterations and
ir_loop::normative_bound.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The old logic of loop_unroll_visitor::visit_leave(ir_loop *) was:
heuristics to skip unrolling in various circumstances;
if (loop contains more than one jump)
return;
else if (loop contains one jump) {
if (the jump is an unconditional "break" at the end of the loop) {
remove the break and set iteration count to 1;
fall through to simple loop unrolling code;
} else {
for (each "if" statement in the loop body)
see if the jump is a "break" at the end of one of its forks;
if (the "break" wasn't found)
return;
splice the remainder of the loop into the other fork of the "if";
remove the "break";
complex loop unrolling code;
return;
}
}
simple loop unrolling code;
return;
These tasks have been moved to their own functions:
- splice the remainder of the loop into the other fork of the "if"
- simple loop unrolling code
- complex loop unrolling code
And the logic has been flattened to:
heuristics to skip unrolling in various circumstances;
if (loop contains more than one jump)
return;
if (loop contains no jumps) {
simple loop unroll;
return;
}
if (the jump is an unconditional "break" at the end of the loop) {
remove the break;
simple loop unroll with iteration count of 1;
return;
}
for (each "if" statement in the loop body) {
if (the jump is a "break" at the end of one of its forks) {
splice the remainder of the loop into the other fork of the "if";
remove the "break";
complex loop unroll;
return;
}
}
This will make it easier to modify the loop unrolling algorithm in a
future patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, the sole responsibility of loop_analysis was to find all
the variables referenced in the loop that are either loop constant or
induction variables, and find all of the simple if statements that
might terminate the loop. The remainder of the analysis necessary to
determine how many times a loop executed was performed by
loop_controls.
This patch makes loop_analysis also responsible for determining the
number of iterations after which each loop terminator will terminate
the loop, and for figuring out which terminator will terminate the
loop first (I'm calling this the "limiting terminator").
This will allow loop unrolling to make use of information that was
previously only visible from loop_controls, namely the identity of the
limiting terminator.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patches to follow will introduce code into the loop_terminator
constructor. Allocating loop_terminator using new(mem_ctx) syntax
will ensure that the constructor runs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When loop_control_visitor::visit_leave(ir_loop *) is analyzing a loop
terminator that acts on a certain ir_variable, it doesn't need to walk
the list of induction variables to find the loop_variable entry
corresponding to the variable. It can just look it up in the
loop_variable_state hashtable and verify that the loop_variable entry
represents an induction variable.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These fields were part of some planned optimizations that never
materialized. Remove them for now to simplify things; if we ever get
round to adding the optimizations that would require them, we can
always re-introduce them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch replaces the ir_loop fields "from", "to", "increment",
"counter", and "cmp" with a single integer ("normative_bound") that
serves the same purpose.
I've used the name "normative_bound" to emphasize the fact that the
back-end is required to emit code to prevent the loop from running
more than normative_bound times. (By contrast, an "informative" bound
would be a bound that is informational only).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, all of the back-ends (ir_to_mesa, st_glsl_to_tgsi, and the
i965 fs and vec4 visitors) had nearly identical logic for handling
bounded loops. This replaces the duplicate logic with an equivalent
lowering pass that is used by all the back-ends.
Note: on i965, there is a slight increase in instruction count. For
example, a loop like this:
for (int i = 0; i < 100; i++) {
total += i;
}
would previously compile down to this (vec4) native code:
mov(8) g4<1>.xD 0D
mov(8) g8<1>.xD 0D
loop:
cmp.ge.f0(8) null g8<4;4,1>.xD 100D
(+f0) break(8)
add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD
add(8) g8<1>.xD g8<4;4,1>.xD 1D
add(8) g4<1>.xD g4<4;4,1>.xD 1D
while(8) loop
After this patch, the "(+f0) break(8)" turns into:
(+f0) if(8)
break(8)
endif(8)
because the back-end isn't smart enough to recognize that "if
(condition) break;" can be done using a conditional break instruction.
However, it should be relatively easy for a future peephole
optimization to properly optimize this.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, loop analysis would set
this->conditional_or_nested_assignment based on the most recently
visited assignment to the variable. As a result, if a vaiable was
assigned to more than once in a loop, the flag might be set
incorrectly. For example, in a loop like this:
int x;
for (int i = 0; i < 3; i++) {
if (i == 0)
x = 10;
...
x = 20;
...
}
loop analysis would have incorrectly concluded that all assignments to
x were unconditional.
In practice this was a benign bug, because
conditional_or_nested_assignment is only used to disqualify variables
from being considered as loop induction variables or loop constant
variables, and having multiple assignments also disqualifies a
variable from being considered as either of those things.
Still, we should get the analysis correct to avoid future confusion.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when visiting an ir_call, loop analysis would only mark
the innermost enclosing loop as containing a call. As a result, when
encountering a loop like this:
for (i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
foo();
}
}
it would incorrectly conclude that the outer loop ran three times.
(This is not certain; if foo() modifies i, then the outer loop might
run more or fewer times).
Fixes piglit test "vs-call-in-nested-loop.shader_test".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when visiting a variable dereference, loop analysis would
only consider its effect on the innermost enclosing loop. As a
result, when encountering a loop like this:
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
...
i = 2;
}
}
it would incorrectly conclude that the outer loop ran three times.
Fixes piglit test "vs-inner-loop-modifies-outer-loop-var.shader_test".
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fast color clears of MSAA buffers work just like fast color clears
with non-MSAA buffers, except that the alignment and scaledown
requirements are different.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This will make it easier to add fast color clear support to MSAA
buffers, since they have different alignment and scaling requirements.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we didn't do multisample blorp clears because we couldn't
figure out how to get them to work. The reason for this was because
we weren't setting the brw_blorp_params num_samples field consistently
with dst.num_samples. Now that those two fields have been collapsed
down into one, we can do multisample blorp clears.
However, we need to do a few other pieces of bookkeeping to make them
work correctly in all circumstances:
- Since blorp clears may now operate on multisampled window system
framebuffers, they need to call
intel_renderbuffer_set_needs_downsample() to ensure that a
downsample happens before buffer swap (or glReadPixels()).
- When clearing a layered multisample buffer attachment using UMS or
CMS layout, we need to advance layer by multiples of num_samples
(since each logical layer is associated with num_samples physical
layers).
Note: we still don't do multisample fast color clears; more work needs
to be done to enable those.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_blorp_params contained two fields for determining
sample count: num_samples (which determined the multisample
configuration of the rendering pipeline) and dst.num_samples (which
determined the multisample configuration of the render target
surface). This was redundant, since both fields had to be set to the
same value to avoid rendering errors.
This patch eliminates num_samples to avoid future confusion.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch renames the enum that's used to keep track of fast clear
state from "mcs_state" to "fast_clear_state", and it removes the enum
value INTEL_MCS_STATE_MSAA (which previously meant, "this is an MSAA
buffer, so we're not keeping track of fast clear state"). The only
real purpose that enum value was serving was to prevent us from trying
to do fast clear resolves on MSAA buffers, and it's just as easy to
prevent that by checking the buffer's msaa_layout.
This paves the way for implementing fast clears of MSAA buffers.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The "layer" parameters used in blorp, and the
intel_renderbuffer::mt_layer field, represent a physical layer rather
than a logical layer. This is important for 2D multisample arrays on
Gen7+ because the UMS and CMS multisample layouts use N physical
layers to represent each logical layer, where N is the number of
samples.
Also add an assertion to blorp to help catch bugs if we fail to follow
these conventions.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Clarify the fact that we only optimize full buffer clears using fast
color clear, and why.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Create the ref_bo without any storage type flags set for now. The issue
probably arises from our use of the additional buffer space at the end
of the ref_bo. It should probably be split up in the future.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Martin Peres <martin.peres@labri.fr>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
With this patch, generate_fs_loop will clamp any fragment shader depth writes
to the viewport's min and max depth values. Viewport selection is determined
by the geometry shader output for the viewport array index. If no index is
specified, then the default viewport index is zero. Semantics for this path
can be found in draw_clamp_viewport_idx and lp_clamp_viewport_idx.
lp_jit_viewport was created to store viewport information visible to JIT code,
and is validated when the LP_NEW_VIEWPORT dirty flag is set.
lp_rast_shader_inputs is responsible for passing the viewport_index through
the rasterizer stage to fragment stage (via lp_jit_thread_data).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The Wayland EGL platform now respects the eglSwapInterval value. The value is
clamped to either 0 or 1 because it is difficult (and probably not useful) to
sync to more than 1 redraw.
The main change is that if the swap interval is 0 then Mesa won't install a
frame callback so that eglSwapBuffers can be executed as often as necessary.
Instead it will do a sync request after the swap buffers. It will block for
sync complete event in get_back_bo instead of the frame callback. The
compositor is likely to send a release event while processing the new buffer
attach and this makes sure we will receive that before deciding whether to
allocate a new buffer.
If there are no buffers available then instead of returning with an error,
get_back_bo will now poll the compositor by repeatedly sending sync requests
every 10ms. This is a last resort and in theory this shouldn't happen because
there should be no reason for the compositor to hold on to more than three
buffers. That means whenever we attach the fourth buffer we should always get
an immediate release event which should come in with the notification for the
first sync request that we are throttled to.
When the compositor is directly scanning out from the application's buffer it
may end up holding on to three buffers. These are the one that is is currently
scanning out from, one that has been given to DRM as the next buffer to flip
to, and one that has been attached and will be given to DRM as soon as the
previous flip completes. When we attach a fourth buffer to the compositor it
should replace that third buffer so we should get a release event immediately
after that. This patch therefore also changes the number of buffer slots to 4
so that we can accomodate that situation.
If DRM eventually gets a way to cancel a pending page flip then the compositors
can be changed to only need to hold on to two buffers and this value can be
put back to 3.
This also moves the vblank configuration defines from platform_x11.c to the
common egl_dri2.h header so they can be shared by both platforms.
Consider a typical game-style main loop which might be like this:
while (1) {
draw_something();
eglSwapBuffers();
}
In this case the game is relying on eglSwapBuffers to throttle to a sensible
frame rate. Previously this game would end up using three buffers even though
it should only need two. This is because Mesa decides whether to allocate a
new buffer in get_back_bo which would be before it has tried to read any
events from the compositor so it wouldn't have seen any buffer release events
yet.
This patch just moves the block for the frame callback to get_back_bo.
Typically the compositor will send a release event immediately after one of
the attaches so if we block for the frame callback here then we can be sure to
have completed at least one roundtrip and received that release event after
attaching the previous buffer before deciding whether to allocate a new one.
dri2_swap_buffers always calls get_back_bo so even if the client doesn't
render anything we will still be sure to block to the frame callback. The code
to create the new frame callback has been moved to after this call so that we
can be sure to have cleared the previous frame callback before requesting a
new one.
This patch fixes this MinGW build error.
CC glapi_gentable.lo
glapi_gentable.c:47:19: fatal error: dlfcn.h: No such file or directory
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Drivers will need to look at this to decide if they need to do
per-sample fragment shader dispatch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
My understanding is that Broadwell retains the same SCS mechanism
that Haswell has, so even if the underlying issue with this format
is not fixed, the w/a will be applied in SCS rather than needing
shader code.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the pieces are in place, this should provide
a nice performance boost for apps using multisample textures.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to emit extra shader code in this case to sample the
MCS surface first; we can't just blindly do this all the time
since IVB will sometimes try to access the MCS surface even if
disabled.
V3: Use actual MSAA layout from the texture's mt, rather
then computing what would have been used based on the format.
This is simpler and less fragile - there's at least one case where
we might want to have a texture's MSAA layout change based on what
the app does (CMS SINT falling back to UMS if the app ever attempts
to render to it with a channel disabled.)
This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain
an implementation detail of the miptree code.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This gives us correct behavior for both renderbuffers (which previously
worked) and multisample textures (which would never get an MCS surface
allocated, even if CMS layout was selected)
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The bspec says:
"SW must program the sample mask value in this field so that it matches
with 3DSTATE_SAMPLE_MASK"
I haven't observed this to actually fix anything, but stumbled across it
while adding the rest of the support for CMS layout for multisample
textures.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For some reason this was left out when the version was changed...
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Before this patch, the following code would not be optimized even though
the first two instructions were common to the then and else blocks:
(+f0) IF
MOV dst0 ...
MOV dst1 ...
MOV dst2 ...
ELSE
MOV dst0 ...
MOV dst1 ...
MOV dst3 ...
ENDIF
This commit extends the peephole to handle this case.
No shader-db changes.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
fs_visitor::try_replace_with_sel optimizes only if statements whose
"then" and "else" bodies contain a single MOV instruction. It also
could not handle constant arguments, since they cause an extra MOV
immediate to be generated (since we haven't run constant propagation,
there are more than the single MOV).
This peephole fixes both of these and operates as a normal optimization
pass.
fs_visitor::try_replace_with_sel is still arguably necessary, since it
runs before pull constant loads are lowered.
total instructions in shared programs: 1559129 -> 1545833 (-0.85%)
instructions in affected programs: 167120 -> 153824 (-7.96%)
GAINED: 13
LOST: 6
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is the last thing that register_coalesce() still handled.
total instructions in shared programs: 1561060 -> 1560908 (-0.01%)
instructions in affected programs: 15758 -> 15606 (-0.96%)
Reviewed-by: Eric Anholt <eric@anholt.net>
parent_mem_ctx was unused since db47074a, so remove the two wrappers
around create() and make create() the constructor.
Reviewed-by: Eric Anholt <eric@anholt.net>
make_list is just a one-line wrapper and was confusingly called by
NULL objects. E.g., cur_if == NULL; cur_if->make_list(mem_ctx).
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously we made the basic block following an ENDIF instruction a
successor of the basic blocks ending with IF and ELSE. The PRM says that
IF and ELSE instructions jump *to* the ENDIF, rather than over it.
This should be immaterial to dataflow analysis, except for if, break,
endif sequences:
START B1 <-B0 <-B9
0x00000100: cmp.g.f0(8) null g15<8,8,1>F g4<0,1,0>F
0x00000110: (+f0) if(8) 0 0 null 0x00000000UD
END B1 ->B2 ->B4
START B2 <-B1
break
0x00000120: break(8) 0 0 null 0D
END B2 ->B10
START B3
0x00000130: endif(8) 2 null 0x00000002UD
END B3 ->B4
The ENDIF block would have no parents, so dataflow analysis would
generate incorrect results, preventing copy propagation from eliminating
some instructions.
This patch changes the CFG to make ENDIF start rather than end basic
blocks, so that it can be the jump target of the IF and ELSE
instructions.
It helps three programs (including two fs8/fs16 pairs).
total instructions in shared programs: 1561126 -> 1561060 (-0.00%)
instructions in affected programs: 837 -> 771 (-7.89%)
More importantly, it allows copy propagation to handle more cases.
Disabling the register_coalesce() pass before this patch hurts 58
programs, while afterward it only hurts 11 programs.
Reviewed-by: Eric Anholt <eric@anholt.net>
This extension enabled the use of texture array with fixed-function and
assembly fragment shaders. No applications are known to use this
extension.
NOTE: This patch regresses GL_TEXTURE_1D_ARRAY and GL_TEXTURE_2D_ARRAY
cases of the copyteximage piglit test. The test is incorrectly using
texture arrays with fixed function while only requiring the
GL_EXT_texture_array extension. A fix for the test has been posted to
the piglit mailing list.
http://lists.freedesktop.org/archives/piglit/2013-November/008639.html
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Every driver that enables one also enables the other. The difference
between the two is MESA adds support for fixed-function and assembly
fragment shaders, but EXT only adds support for GLSL. The MESA
extension was created back when Mesa did not support GLSL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
main/texobj.c: In function 'count_tex_size':
main/texobj.c:886:23: warning: unused parameter 'key' [-Wunused-parameter]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
main/texobj.c: In function '_mesa_test_texobj_completeness':
main/texobj.c:553:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
main/texobj.c:553:193: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
main/texobj.c:553:254: warning: signed and unsigned type in conditional expression [-Wsign-compare]
main/texobj.c:553:148: warning: signed and unsigned type in conditional expression [-Wsign-compare]
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
That enum requires GL_ARB_texture_cube_map_array, and it is only
available on desktop GL. It looks like this has been an un-noticed
issue since GL_ARB_texture_cube_map_array support was added in commit
e0e7e295.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds an extension called EGL_WL_create_wayland_buffer_from_image
which adds the following single function:
struct wl_buffer *
eglCreateWaylandBufferFromImageWL(EGLDisplay dpy, EGLImageKHR image);
The function creates a wl_buffer which shares its contents with the given
EGLImage. The expected use case for this is in a nested Wayland compositor
which is using subsurfaces to present buffers from its clients. Using this
extension it can attach the client buffers directly to the subsurface without
having to blit the contents into an intermediate buffer. The compositing can
then be done in the parent compositor.
The extension is only implemented in the Wayland EGL platform because of
course it wouldn't make sense anywhere else.
If we're not using EGL_EXT_swap_buffers_with_damage, we have to
damage the full extent. EGL operates on buffer coordinates, but
wl_surface.damage takes surface coordinates. EGL doesn't know the
buffer transformation (rotated or scaled) and can't post accurate
damage in surface coordinates. The damage event however is clipped to
the surface extents so we can just damage the maximum rectangle.
In case of EGL_EXT_swap_buffers_with_damage, the application knows
the buffer transform and is expected to pass in rectangles in
surface space.
https://bugs.freedesktop.org/show_bug.cgi?id=70250
Cc: "10.0" mesa-stable@lists.freedesktop.org
flush_with_flags, when available, allows the driver to throttle.
Using this suppress input lag issues that can be observed in heavy
rendering situations on non-intel cards.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.0" mesa-stable@lists.freedesktop.org
This typically won't make a difference, since we only send the requests at
wl_display_flush() time. There might be a small race
with another thread calling wl_display_flush() after our commit request,
but before we flush the DRI driver. Moving the commit below the DRI
driver flush call looks more natural and eliminates the small race.
Cc: "10.0" mesa-stable@lists.freedesktop.org
We would like the compositor to receive the commited buffer
as soon as possible, so it has the time to treat it, and
release old ones. We shouldn't rely on the client
to flush the queue for us.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.0" mesa-stable@lists.freedesktop.org
Display lists allocate memory in chunks of 256 tokens (1KB) at a time.
If an app creates many short display lists or uses glXUseXFont() this
can waste quite a bit of memory.
This patch uses realloc() to trim short lists and reduce the memory
used.
Also, null/zero-out some list construction fields in _mesa_EndList().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now, sizeof(gl_dlist_node)==4 even on 64-bit systems. This can
halve the memory used by some display lists on 64-bit systems.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a first step in reducing memory used by display lists on
64-bit systems. On 64-bit systems, the gl_dlist_node union type
is 8 bytes because of the 'data' and 'next' fields. This causes
every display list node/token to occupy 8 bytes instead of 4 as
originally designed. This basically doubles the memory used by
some display lists on 64-bit systems.
The fix is to remove the 64-bit 'data' and 'next' pointer fields
from the union and instead store them as a pair of 32-bit values.
Easily done with a few helper functions.
The next patch will take care of the 'next' field.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This fixes a memory leak in some situations. Also avoids emitting an
extra fence if the kick handler does the call to nouveau_fence_next
itself.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "9.2 10.0" <mesa-stable@lists.freedesktop.org>
So that it acts like ordinary free(). This lets us remove a bunch of
if statements where the function is called.
v2:
- Avoiding compile error on MSVC and possible warnings on other compilers.
- Added comment regards passing NULL pointer being safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
I missed this in the boolean -> enum conversion. C cheerfully casts
false -> 0 -> UNKNOWN_RING. On Gen4-5, this causes the render ring
prelude hook to get called in the middle of the batch, which is crazy.
BRW_BATCH_STRUCT is not used on Gen6+.
Fixes regressions since 395a32717d
("i965: Introduce an UNKNOWN_RING state.").
Fixes "fips -v glxgears" on Ironlake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit a4bf7f6b6e.
It breaks occlusion queries on Gen4-5. Doing this right will likely
require larger changes, which should be done at a future date.
Some Piglit tests still passed due to other bugs; fixing those revealed
this problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I guarded half of the callers to start/stop_oa_counters with generation
checks, but missed the other half (which were added later). OACONTROL
doesn't exist on Ironlake, so we better not write it. Also, there's no
need---Ironlake's performance counters are always running.
This patch moves the generation checks into start/stop_oa_counters,
rather than requiring the caller to do them.
Fixes assertion failures in Piglit's AMD_performance_monitor/measure.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
One can select if they want to fallback to softpipe.
Current approach makes this not possible, whereas other
targets (dri-swrast) handle this approapriately.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The BSpec states that the aligment for the non-msrt clear rectangle must
be doubled; the BSpec does not restricit the workaround to specific
hardware.
Commit 9a1a67b applied the workaround to Haswell GT3. Commit 8b659ce
expanded the workaround to all Haswell variants. This commit expands it
to all hardware.
No Piglit regressions on Ivybridge 0x0166. No fixes either.
I know no Ivybridge nor Baytrail bug related to this workaround.
However, the BSpec says the extra alignment is required, so let's do it.
v2: Apply to all hardware, not just gen7.
CC: "9.2, 10.0" <mesa-stable@lists.freedesktop.org>
CC: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
All bound layers (from first_layer to last_layer) should be cleared.
This uses a vertex shader which outputs gl_Layer = gl_InstanceID, so each
instance goes to a different layer. By rendering a quad and setting
the instance count to the number of layers, it will trivially clear all
layers.
This requires AMD_vertex_shader_layer (or PIPE_CAP_TGSI_VS_LAYER), which only
radeonsi supports at the moment. r600 could do this too. Standard DX11
hardware will have to use a geometry shader though, which has higher overhead.
This is a subset of geometry shaders. It's all about setting first_layer and
last_layer correctly.
Also some code between st_render_texture and update_framebuffer_state is
consolidated. It doesn't use rtt_level and derives the level from dimensions
instead as the code in st_atom_framebuffer.c did.
Commit a594cec broke EGL X11 backend by adding dependency between
X11 and DRM backends requiring HAVE_EGL_PLATFORM_DRM defined for X11.
This patch fixes the issue by adding additional define for libdrm
detection independent of which backend is being compiled. Tested by
compiling Mesa with '--with-egl-platforms=x11' and running es2gears_x11
+ glbenchmark2.7 successfully.
v2: return true for dri2_auth if running without libdrm (Samuel)
v3: check libdrm when building EGL drm platform + AM_CFLAGS fix (Emil)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72062
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: mesa-stable@lists.freedesktop.org
MI_STORE_REGISTER_MEM has to take a 48-bit address, so the existing code
doesn't work. But supposedly Broadwell has a register whitelist and
just works out of the box anyway, so there's no need to check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The Gen7 sampler state code still works. Increasing the alignment to
64 bytes makes bit 5 zero, which is good because it's now reserved.
Since we don't use the new filter bits, we can leave those as zero too,
which means we don't need to update the code to update the pointer.
(We probably should anyway, for clarity, but alas, another day.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation is really hard to follow, but apparently a 32-bit x
32-bit multiply just works without the MACH macro. The macro apparently
is only necessary to get the full 64-bit value.
Fixes Piglit tests [vf]s-op-mult-int-int.shader_test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Like Haswell, we do this in SURFACE_STATE rather than shader
workarounds.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Broadwell doesn't support a surface vertical alignment of 2. It only
supports VALIGN_4, VALIGN_8, or VALIGN_16. I chose 4 since it's the
least wasteful.
v2: Replace my comment with a better one from Eric. Move Broadwell
checks earlier so it's more obvious that "return 2" won't be hit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to SEND from a GRF, and we can only obtain those prior to
register allocation.
This allows us to do pull constant loads without the MRF hack.
v2: Reword comments (suggested by Paul).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
SEND can't deal with swizzles, source modifiers, and so on. This should
avoid problems with VS pull constant loads on Broadwell.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Pre-patch, the workaround was applied to only HSW GT3. However, the
workaround also fixes render corruption on the HSW GT1 Chromebook,
codenamed Falco.
Also, update the BSpec quote that discusses the workaround to reflect
the latest BSpec.
The BSpec states that the workaround is required for Ivybridge and
Baytrail as well as Haswell. But, we apply the workaround to only
Haswell because (a) we suspect that is the only hardware where it is
actually required and (b) we haven't yet validated the workaround for
the other hardware.
CC: "9.2, 10.0" <mesa-stable@lists.freedesktop.org>
CC: Anuj Phogat <anuj.phogat@gmail.com>
OTC-Tracker: CHRMOS-812
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Previously, we stored an array of up to 16 additional shaders to link,
as well as a count of how many each shader actually needed.
Since the built-in functions rewrite, all the built-ins are stored in a
single shader. So all we need is a boolean indicating whether a shader
needs to link against built-ins or not.
During linking, we can avoid creating the temporary array if none of the
shaders being linked need built-ins. Otherwise, it's simply a copy of
the array that has one additional element. This is much simpler.
This patch saves approximately 128 bytes of memory per gl_shader object.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since the built-in functions rewrite, num_builtins_to_link is always either
0 or 1, so we don't need tho crazy loop starting at -1 with a special
case.
All we need to do is print the prototypes from the current shader, and
the single built-in function shader (if present).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when we hit a "no matching function" error, it looked like:
0:0(0): error: no matching function for call to `cos()'
0:0(0): error: candidates are: float cos(float)
0:0(0): error: vec2 cos(vec2)
0:0(0): error: vec3 cos(vec3)
0:0(0): error: vec4 cos(vec4)
Now it looks like:
0:0(0): error: no matching function for call to `cos()'; candidates are:
0:0(0): error: float cos(float)
0:0(0): error: vec2 cos(vec2)
0:0(0): error: vec3 cos(vec3)
0:0(0): error: vec4 cos(vec4)
This is not really any worse and removes the need for the prefix variable.
It will also help with the next commit's refactoring.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no need to loop through the "parameters" list and remove every
element; move_nodes_to(¶meters) already throws away all elements of
the destination list.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
make[5]: Entering directory `/jhbuild/build/mesa/mesa/src/mapi/glapi/tests'
CXX check_table.o
/jhbuild/checkout/mesa/mesa/src/mapi/glapi/tests/check_table.cpp:29:30: fatal error: glapi/glapitable.h: No such file or directory
We should look for the generated file glapi/glapitable.h in builddir, not srcdir
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
On gen6, multisamble resolve blits use the SAMPLE message to blend
together the 4 samples for each texel. For some reason, SAMPLE
doesn't blend together the proper samples when the source format is
L32_FLOAT or I32_FLOAT, resulting in blocky artifacts.
To work around this problem, sample from the source surface using
R32_FLOAT. This shouldn't affect rendering correctness, because when
doing these resolve blits, the destination format is R32_FLOAT, so the
channel replication done by L32_FLOAT and I32_FLOAT is unnecessary.
Fixes piglit tests on Sandy Bridge:
- spec/ARB_texture_float/multisample-formats 2 GL_ARB_texture_float
- spec/ARB_texture_float/multisample-formats 4 GL_ARB_texture_float
No piglit regressions on Sandy Bridge.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70601
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The compiler back-ends (i965's fs_visitor and brw_visitor,
ir_to_mesa_visitor, and glsl_to_tgsi_visitor) have been assuming this
for some time. Thanks to the preceding patch, the compiler front-end
no longer breaks this assumption.
This patch adds code to validate the assumption so that if we have
future bugs, we'll be able to catch them earlier.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The compiler back-ends (i965's fs_visitor and brw_visitor,
ir_to_mesa_visitor, and glsl_to_tgsi_visitor) assume that when
ir_loop::counter is non-null, it points to a fresh ir_variable that
should be used as the loop counter (as opposed to an ir_variable that
exists elsewhere in the instruction stream).
However, previous to this patch:
(1) loop_control_visitor did not create a new variable for
ir_loop::counter; instead it re-used the existing ir_variable.
This caused the loop counter to be double-incremented (once
explicitly by the body of the loop, and once implicitly by
ir_loop::increment).
(2) ir_clone did not clone ir_loop::counter properly, resulting in the
cloned ir_loop pointing to the source ir_loop's counter.
(3) ir_hierarchical_visitor did not visit ir_loop::counter, resulting
in the ir_variable being missed by reparenting.
Additionally, most optimization passes (e.g. loop unrolling) assume
that the variable mentioned by ir_loop::counter is not accessed in the
body of the loop (an assumption which (1) violates).
The combination of these factors caused a perfect storm in which the
code worked properly nearly all of the time: for loops that got
unrolled, (1) would introduce a double-increment, but loop unrolling
would fail to notice it (since it assumes that ir_loop::counter is not
accessed in the body of the loop), so it would unroll the loop the
correct number of times. For loops that didn't get unrolled, (1)
would introduce a double-increment, but then later when the IR was
cloned for linking, (2) would prevent the loop counter from being
cloned properly, so it would look to further analysis stages like an
independent variable (and hence the double-increment would stop
occurring). At the end of linking, (3) would prevent the loop counter
from being reparented, so it would still belong to the shader object
rather than the linked program object. Provided that the client
program didn't delete the shader object, the memory would never get
reclaimed, and so the shader would function properly.
However, for loops that didn't get unrolled, if the client program did
delete the shader object, and the memory belonging to the loop counter
got re-used, this could cause a use-after-free bug, leading to a
crash.
This patch fixes loop_control_visitor, ir_clone, and
ir_hierarchical_visitor to treat ir_loop::counter the same way the
back-ends treat it: as a freshly allocated ir_variable that needs to
be visited and cloned independently of other ir_variables.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72026
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If an ir_loop has a non-null "counter" field, the variable referred to
by this field is implicitly read and written by the loop. We need to
account for this in ir_variable_refcount, otherwise there is a danger
we will try to dead-code-eliminate the loop counter variable.
Note: at the moment the dead code elimination bug doesn't occur due to
a bug in ir_hierarchical_visitor: it doesn't visit the "counter"
field, so dead code elimination doesn't treat it as a candidate for
elimination. But the patch to follow will fix that bug, so we need to
fix ir_variable_refcount first in order to avoid breaking dead code
elimination.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The flags value is a bitfield so use the union's 'bf' field, not 'e'
(enum) field. There's no actual change in behavior here since both
fields of the union are the same size.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Trying to compile any of these functions into a display list
now just generates a GL_INVALID_OPERATION error.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that it is possible to query drivers for the max sampler view it should
be safe to increase this without crashing.
Not entirely convinced this really works correctly though if state trackers
using non-linked sampler / sampler_views use this.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Ever since introducing separate sampler and sampler view max this was really
missing.
Every driver but llvmpipe reports the same number as number of samplers for
now, so nothing should break.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds support for this to more drivers, in particular for all the "special"
ones useful for debugging.
HW drivers are left alone, some should be able to support it if they want but
they may not be interested at this point.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Only allow __DRI_CTX_FLAG_ROBUST_BUFFER_ACCESS in brwCreateContext if
intelInitScreen2 also enabled __DRI2_ROBUSTNESS (thereby enabling
GLX_ARB_create_context).
This fixes a regression in the piglit test
"glx/GLX_ARB_create_context/invalid flag"
v2: Remove commented debug code. Noticed by Jordan.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reported-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patch fixes this build error with Oracle Solaris Studio.
libtool: link: /opt/solarisstudio12.3/bin/cc -g -o glcpp/glcpp glcpp.o prog_hash_table.o ./.libs/libglcpp.a
Undefined first referenced
symbol in file
sqrt prog_hash_table.o
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In brw_update_renderbuffer_surfaces(), if there are no color draw
buffers, we always set up a null render target at surface index 0 so we
have something to use with the FB write marking the end of thread.
However, when we recently began computing surface indexes dynamically,
we failed to reserve space for it. This meant that the first texture
would be assigned surface index 0, and our closing FB write would
clobber the texture.
Fixes Piglit's EXT_packed_depth_stencil/fbo-blit-d24s8 test on Gen4-5,
which regressed as of commit 4e5306453d
("i965/fs: Dynamically set up the WM binding table offsets.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70605
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Tested-by: lu hua <huax.lu@intel.com>
Cc: "10.0" mesa-stable@lists.freedesktop.org
Now that we have dynamic binding tables there's no good reason anymore
to expose so few atomic counter buffers. Increase it to 16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If reparent_ir() is called on invalid IR, then there's a danger that
it will fail to reparent all of the necessary nodes. For example, if
the IR contains an ir_dereference_variable which refers to an
ir_variable that's not in the tree, that ir_variable won't get
reparented, resulting in subtle use-after-free bugs once the
non-reparented nodes are freed. (This is exactly what happened in the
bug fixed by the previous commit).
This patch makes this kind of bug far easier to track down, by
transforming it from a use-after-free bug into an explicit IR
validation error.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In commit 065da16 (glsl: Convert lower_clip_distance_visitor to be an
ir_rvalue_visitor), we failed to notice that since
lower_clip_distance_visitor overrides visit_leave(ir_assignment *),
ir_rvalue_visitor::visit_leave(ir_assignment *) wasn't getting called.
As a result, clip distance dereferences appearing directly on the
right hand side of an assignment (not in a subexpression) weren't
getting properly lowered. This caused an ir_dereference_variable node
to be left in the IR that referred to the old gl_ClipDistance
variable. However, since the lowering pass replaces gl_ClipDistance
with gl_ClipDistanceMESA, this turned into a dangling pointer when the
IR got reparented.
Prior to the introduction of geometry shaders, this bug was unlikely
to arise, because (a) reading from gl_ClipDistance[i] in the fragment
shader was rare, and (b) when it happened, it was likely that it would
either appear in a subexpression, or be hoisted into a subexpression
by tree grafting.
However, in a geometry shader, we're likely to see a statement like
this, which would trigger the bug:
gl_ClipDistance[i] = gl_in[j].gl_ClipDistance[i];
This patch causes
lower_clip_distance_visitor::visit_leave(ir_assignment *) to call the
base class visitor, so that the right hand side of the assignment is
properly lowered.
Fixes piglit test:
- spec/glsl-1.50/execution/geometry/clip-distance-itemized-copy
Cc: Ian Romanick <idr@freedesktop.org>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The previous commit fixes a bug wherein we would incorrectly refer to
stale geometry shader prog_data when no geometry shader was active.
This patch reduces the likelihood of that sort of bug occurring in the
future by setting prog_data to NULL whenever there is no GS program.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, in brw_gs_upload_binding_table(), we checked whether
brw->gs.prog_data was NULL in order to determine whether a geometry
shader was active. This didn't work: brw->gs.prog_data starts off as
NULL, but it is set to non-NULL when a geometry shader program is
built, and then never set to NULL again. As a result, if we called
brw_gs_upload_binding_table() while there was no geometry shader
active, but a geometry shader had previously been active, it would
refer to a stale (and possibly freed) prog_data structure.
This patch fixes the problem by modifying
brw_gs_upload_binding_table() to use the proper technique to determine
whether a geometry shader is active: by checking whether
brw->geometry_program is NULL.
This fixes the crash reported in comment 2 of bug 71870 (the incorrect
rendering remains, however).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71870
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than always advertising the extension but failing to create a
context with reset notifiction, just don't advertise it. I don't know
why it didn't occur to me to do it this way in the first place.
NOTE: Kristian requested that I provide a follow-up for master that
dynamically generates the list of DRI extensions instead of selected
between two hardcoded lists.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Replace all occurences of the macro with its expansion.
It seems that the macro intended to provide cross-platform static mutex
intialization. However, it had the same definition in all pre-processor
paths:
#define _EGL_DECLARE_MUTEX(m) _EGLMutex m = _EGL_MUTEX_INITIALIZER
Therefore this abstraction obscured rather than helped.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Insert two fields into _egl_global to hold the client extensions and
statically initialize them:
ClientExtensions // a struct of bools
ClientExtensionString
Post-patch, Mesa supports exactly one client extension,
EGL_EXT_client_extensions.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The fast tiled texture upload code does not compile with GCC 4.8's -Og
optimization flag.
memcpy() has the always_inline attribute set. This poses a problem,
since {x,y}tile_copy_faster calls it indirectly via {x,y}tile_copy,
and {x,y}tile_copy normally aren't inlined at -Og.
Using __attribute__((flatten)) tells GCC to inline every function call
inside the function, which I believe was the author's intent.
Fix suggested by Alexander Monakov.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: mesa-stable@lists.freedesktop.org
8 bit precision is required by d3d10 but unfortunately
requires 64 bit rasterizer. This commit implements
64 bit rasterization with full support for 8bit subpixel
precision. It's a combination of all individual commits
from the llvmpipe-rast-64 branch.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
.. and mark them off on the extensions list as done.
V2: Enable only if pipelined register writes work.
V3: Also update relnotes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just prior to emitting the 3DPRIMITIVE command, we load each of the
indirect registers. The values loaded are either from offsets into the
current indirect BO, or constant zero if the parameter is not used for
this draw.
Enabling use of the indirect registers is done by turning on a bit in
the first dword of the 3DPRIMITIVE command itself.
V3: - Deduplicate the common part of both indexed and nonindexed indirect
setup.
- Just refer to the indirect bo out of the context directly.
V4: - Fix bo reference to specify the range we care about.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
- MMIO registers for draw parameters
- New bit in 3DPRIMITIVE command to enable indirection
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
V3: - Disallow primcount==0 for DrawMulti*Indirect. The spec is unclear
on this, but it's silly. We might go back on this later if it
turns out to be a problem.
- Make it clear that the caller has dealt with stride==0
V4: - Allow primcount==0 again.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We will reuse the same extension flag for ARB_multi_draw_indirect since
it can always be supported by looping.
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on part of Patch 2 of Christoph Bumiller's ARB_draw_indirect series.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Split from patch implementing ARB_draw_indirect.
v2: Const-qualify the struct gl_buffer_object *indirect argument.
v3: Fix up some more draw calls for new argument.
v4: Fix up rebase conflicts in i965.
v5: Undo const-qualification
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
V2: drop description of `fall` and `wm`, which have been removed by the
previous patch; describe `stats`.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
DEBUG_IOCTL comes from i965, and is about to be removed. Both defines
have the same value (4).
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Limit the max glsl version level to what the state tracker supports.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
V2: Return after error to avoid cascading error messages and
removed redundant "to" from error message
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I recently made us try two different things that tried to reduce register
pressure so that we would be more likely to allocate successfully. But
now that we have the logic for trying two, we can make the first thing we
try be the normal, not-prioritizing-register-pressure heuristic.
This means one less scheduling pass in the common case of that heuristic
not producing spills, plus the best schedule we know how to produce, if
that one happens to succeed. This is important, because our register
allocation produces a lot of possibly avoidable dependencies for the
post-register-allocation schedule, despite ra_set_allocate_round_robin().
GLB2.7: 1.04127% +/- 0.732461% fps improvement (n=31)
nexuiz: No difference (n=5)
lightsmark: 0.838512% +/- 0.300147% fps improvement (n=86)
minecraft apitrace: No difference (n=15)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The canary is basically just to give a better debugging message when you
ralloc_free() something that wasn't rallocated. Reduces maximum memory
usage of apitrace replay of the dota2 demo by 60MB on my 64-bit system (so
half that on a real 32-bit dota2 environment).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think I was thinking of the batch command packet cache when I pasted
this in, but this counter is only used for dumping out streamed state for
INTEL_DEBUG=batch and for putting annotations in our aub files.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 2f89662 added the driconf option 'clamp_max_samples'. In that
commit, the option did not alter the context version. The neglect to
alter the context version is a fatal issue for some apps.
For example, consider running Chromium with clamp_max_samples=0.
Pre-patch, Mesa creates a GL 3.0 context but clamps GL_MAX_SAMPLES to
0. This violates the GL 3.0 spec, which requires GL_MAX_SAMPLES >= 4.
The spec violation causes WebGL context creation to fail in many
scenarios because Chromium correctly assumes that a GL 3.0 context
supports at least 4 samples.
Since the driconf option was introduced largely for Chromium, the issue
really needs fixing.
This patch fixes calculation of the context version to respect the
post-clamped value of GL_MAX_SAMPLES. This in turn fixes WebGL on
Chromium when clamp_max_samples=0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
clamp_max_samples() and intel_quantize_num_samples() each maintained
their own list of which MSAA modes the hardware supports. This patch
removes the duplication by making intel_quantize_num_samples() use the
same list as clamp_max_samples(), the list maintained in
brw_supported_msaa_modes().
By removing the duplication, we prevent the scenario where someone
updates one list but forgets to update the other.
Move function `brw_context.c:static brw_supported_msaa_modes()` to
`intel_screen.c:(non-static) intel_supported_msaa_modes()` and patch
intel_quantize_num_samples() to use the list returned by that function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This simplifies the loop logic in a subsqequent patch that refactors
intel_quantize_num_samples() to use brw_supported_msaa_modes().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
These degenerate instructions can often be emitted by state trackers
when the semantics of instructions don't match precisely.
Reviewed-by: Brian Paul <brianp@vmware.com>
Several of links the were contributed by Keith Whitwell and Roland Scheidegger.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
From section 6.1.18 (Renderbuffer Object Queries) of the GL 3.2 spec,
under the heading "If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE
is TEXTURE, then":
If pname is FRAMEBUFFER_ATTACHMENT_LAYERED, then params will
contain TRUE if an entire level of a three-dimesional texture,
cube map texture, or one-or two-dimensional array texture is
attached. Otherwise, params will contain FALSE.
Fixes piglit tests:
- spec/!OpenGL 3.2/layered-rendering/framebuffer-layered-attachments
- spec/!OpenGL 3.2/layered-rendering/framebuffertexture-defaults
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Don't include "EXT" in the error message, since this query only
makes sensen in context versions that have adopted
glGetFramebufferAttachmentParameteriv().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we were using the code path for validating
glFramebufferTextureLayer(). But glFramebufferTexture() allows
additional texture types.
Fixes piglit tests:
- spec/!OpenGL 3.2/layered-rendering/gl-layer-cube-map
- spec/!OpenGL 3.2/layered-rendering/framebuffertexture
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Clarify comment above framebuffer_texture().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes the fast depth clear path.
Fixes piglit test "spec/!OpenGL 3.2/layered-rendering/clear-depth".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes the blorp clear path for color buffers.
Fixes piglit test "spec/!OpenGL 3.2/layered-rendering/clear-color".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From section 4.4.7 (Layered Framebuffers) of the GLSL 3.2 spec:
When the Clear or ClearBuffer* commands are used to clear a
layered framebuffer attachment, all layers of the attachment are
cleared.
This patch fixes meta clears to properly clear all layers of a layered
framebuffer attachment. We accomplish this by adding a geometry
shader to the meta clear program which sets gl_Layer to a uniform
value. When clearing a layered framebuffer, we execute in a loop,
setting the uniform to point to each layer in turn.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In order to properly clear layered framebuffers, we need to know how
many layers they have. The easiest way to do this is to record it in
the gl_framebuffer struct when we check framebuffer completeness.
This patch replaces the gl_framebuffer::Layered boolean with a
gl_framebuffer::NumLayers integer, which is 0 if the framebuffer is
not layered, and equal to the number of layers otherwise.
v2: Remove gl_framebuffer::Layered and make gl_framebuffer::NumLayers
always have a defined value. Fix factor of 6 error in the number of
layers in a cube map array.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prevents a GPU page fault if somehow the uniform bo gets evicted
before the screen_create pushbuf has been submitted.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Previously, we checked for interstage uniform interface block link
errors in validate_interstage_interface_blocks(), which is only called
on pairs of adjacent shader stages. Therefore, we failed to detect
uniform interface block mismatches between non-adjacent shader stages.
Before the introduction of geometry shaders, this wasn't a problem,
because the only supported shader stages were vertex and fragment
shaders, therefore they were always adjacent. However, now that we
allow a program to contain vertex, geometry, and fragment shaders,
that is no longer the case.
Fixes piglit test "skip-stage-uniform-block-array-size-mismatch".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: Rename validate_interstage_interface_blocks() to
validate_interstage_inout_blocks() to reflect the fact that it no
longer validates uniform blocks.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
v3: Make validate_interstage_inout_blocks() skip uniform blocks.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, when attempting to link a vertex shader and a geometry
shader that use different GLSL versions, we would sometimes generate a
link error due to the implicit declaration of gl_PerVertex being
different between the two GLSL versions.
This patch fixes that problem by only requiring interface block
definitions to match when they are explicitly declared.
Fixes piglit test "shaders/version-mixing vs-gs".
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: In the interface_block_definition constructor, move the assignment
to explicitly_declared after the existing if block.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From section 7.1 (Built-In Language Variables) of the GLSL 4.10
spec:
Also, if a built-in interface block is redeclared, no member of
the built-in declaration can be redeclared outside the block
redeclaration.
We have been regarding this text as a clarification to the behaviour
established for gl_PerVertex by GLSL 1.50, so we apply it regardless
of GLSL version.
This patch enforces the rule by adding an enum to ir_variable to track
how the variable was declared: implicitly, normally, or in an
interface block.
Fixes piglit tests:
- gs-redeclares-pervertex-out-after-global-redeclaration.geom
- vs-redeclares-pervertex-out-after-global-redeclaration.vert
- gs-redeclares-pervertex-out-after-other-global-redeclaration.geom
- vs-redeclares-pervertex-out-after-other-global-redeclaration.vert
- gs-redeclares-pervertex-out-before-global-redeclaration
- vs-redeclares-pervertex-out-before-global-redeclaration
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
v2: Don't set "how_declared" redundantly in builtin_variables.cpp.
Properly clone "how_declared".
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Unfortunately, our hardware only has one set of aggregating performance
counters shared between all 3D programs, and their values are not saved
or restored by hardware contexts. Also, at least on Sandybridge and
Ivybridge, the counters lose their values if the GPU goes to sleep.
To work around both of these problems, we have to snapshot the
performance counters at the beginning and end of each batch, similar to
how we handle query objects on platforms that don't support hardware
contexts. I call these "bookend" snapshots.
Since there can be multiple performance monitors active at a time, we
store the bookend snapshots in a global BO, shared by all monitors.
For monitors that span multiple batches, acquiring results involves
adding up three segments:
BeginPerfMonitor --> End of Batch 1 ("head")
Start of Batch 2 --> End of Batch 2
... ("middle")
Start of Batch N-1 --> End of Batch N-1
Start of Batch N --> EndPerfMonitor ("tail")
Monitors that refer to bookend BO snapshots are considered "unresolved".
We delay resolving them (and adding up deltas to obtain the results) as
long as possible to avoid blocking on mapping monitor->oa_bo.
We can also run out of space in the bookend BO, at which point we have
to resolve all unresolved monitors. Then we can throw away the
snapshots and begin writing at the beginning of the buffer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In order to use the Observability Architecture effectively, we'll need
to take snapshots of the OA counters via MI_REPORT_PERF_COUNT at the
start and end of each batch.
Experimentation reveals that we need to flush before and after each
MI_REPORT_PERF_COUNT to get working values. For simplicitly, I chose to
use intel_batchbuffer_emit_mi_flush(), which unfortunately expands to
triple pipe controls on Sandybridge.
We may want to start computing per-generation reserved batch space to
avoid the insanity of Sandybridge's PIPE_CONTROL cost. That said, much
of this cost existed before I rewrote the query object support to use
hardware contexts, so it's at least not entirely new.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, this only considers the monitor start and end snapshots.
This is woefully insufficient, but allows me to add a bunch of the
infrastructure now and flesh it out later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to start OA at the beginning of each batch where monitors are
active. OACONTROL isn't part of the hardware context, so to avoid
leaving counters enabled for other applications, we turn them off at the
end of the batch too.
We also need to start them at BeginPerfMonitor time (unless they've
already been started). We stop them when the monitor last ends as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We'll need to write this register to start/stop performance counters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
MI_REPORT_PERF_COUNT writes a snapshot of the Observability Architecture
counters to a buffer. Exactly how it works varies between generations:
Ironlake requires two packets, Sandybridge has to use GGTT, and Ivybridge
and later use PPGTT.
v2: Assert that we didn't use more space than we reserved (suggested
by Eric Anholt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Using the OA counters requires some per-batch work. When starting and
ending a batch, it's useful to know whether any monitors are actually
interested in OA data.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In addition to listing the counter names, we include several "remap"
tables. Confusingly, counters are documented with names like "A23",
are written to some buffer offset other than 23, and exposed by core
Mesa under a counter ID that is different still.
The first is inevitable; MI_REPORT_PERF_COUNT writes certain counters to
fixed locations in the buffer. The latter could be avoided, but core
Mesa uses the "Counters" array index as the ID for a counter. We could
do remapping there, but it would just complicate the core Mesa code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is fairly simple:
- At BeginPerfMonitor time, take an opening snapshot.
- At EndPerfMonitor time, take a closing snapshot.
- The first time the application asks for results, subtract the two and
store that value. Then free the BO containing the snapshots.
- On subsequent requests for the results, just return the saved value.
- On reset, throw away the results.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For now, we only support these on Gen6+, since that's what currently
uses hardware contexts. When we add Ironlake hardware context support,
we can add pipeline statistics register support for that as well.
In theory, we could support pipeline statistics counters even without
hardware contexts, but it would be annoyingly painful.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we don't support any counters, there are zero groups.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The Observability Architecture counters are 32-bit unsigned values, and
the Pipeline Statistics Register counters are 64-bit unsigned values.
These convenience macros make it easy to create those types of counters.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It's useful to see the state of all outstanding monitors; the start
of a new batch seems like a reasonable time to print them out.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These stub functions will be filled out in later patches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will enable debugging printfs for the AMD_performance_monitor code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without hardware contexts, the pipeline statistics registers are
free-running and include data from every 3D application running.
In order to find out the contributions of one particular context, we
need to take a snapshot at the start and end of each batch.
Previously, we emitted the PIPE_CONTROL necessary to capture
PS_DEPTH_COUNT when drawing primitives. Special tracking ensured it
happened only on the first draw of the batch, rather than on every draw.
Moving this to brw_new_batch increases symmetry, since the final
snapshot has always been in brw_finish_batch, which is just a few lines
below. It should be basically equivalent.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The new intel_batchbuffer_emit_render_ring_prelude() hook will be called
when switching from BLT or UNKNOWN_RING to RENDER_RING. This provides a
place to emit state that should go at the start of each render ring
batch, with minimal overhead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When we first create a batch buffer, it's empty. We don't actually
know what ring it will be targeted at until the first BEGIN_BATCH or
BEGIN_BATCH_BLT macro.
Previously, one could determine the state of the batch by checking
brw->batch.ring (blit vs. render) and brw->batch.used != 0 (known vs.
unknown).
This should be functionally equivalent, but the tri-state enum is a bit
clearer.
v2: Catch three explicit require_space callers (thanks to Carl and Eric).
v3: Split the boolean -> enum change from the UNKNOWN_RING change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Passing BLT_RING or RENDER_RING to batchbuffer functions is a lot more
obvious than passing true or false.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some rounding errors could crop up when calculating a0. Use a more accurate
method (barycentric interpolation essentially) to fix this, though to fix
the REAL problem (which is that our interpolation will give very bad results
with small triangles far away from the origin when they have steep gradients)
this does absolutely nothing (actually makes it worse). (To fix the real
problem, either would need to use a vertex corner (or some other point inside
the tri) as starting point value instead of fb origin and pass that down to
interpolation, or mimic what hw does, use barycentric interpolation (using
the coordinates extracted from the rasterizer edge functions) - maybe another
time.)
Some (silly) tests though really want a high accuracy at fb origin and don't
care much about anything else (Just. Don't. Ask.).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
D3D9 Shader Model 2 restricted the fog register to one component,
http://msdn.microsoft.com/en-us/library/windows/desktop/bb172945.aspx ,
but that restriction no longer exists in Shader Model 3, and several
WHCK tests enforce that.
So this change:
- lifts the single-component restriction TGSI_SEMANTIC_FOG
from Gallium interface
- updates the Mesa state tracker to enforce output fog has (f, 0, 0, 1)
- draw module was updated to leave TGSI_SEMANTIC_FOG output registers
alone
Several gallium drivers that are going out of their way to clear
TGSI_SEMANTIC_FOG components could be simplified in the future.
Thanks to Si Chen and Michal Krol for identifying the problem.
Testing done: piglit fogcoord-*.vpfp tests
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Same as Si Chen's commit e7a5905d8a for
tgsi_exec module.
Not actually tested, because softpipe is failing the test that caught
this bug due to unrelated issues.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Earlier comments suggest this was removed from GL core spec but it is
still there. Enabling makes 'texture_lod_bias_getter' Khronos
conformance tests pass, also removes some errors from Metro Last Light
game which is using this API.
v2: leave NOTE comment (Ian)
Cc: "9.0 9.1 9.2 10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
The alignment of a virtual memory area must always be at least 4096 bytes.
It only worked because size was aligned to 4096 outside of the function.
Signed-off-by: Christian König <christian.koenig@amd.com>
If a user set MESA_INFO and the OpenGL application uses a
3.0 or later context then the MESA_INFO debug output will have
an error when it queries for extensions using the deprecated
enum GL_EXTENSIONS. Passing context argument allows code
to return extension list directly regardless of profile.
Commit title updated as recommended by Kenneth Graunke.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
BLORP is essential. However, porting it to Gen8 is a huge amount of
work. Disabling it for now allows us to proceed with basic hardware
enablement.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
HiZ is difficult to implement, and while it's essential for performance,
we don't need it right away for purposes of hardware enabling.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As always, the chipset limits here are placeholders, rather than the
actual values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
i965 passed piglit, but swrast and gallium both segfaulted without this.
i965 happened to work because it never ran _mesa_load_state_parameters()
on the new program before the test called glProgramLocalParameter(), which
was allocating a LocalParams array for the fallback path.
v2: Since v1 threw away old localparams data, leaked old LocalParams
memory, only fixed fragment programs, and I was dubious of my previous
invariants already (nothing but program_parse.y will generate
LocalParams, and only that one path of program_parse.y will), just
late-allocate localparams at the other point of dereferencing them.
This adds overhead to _mesa_load_state_parameter, which is
uncomfortable, but I'm pretty sure that giant switch statement is
super slow already.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71734
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions.
total instructions in shared programs: 1360393 -> 1360387 (-0.00%)
instructions in affected programs: 157 -> 151 (-3.82%)
(no change in vertex shaders)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For commit 4df56177 Paul discovered that the hardware restriction that
Align16 instructions cannot be compressed was lifted on Haswell. This
has prevented us from emitting compressed three-source instructions.
For added confirmation, the bspec lists a work around called
WaBreakSimd16TernaryInstructionsIntoSimd8 that hasn't been applicable
since very early Haswell silicon.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, register_coalesce() would modify
mov vgrf1:f vgrf2:f
cmp null vgrf3:d vgrf1:d
to be
cmp null vgrf3:d vgrf2:f
and incorrectly use vgrf2's type in the instruction that the mov was
coalesced into.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not necessary to scale down cubemap texture coords when generating
mipmaps: we are doing a 2x minification therefore it's guaranteed that
the texture coords will always be at least 1 texel away of the edges.
Scaling down can actually be harmful, as it may cause artefacts when
generating mipmaps with nearest filtering. Sample points will lie
exactly in the middle each 2x2 texels, so the scaling factor was causing
different texels to be take on each quadrant of the cube face. This is
apparent with a 1x1 checkerboard pattern in the base mipmap level:
instead of next mipmap level receiving a constant color throughout the
face, it will have different colors for each quadrant of the face.
The behaviour for blits is left untouched for now, but the cubemap
texture coord scaling hack should be reconsidered eventually.
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to check the drawbuffer's orientation before inverting Y
coordinates. Fixes piglit feedback tests when running with the
-fbo option.
Cc: "9.2" "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The exec_mask must be taken in consideration, just like emit_kill above.
The tgsi_exec module has the same bug and should be fixed in a future
change.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Gen7 does not allow render targets to have a vertical alignment of 2.
So, when creating a surface, if its format is renderable, and its
vertical alignment is 2, force it to use X tiling.
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen6+ allows for color buffers to use a vertical alignment of either 4
or 2. Previously we defaulted to 2. This may have caused problems on
Gen7 because Y-tiled render targets are not allowed to use a vertical
alignment of 2.
This patch changes the vertical alignment to 4 on Gen7, except for the
few formats where a vertical alignment of 2 is required.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 70953b5 (i965: Initialize all member variables of
vec4_instruction on construction) inadvertently added a line to the
vec4_instruction constructor setting this->ir to NULL, wiping out the
previously set value. As a result, ever since then, the output of
INTEL_DEBUG=vs and INTEL_DEBUG=gs has been missing IR annotations.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is basically a a respin of f1dfcf4bce35e6796f873d9a00103b280da81e4c
per Jose's suggestion.
Just set the SVGA3dSurfaceFormatCaps flags for 3D and cube textures
when checking the texture format capabilities. This will filter out
unsupported combinations like 3D+DXT.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Refine 8c533022. Provide a stub __glXGetCurrentContext() function when
$(DEFINES) are such that it is not a macro.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
We were always passing PIPE_TEXTURE_2D, but not all formats are
supported for all types of textures. In particular, the driver may
not supported texture compression for all types of textures.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
v2: Don't go to extra work to avoid extraneous flushes. (Previous
experiments in the kernel have suggested that flushing the pipeline
when it is already empty is extremely cheap).
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add new OSMesaPostprocess() function to allow using the gallium
postprocessing filters. This only works for OSMesa with gallium
drivers, not the legacy swrast OSMesa.
Bump OSMESA_MAJOR/MINOR_VERSION numbers to 10.0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Move private data structures and function prototypes out of the
public postprocess.h header file.
Create a pp_private.h for the shared, private data structures, functions.
Remove pp_program.h header.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
- Indent items under a GL version to allow context diffs to do their work.
- Move complete drivers into the GL version line - this should make the
stuff a little bit easier to read.
v2: keep the fd.o link (Emil Velikov)
Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Joerg Mayer <jmayer@loplof.de>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The above two variables are unused as of commit
commit 024fe6852a
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Apr 2 10:42:50 2013 -0700
radeon/llvm: Use LLVM C API for compiling LLVM IR to ISA v2
which removed the only cpp file from drivers/radeon, but missed to
remove the CXXFLAGS. The sequential commit reintroduced and empty
LLVM_CPP_FILES.
Lets cleanup and remove both.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
If a performance monitor has never ended, then no result can be
available. Core Mesa can easily handle this, saving drivers a tiny bit
of complexity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If a monitor has ended, it means a result should eventually become
available, pending some flushing.
This is distinct from !m->Active; if a monitor has not been started,
then m->Active == false and m->Ended == false.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The i965 implementation uses calloc, so I missed this. It's best to
simply initialize it to avoid requiring a zeroing allocator, though.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that branch 10.0 is created, bump the minor version in
master.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We'll need this for Broadwell code as well.
Normally, when we make things public, we add the "brw" prefix. I'm not
crazy about that in this case, since it deals with prog_instruction.h's
SWIZZLE_XYZW values, rather than the BRW_SWIZZLE_XYZW enums. However,
I can't think of a better name, and at least the comments and code make
it clear.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Broadwell code should not include brw_eu.h (since it is for Gen4-7
assembly encoding), but needs the URB write flags enum.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Only Gen4 color write setup uses the force_sechalf flag, and it only
sets it on a single instruction. It also already has to get a pointer
to the instruction and manually set the saturate flag, so we may as well
just set force_sechalf the same way and avoid the complexity of a stack.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
Previous assumption was that the same set of flags can be reused
for both classic and gallium drivers. With megadriver work done
the classic drivers ended up using their own (single) instance of
the flags.
Move these into Automake.inc and rename to indicate that those
are gallium specific. Additionally silence an automake/autoconf
warning "XXX is not a standard libtool library name", due to
the parsing issues of the module tag.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Greatly reduce duplication and provide a sane minimum of
CFLAGS for all DRI targets.
Note: This commit adds VISIBILITY_CFLAGS to the following:
* freedreno
* i915
* ilo
* nouveau
* vmwgfx
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In order to use the trace driver, one needs to define
GALLIUM_TRACE. Neither one of the two targets was
defining it, thus we're safe to remove libtrace.la.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Automake.inc already has GALLIUM_VIDEO_CFLAGS, which
provide the essential compiler flags needed.
Note: this commit adds VISIBILITY_CFLAGS to nouveau.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Cleanup the duplicating flags and consolidate into a sigle variable.
Note: this patch adds VISIBILITY_CFLAGS to the following targets
* freedreno/drm
* i915/{drm,sw}
* nouveau/drm
* sw/fbdev
* sw/null
* sw/wayland
* sw/wrapper
* sw/xlib
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In order for one to use trace, noop, rbug and/or galahad, they must
set the corresponding GALLIUM_* CFLAG.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Store the compiler flags into a variable, in order to minimise
flags duplication (amongst vdpau and xvmc).
Note: this commit add VISIBILITY_CFLAGS to the nouveau target
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
* minimise flags duplication
* distingush between VISIBILITY C and CXX flags
* set only required flags - C and/or CXX
v2: add LLVM_CFLAGS back to AM_CFLAGS (add missing backslash)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
* Allow the lists to be shared among build systems.
* Update automake and Android build systems.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Nearly everything within the three Makefile.am's is identical.
Let's simplify things a little.
v2: Rebase and rewrite the commit message (Emil Velikov)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The array was 64kb per struct gl_program, plus we statically stored a copy
of one on disk for _mesa_DummyProgram. Given that most struct gl_programs
we generate are for GLSL shaders that don't have local parameters, this
was a waste.
Since you can store and fetch parameters beyond what the program actually
uses, we do have to do a late allocation if necessary at
GetProgramLocalParameter time.
Reduces peak memory usage in the dota2 trace I made by 76MB (4.5%)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This has been replaced with referring to env parameters using
PROGRAM_STATE_VAR and _mesa_load_state_parameters.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This has been replaced with referring to local parameters using
PROGRAM_STATE_VAR and _mesa_load_state_parameters.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Notably, ENV and LOCAL aren't used any more (replaced by STATE_VAR), but
apparently CONSTANT is.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The comment was stale, because the lowering in question wasn't happening
in lower_instructions.cpp. Presumably if the lowering ever moves there,
we can plumb the lowering mask through to opt_algebraic.
total instructions in shared programs: 1618696 -> 1616810 (-0.12%)
instructions in affected programs: 243018 -> 241132 (-0.78%)
GAINED: 0
LOST: 0
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Previously, brw_new_batch was called just after execbuf, but before
intel_batchbuffer_reset. Essentially, it prepared for the creation of a
new batch, that wasn't yet available, and which it didn't create. This
was a bit awkward.
This patch makes brw_new_batch call intel_batchbuffer_reset as the very
first operation. This means that brw_new_batch actually creates a new
batchbuffer, and thus has it available. It brings the creation of the
new batchbuffer and BRW_NEW_BATCH flagging together into one place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
More rebase fail. This code was written long before i915 and i965 were
split, so most of the code in i9[16]5/intel_screen.c only needed to
exist in one place. It looks like I fixed n-1 of those places after
rebasing on the split.
I only found this from the defined-but-not-used warning for
intelRendererQueryExtension. I noticed this while fixing the other,
related warnings.
(Note: During review, we decided to *not* pick this back to 10.0.)
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
radeon_llvm_compile allocates memory for binary.code, binary.config,
or neither depending on what's being done.
We need to make sure to free that memory after it's no longer needed.
v2: Don't bother checking for null before FREE()
CC: "10.0" <mesa-stable@lists.freedesktop.org>
use memset to initialize to 0's... otherwise code_size and config_size
could be uninitialized when read later in this method.
It's also hard to do NULL checks on uninitialized pointers.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
v2: Fix indentation
CC: "10.0" <mesa-stable@lists.freedesktop.org>
For DX9-level shaders, there's only limited support for indirect
indexing of registers (with the loop counter register, not the
general address register.)
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
After we blit/copy to a dest texture image we need to mark it as
being defined. This fixes broken mipmap generation for quite a
few texture formats. Mipgen involves making texture views and
svga_texture_view_surface() skips texture images that are undefined.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The index translation code expects the number of indexes to be
consistent with the primitive type (ex: a multiple of 3 for
PIPE_PRIM_TRIANGLES). If it's not, we can write out of bounds
in the destination buffer.
Fixes failed assertions in the pipebuffer debug code found with
Piglit primitive-restart-draw-mode test.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, when doing intrastage and interstage interface block
linking, we only checked the interface type; this prevented us from
catching some link errors.
We now check the following additional constraints:
- For intrastage linking, the presence/absence of interface names must
match.
- For shader ins/outs, the interface names themselves must match when
doing intrastage linking (note: it's not clear from the spec whether
this is necessary, but Mesa's implementation currently relies on
it).
- Array vs. nonarray must be consistent, taking into account the
special rules for vertex-geometry linkage.
- Array sizes must be consistent (exception: during intrastage
linking, an unsized array matches a sized array).
Note: validate_interstage_interface_blocks currently handles both
uniforms and in/out variables. As a result, if all three shader types
are present (VS, GS, and FS), and a uniform interface block is
mentioned in the VS and FS but not the GS, it won't be validated. I
plan to address this in later patches.
Fixes the following piglit tests in spec/glsl-1.50/linker:
- interface-blocks-vs-fs-array-size-mismatch
- interface-vs-array-to-fs-unnamed
- interface-vs-unnamed-to-fs-array
- intrastage-interface-unnamed-array
v2: Simplify logic in intrastage_match() for handling array sizes.
Make extra_array_level const. Use an unnamed temporary
interface_block_definition in validate_interstage_interface_blocks()'s
first call to definitions->store().
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
From the Sandy Bridge PRM, Vol 1 Part 1 7.18.3.4 (Alignment Unit
Size):
j [vertical alignment] = 4 for any render target surface is
multisampled (4x)
From the Ivy Bridge PRM, Vol 4 Part 1 2.12.2.1 (SURFACE_STATE for most
messages), under the "Surface Vertical Alignment" heading:
This field is intended to be set to VALIGN_4 if the surface was
rendered as a depth buffer, for a multisampled (4x) render target,
or for a multisampled (8x) render target, since these surfaces
support only alignment of 4.
Back in 2012 when we added multisampling support to the i965 driver,
we forgot to update the logic for computing the vertical alignment, so
we were often using a vertical alignment of 2 for multisampled
buffers, leading to subtle rendering errors.
Note that the specs also require a vertical alignment of 4 for all
Y-tiled render target surfaces; I plan to address that in a separate
patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53077
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For both vertex and fragment shaders we default MaxUniformComponents
to 4 * MAX_UNIFORMS. It makes sense to do this for geometry shaders
too; if back-ends have different limits they can override them as
necessary.
Fixes piglit test:
spec/glsl-1.50/built-in constants/gl_MaxGeometryUniformComponents
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
* Inherit gl_context so we always have access to it
* Thanks curro for the idea.
* Last Haiku cannidate for 10.0.0
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The debug printfs wouldn't actually compile when enabled, so kill them off
and insert some new one in another place, and make sure it keeps compiling
by enclosing it in a if-0 clause.
In particular get rid of home-grown vector helpers which didn't add much.
And while here fix formatting a bit. No functional change.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
d3d10 requires us to convert NaNs to zero for any float->int conversion.
We don't really do that but mostly seems to work. In particular I suspect the
very common float->unorm8 path only really passes because it relies on sse2
pack intrinsics which just happen to work by luck for NaNs (float->int
conversion in hw gives integer indeterminate value, which just happens to be
-0x80000000 hence gets converted to zero in the end after pack intrinsics).
However, float->srgb didn't get so lucky, because we need to clamp before
blending and clamping resulted in NaN behavior being undefined (and actually
got converted to 1.0 by clamping with sse2). Fix this by using a zero/one clamp
with defined nan behavior as we can handle the NaN for free this way.
I suspect there's more bugs lurking in this area (e.g. converting floats to
snorm) as we don't really use defined NaN behavior everywhere but this seems
to be good enough.
While here respecify nan behavior modes a bit, in particular the return_second
mode didn't really do what we wanted. From the caller's perspective, we really
wanted to say we need the non-nan result, but we already know the second arg
isn't a NaN. So we use this now instead, which means that cpu architectures
which actually implement min/max by always returning non-nan (that is adhering
to ieee754-2008 rules) don't need to bend over backwards for nothing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Systems with little physical memory installed will report less than
2GiB, and some systems may (hypothetically?) have a larger address space
for the GPU. My IVB still reports 1534.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This patch implements accelerated path for glDrawPixels from a PBO in
i965. The code follows what intel_pixel_read, intel_pixel_copy,
intel_pixel_bitmap and intel_tex_image are doing. Piglit quick.tests
show no regressions. In my testing on IVB, performance improvement is
huge (about 30x, didn't measure exactly) since generic path goes via
_mesa_unpack_color_span_float, memcpy, extract_float_rgba.
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
createContextAttribs is a superset of what createNewContext provides.
Also remove the function typedef, since createNewContext is deprecated
and no longer used in multiple interfaces.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This lets us allocate color buffers as __DRIimages and pass them into
the driver instead of having to create a __DRIbuffer with the flink
that requires.
With this patch, we can now run gbm on render-nodes. A render-node is a
drm device that doesn't support modesetting and all the legacy DRI ioctls.
flink is also not supported, but now that gbm doesn't need flink, we can
run piglit on head-less gbm or head-less GPGPU.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Since LIFO fails on some shaders in one particular way, and non-LIFO
systematically fails in another way on different kinds of shaders, try
them both, and pick whichever one successfully register allocates first.
Slightly prefer non-LIFO in case we produce extra dependencies in register
allocation, since it should start out with fewer stalls than LIFO.
This is madness, but I haven't come up with another way to get unigine
tropics to not spill while keeping other programs from not spilling and
retaining the non-unigine performance wins from texture-grf.
total instructions in shared programs: 1626728 -> 1626288 (-0.03%)
instructions in affected programs: 1015 -> 575 (-43.35%)
GAINED: 50
LOST: 0
Improves Unigine Tropics performance by 14.5257% +/- 0.241838% (n=38)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Long ago, the HW_REG usage in assign_curb/urb_setup() were scheduling
barriers, so we had to run scheduler before them in order for it to be
able to do basically anything. Now that that's fixed, we can delay the
scheduling until we go to allocate (which will make the next change less
scary).
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We care about depth-until-program-end, as a proxy for "make sure I
schedule those early instructions that open up the other things that can
make progress while keeping register pressure low", not actual latency
(since we're relying on the post-register-alloc scheduling to actually
schedule for the hardware).
total instructions in shared programs: 1609931 -> 1609931 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 55
LOST: 43
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In the SIMD16 spilling changes, I replaced a "1" in the spill path with
"mlen", but obviously it wasn't mlen before because spills have the g0
header along with the payload. The interface I was trying to use was
asking for how many physical regs we're writing, so we're looking for "1"
or "2".
I'm guessing this actually passed piglit because the high 8 bits of the
execution mask in SIMD8 mode are all 0s.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, the best thing we had was to schedule the things unblocked by
the last chosen instruction, on the hope that it would be consuming two
values at the end of their live intervals while only producing one new
value. But that's just a guess, and we can do counting of usage of
registers to know when an instruction would (almost surely) reduce
register pressure.
The only failure mode I know of in this new dominant heuristic is that
inside of a loop when scheduling the iterator (for example), choosing the
last use of the iterator doesn't actually reduce the live interval of the
iterator. But it doesn't seem to matter in shader-db:
total instructions in shared programs: 1618700 -> 1618700 (0.00%)
instructions in affected programs: 0 -> 0
GAINED: 13
LOST: 0
Note: The new functions are made virtual because I expect we'll soon lift
the pre-regalloc scheduling heuristic over to the vec4 backend.
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes infinite loop in find_grid_optimal_factor() in cases where the
user specifies a grid size with less dimensions than the device
supports.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Since we explicitly require a integer input we should avoid using exp2 math
(even if we were using optimized versions), which turns the exp2 into a int
sub (plus some casts).
v2: fix bogus uint (needs to be int) math spotted by Matthew, fix comments
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Otherwise, the function would enable generic vertex attributes 0
and 1 of the array object it does not own. This was causing crashes
in Euro Truck Simulator 2, since the incorrectly enabled generic
attribute 0 in the foreign context got precedence before vertex
position attribute at later time, leading to NULL pointer dereference.
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Petr Sebor <petr@scssoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Adding a vl_mpeg-based helper didn't seem to work, as it produced data
that the card couldn't handle. (And I didn't investigate further.) This
makes the decoding functionality only accessible via XvMC and avoids
crashes when attempting to use VDPAU.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.0" <mesa-stable@lists.freedesktop.org>
This fixes a crash in glamor when mesa links against static LLVM.
v2:
- Inline LINKER_SCRIPT variable
v3: Kai Wasserbäch
- Fix out out-of-tree-builds
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.or>
This makes it possible to use clover with statically linked LLVM.
v2:
- Inline LINKER_SCRIPT variable
v3: Kai Wasserbäch
- Fix out out-of-tree-builds
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.or>
X_f, Y_f, Xp_f, Yp_f variables are used just inside
translate_dst_to_src().So, they can be defined just
as local variables.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When this function was added, the returned value was signed in some
places, unsigned in others.
v2: also add unsigned in the unit test, per Ian.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we would bogusly replace the entire statement containing the
ir_texture node with an ir_dereference_variable.
Correct this to just replace the ir_texture node itself as intended.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b16b3c87 began performing CSE on CMP instructions with null
destinations. I relaxed the restrictions a bit too much, thereby
allowing CSE to be performed on instructions with, for instance, an
explicit accumulator destination.
This broke the arb_gpu_shader5/fs-imulExtended shader tests because
they emit MUL instructions with the accumulator as the destination. CSE
would instead cause the MUL to write to a GRF, which is lower precision
than the accumulator.
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: 10.0 <mesa-stable@lists.freedesktop.org>
<li><ahref="http://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
<li><ahref="http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602">Rasterization on Larrabee</a> (<ahref="http://devmaster.net/posts/2887/rasterization-on-larrabee">DevMaster copy</a>)</li>
<li><ahref="http://devmaster.net/posts/6133/rasterization-using-half-space-functions">Rasterization using half-space functions</a></li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=64323">Bug 64323</a> - Severe misrendering in Left 4 Dead 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68838">Bug 68838</a> - GLSL: struct declarations produce a "empty declaration warning" in 9.2</li>
@@ -55,16 +57,89 @@ Note: some of the new features are only available with certain drivers.
<li>GL_ARB_vertex_attrib_binding</li>
<li>GL_ARB_vertex_type_10f_11f_11f_rev on i965 and r600g</li>
<li>GL_KHR_debug</li>
<li>GLX_MESA_query_renderer</li>
</ul>
<h2>Bug fixes</h2>
TBD.
<p>Attempts have been made to <b>not</b> include bugs fixed in previous 9.2
releases or bugs that were regressions during 10.0 development. This list is
likely incomplete.</p>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=47755">Bug 47755</a> - [glsl-compiler] no error checking when Interpolation qualifier for built-in variable is different in vertex and fragment shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=52171">Bug 52171</a> - [gallium/r600/clover] Simple benchmarks failed to run</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=53077">Bug 53077</a> - [IVB] Output error with msaa when both of framebuffer and source color's alpha are not 1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=54867">Bug 54867</a> - bug in r300 compiler</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=60929">Bug 60929</a> - [r600-llvm] mono games with opengl are blocking on start</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=62142">Bug 62142</a> - Mesa/demo mipmap_limits upside down with running by SOFTWARE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66213">Bug 66213</a> - Certain Mesa Demos Rendering Inverted (vertically)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66806">Bug 66806</a> - [softpipe] glxgears floating point exception</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=67921">Bug 67921</a> - [bisected commit 883987] crosscompiling fails with util/u_cpu_detect.c:247:4: error: 'asm' undeclared (first use in this function)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68162">Bug 68162</a> - [radeonsi] texture rendering is broken in Source-Engine games</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68451">Bug 68451</a> - Texture flicker in native Dota2 in mesa 9.2.0rc1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68503">Bug 68503</a> - Graphical glitches in Serious Sam 3 when SB is enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68792">Bug 68792</a> - Problems during playback of h264 files using UVD and VLC on AMD E-350 CPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70042">Bug 70042</a> - Major texture flickering in Dota 2 (r600g on HD 6950)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70088">Bug 70088</a> - Glamor on r600g crashes Xserver</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70123">Bug 70123</a> - Freeze caused by 'winsys/radeon: remove cs_queue_empty' commit</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70327">Bug 70327</a> - Casting floating point variable to integer not working properly while constant gets converted properly</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=70891">Bug 70891</a> - CL_INVALID_BUILD_OPTIONS results in CL_INVALID_DEVICE when asking for build log</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71022">Bug 71022</a> - configure: error: Expat required for DRI.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71110">Bug 71110</a> - xorg_driver.c:1030:2: error: too many arguments to function ‘DamageUnregister’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=53077">Bug 53077</a> - [IVB] Output error with msaa when both of framebuffer and source color's alpha are not 1</li>
<li>Fix freedreno to compile with recent libdrm.</li>
</ul>
<h2>Changes</h2>
<p>The full set of changes can be viewed by using the following GIT command:</p>
<pre>
git log mesa-9.2.3..mesa-9.2.4
</pre>
<p>Brian Paul (1):</p>
<ul>
<li>st/mesa: fix GL_FEEDBACK mode inverted Y coordinate bug</li>
</ul>
<p>Paul Berry (2):</p>
<ul>
<li>i965: Fix vertical alignment for multisampled buffers.</li>
<li>glsl: Fix lowering of direct assignment in lower_clip_distance.</li>
</ul>
<p>Rob Clark (17):</p>
<ul>
<li>freedreno/a3xx: fix color inversion on mem->gmem restore</li>
<li>freedreno/a3xx: fix viewport on gmem->mem resolve</li>
<li>freedreno: add debug option to disable scissor optimization</li>
<li>freedreno: update register headers</li>
<li>freedreno/a3xx: some texture fixes</li>
<li>freedreno/a3xx/compiler: fix CMP</li>
<li>freedreno/a3xx/compiler: handle saturate on dst</li>
<li>freedreno/a3xx/compiler: use max_reg rather than file_count</li>
<li>freedreno/a3xx/compiler: cat4 cannot use const reg as src</li>
<li>freedreno: fix segfault when no color buffer bound</li>
<li>freedreno/a3xx/compiler: make compiler errors more useful</li>
<li>freedreno/a3xx/compiler: bit of re-arrange/cleanup</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=62142">Bug 62142</a> - Mesa/demo mipmap_limits upside down with running by SOFTWARE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=64323">Bug 64323</a> - Severe misrendering in Left 4 Dead 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66213">Bug 66213</a> - Certain Mesa Demos Rendering Inverted (vertically)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=68838">Bug 68838</a> - GLSL: struct declarations produce a "empty declaration warning" in 9.2</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.