If we happened to allocate a texture result (or other vector) to the
highest hardware register slot, and we were in 16-wide, we would
under-count the registers used and potentially wrap around to g0 if
that allocation crossed a 16-register block boundary. Bad rendering
and hangs ensued.
Tested-by: Ian Romanick <idr@freedesktop.org>
When the fill mode is PIPE_POLYGON_MODE_LINE we were basically
converting the polygon into triangles, then drawing the outline of all
the triangles. But we really only want to draw the lines around the
perimeter of the polygon, not the interior lines.
NOTE: This is a candidate for the 7.10 branch.
In gen6 and above, clip distances 0-3 are written to message register
3's xyzw components, and 4-7 to message register 4's xyzw components.
Therefore when when writing the clip distances we need to examine the
lower 2 bits of the clip distance index to see which component to
write to.
emit_vertex_write() was examining the lower 3 bits, causing clip
distances 4-7 not to be written correctly.
Fixes piglit test vs-clip-vertex-01.shader_test
Evergreen+ don't support multi-writes so we need to emulate
it in the shader. Fixes the following piglit tests:
fbo-drawbuffers-fragcolor
ati_draw_buffers-arbfp-no-option
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
In intel_draw_buffer, there exists a workaround to prevent
_mesa_update_framebuffer from creating a swrast depth wrapper when
using separate stencil. This commit fixes the workaround, which was
incomplete for s8z24 texture renderbuffers.
Fixes fbo-blit-d24s8 on gen5 with separate stencil manually enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since all infrastructure is now in place to support packed
depth/stencil renderbuffers when using separate stencil, there is no
need for special cases when separate stencil is enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Also, in order to coerce intel_update_tex_wrapper_regions() to
allocate the hiz region, alter intel_update_tex_wrapper_regions() to
examine the renderbuffer format instead of the texture image format.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... and into new function intel_update_tex_wrapper_regions.
This prevents code duplication in the next commit.
Also add a note explaining that the hiz region is broken for mipmapped
depth textures.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... when using separate stencil.
Define function intel_tex_image_x8z24_create_renderbuffers and call it
in intelTexImage after the miptree has been created and filled with data.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because they will be needed by intel_tex_image_s8z24_create_renderbuffers.
Redeclared functions are:
intel_alloc_renderbuffer_storage
intel_renderbuffer_set_draw_offsets
Signed-off-by: Chad Versace <chad@chad-versace.us>
Redeclare as non-static because
intel_tex_image_s8z24_create_renderbuffers will use it.
Remove the 'wrapper' parameter, because there is no wrapper for
intel_texture_image.depth_rb and stencil_rb.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields depth_rb and stencil_rb, and put hooks in place to
release the renderbuffers in intelFreeTextureImageData and
intelTexImage.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Otherwise we can end up creating RGBA render targets (which are BGRA on the
hardware), and then we bind them as RGBA textures (which are RGBA on the
hardware). This generates software fallbacks every time we bind the frame as
a texture.
Commit 1a339b6c71 caused us to take
a different path through the glCopyTexSubImage() code. The
pipe_get_transfer() call neglected to pass the texture's level, face
and slice info. So we were always transferring from the 0th mipmap
level even when the source renderbuffer was a non-zero mipmap level
in a texture.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38649
NOTE: This is a candidate for the 7.10 branch.
Once again, assuming the compiler is clever works out so poorly. The
generated code initialized the structure on the stack, then did a
lookup into it. This was a performance regression from
70c6cd39bd.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's common in applications just before the advent of
EXT_separate_shader_objects to have multiple linked shaders with the
same VS or FS. While we aren't detecting those at the Mesa level, we
can detect when our compiled output happens to match an existing
compiled program.
This patch was created after noting the incredible amount of compiled
program data generated by Heroes of Newerth. It reduces the program
data in use at the start menu (replayed by apitrace) from 828kb to
632kb, and reduces CACHE_NEW_WM_PROG state flagging by 3/4. It
doesn't impact our rate of hardware state changes yet, because things
depending on CACHE_NEW_WM_PROG also depend on BRW_NEW_FRAGMENT_PROGRAM
which is still being flagged.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch add the support for 24bpp in the dri/swrast implementation.
Signed-off-by: Marc Pignat <marc@pignat.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Use a typed struct to describe the native buffer and let the backends
map the native buffer to winsys_handle for
resource_from_handle/resource_to_handle.
When shared glapi is not enabled, there are two glapi providers and we
cannot decide which one to link to at build time. It results in
unresolved symbols in st/mesa. This commit makes st/mesa a loadable
module when shared glapi is not enabled, and hopes that the apps will
link to one of the glapi providers (GL or GLES).
Build pipe drivers here instead of using those built by the
soon-to-be-removed targets/egl.
[with an update by Benjamin Franzke to use --{start|end}-group]
Should unify this too, but will delay that until the planned
libdrm_nouveau/winsys changes which are likely to cause major
changes to this bo validation code too.
In fixed function, stride == 0 (e.g. glColor4f() outside of the draw
call) would get turned into uniform inputs, which is why it was
ignored originally in this test. For shaders, drivers end up seeing a
need to upload stride == 0 data, and get confused by needing to upload
when vbo_all_varyings_in_vbos() returned true. In the 965 driver
case, it wouldn't bother to compute the min/max index, and uploaded
nothing if the min/max wasn't known.
We've talked about removing the ff stride=0-into-uniforms code, so
this check shouldn't be missed once that's gone.
Fixes ARB_vertex_buffer_object/mixed-immediate-and-vbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37934
Reviewed-by: Brian Paul <brianp@vmware.com>
We would still want to consider that data as being in a VBO even if we
managed to produce this case, which as far as I know we can't.
Reviewed-by: Brian Paul <brianp@vmware.com>
All the packets chosen before came from grepping the pdf for
nonpipelined, and these two came from grepping for non.pipelined. We
could stand a review by looking at all packets emitted and identifying
what kind they are.
Previously, the builtins in OES_texture_3D.{frag,vert} were only
compiling properly as a consequence of bug 38015, which allows
unsupported extensions to be enabled. This fix eliminates the builtin
compiler's reliance on bug 38015, so that bug 38015 can be fixed.
Fixes broken glTexImage2D with format=GL_RGBA since
1a339b6c71
The origin for this behaviour is that r600_is_format_supported
checks only against r600_state_inline.h tables not evergreens.
If possible, we want to match the hardware format to what the app uses. By
doing so, we avoid the need for pixel conversions and therefore greatly speed
up texture uploads.
evergreen+ stores depth and stencil separately so when we
allocate a depth/stencil fbo, make sure we allocate enough
memory for both depth and stencil buffers.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Now all infrastructure is in place to support s8_z24 non-texture
renderbuffers for gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Hiz buffer allocation can only occur if the 'else' branch has been taken,
so move the hiz buffer allocation into the 'else' branch.
Having the hiz buffer allocation dangling outside of the if-tree was just
damn confusing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following fields:
intel_renderbuffer.wrapped_depth;
intel_renderbuffer.wrapped_stencil
If the intel_context is using separate stencil and the renderbuffer has
a packed depth/stencil format, then wrapped_depth and wrapped_stencil are
the real renderbuffers.
Alter the following functions to accomodate the wrapped buffers:
intel_delete_renderbuffer
intel_draw_buffer
intel_get_renderbuffer
intel_renderbuffer_map
intel_renderbuffer_unmap
Subsequent commits allocate renderbuffer storage for wrapped_depth and
wrapped_stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit b5c847c7ca erroneously disabled
support for S8_Z24 texture format when the context required separate
stencil (intel_context.must_use_separate_stencil).
But the GL spec requires implementations to support GL_DEPTH24_STENCIL8.
So we better find a way to fake it...
From page 180 (196 of pdf) of the OpenGL 3.0 spec:
In addition, implementations are required to support the following
sized internal [texture] formats.
[...]
- Combined depth+stencil formats: DEPTH32F_STENCIL8 and and
DEPTH24_STENCIL8.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
this drop a bunch of unnecessary checks (i.e. should be trapped
at gallium level), and also removes the switch statement in favour
of some calculated values for the vgt values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
the attached patch should be an improvement over Vadim Girlin's patch
fixing LIT instruction for r600g (commit
2fe39b46e7).
Instructions used in tgsi_lit have been reordered to always write to a
dst channel after the same channel in src has been read (so if src ==
dst, input values are not overwritten before being used).
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to bind to our context before calling __glXSetCurrentContext or
messing with the gc rect in order to properly handle error conditions.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In applegl, GLX advertises the same extensions provided by OpenGL.framework
even if such extensions are not provided by glapi. This allows a client
to get access to such API.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
FramebufferTextureLayer is an alias of FramebufferTextureLayerEXT, so
FramebufferTextureLayerARB needs to be listed as an alias of
FramebufferTextureLayerEXT rather than FramebufferTextureLayer.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Previously it was up to the driver or later code generator to reject
these shaders. It turns out that nobody did this.
This will need changes to support geometry shaders.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37743
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since switching to hidden visibility on gcc, GLw apps were failing to
link. Use the GLAPI definition to use default visibility where necessary.
$ nm lib/libGLw.so | grep DrawingArea
0000000000004020 T GLwCreateMDrawingArea
0000000000003430 T GLwDrawingAreaMakeCurrent
0000000000003410 T GLwDrawingAreaSwapBuffers
0000000000204c60 D glwDrawingAreaClassRec
0000000000204d48 D glwDrawingAreaWidgetClass
00000000002053c0 D glwMDrawingAreaClassRec
00000000002054e0 D glwMDrawingAreaWidgetClass
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Tested-by: justin <jlec@gentoo.org>
This was spectacularly unsafe. On my system, address 0 happens to be
the hardware status page for the render ring, and the first quadword
of that happens to contain nothing we ever look at, but I sure didn't
look forward to having to debug some day when, for example, the kernel
happened to bind the ringbuffer before binding the hwsp.
That flag was leftover from gen4, where brw_curbe.c is choosing ranges
of the CURBE space for constants to live in, and the unit state tells
where to load them from. That's not the case on gen6 -- we don't set
this flag (since constants aren't in the URB), nor do we have any
state like that to upload.
If --disable-gallium is passed, llvm-config isn't checked for, so mark
it explicitly as absent, through LLVM_CONFIG=no.
Passing --disable-gallium would result in:
| ../configure: line 9739: --version: command not found
| ../configure: line 9740: --cppflags: command not found
| ../configure: line 9741: --libs: command not found
| ../configure: line 9743: --ldflags: command not found
With this commit, one gets that instead:
| configure: error: LLVM is required to build Gallium R300 on x86 and x86_64
Signed-off-by: Cyril Brulebois <kibi@debian.org>
This removes all the --enable-gallium-$driver options and --disable-gallium.
Gallium can be disabled by --with-gallium-drivers= (without parameters).
Default is:
--with-gallium-drivers=r300,swrast
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
There is an obvious redundancy:
--with-driver=dri VS --with-state-trackers=dri
--with-driver=xlib VS --with-state-trackers=glx
--enable-openvg VS --with-state-trackers=vega
--enable-egl VS --with-state-trackers=egl
This patch adds two new options for the remaining state trackers:
--enable-xorg
--enable-d3d1x
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Our hardware doesn't have a sample_d_c message, so we have to do a
regular sample_d and emit instructions to manually perform the
comparison.
This requires a state dependent recompile whenever the sampler's compare
mode or function change. This adds the per-sampler comparison functions
to brw_wm_prog_key, but only sets them when the sampler's compare mode
is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it available earlier, which will soon be necessary.
(Separating code motion from actual changes.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is somewhat ugly, but I couldn't think of a nicer way to handle the
interleaved coordinate/derivative parameter loading.
Ironlake and Sandybridge will still hit an assertion in visit().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prior to this patch, it would attempt to optimize and allocate registers
for the program even if it failed to compile. This seems wasteful.
More importantly, the "message length > 11" failure seems to choke the
instruction scheduler, making it somehow use an undefined value and
segmentation fault.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
The _ColorDrawBuffers[] wouldn't get updated despite us having updated
what it depends on (Attachments[]->Renderbuffer). Other callers of
_mesa_remove_attachment are already flagging _NEW_BUFFERS for other
reasons. The specific bug report that led to this fix (and
the fbo-finish-deleted testcase) was fixed by
23b6f9606d, though.
Reviewed-by: Brian Paul <brianp@vmware.com>
This loop is trying to see if all the buffers to be uploaded happen to
be the same increment from the start of the 3DSTATE_VERTEX_BUFFERS
currently loaded in the hardware. However, we might be at a smaller
offset than the previous set of VERTEX_BUFFERS, so we can't reuse
because that packet made the first entry be its starting offset (you
can't access outside the given bounds).
Fixes piglit ARB_vertex_buffer_object/elements-negative-offset.
Current LIT implementation uses dst components for storing temp
results, possibly overwriting still needed values (depends on the
swizzles).
This patch uses temp reg for one of such cases (found in etqw) and
fixes "LIT R.z, R.xyzz".
Tested on evergreen. Fixes some etqw-demo rendering glitches when
"Lighting" is set to "High" in the settings.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current dri context unbind logic will leak drawables until the process
dies (they will then get released by the GEM code). There are two ways to fix
this: either always call driReleaseDrawables every time we unbind a context
(but that costs us round trips to the X server at getbuffers() time) or
implement proper drawable refcounting. This patch implements the latter.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
When emitting either a hiz or stencil buffer, the 'separate stencil
enable' and 'hiz enable' bits are set in 3DSTATE_DEPTH_BUFFER. Therefore
we must emit both 3DSTATE_HIER_DEPTH_BUFFER and 3DSTATE_STENCIL_BUFFER.
Even if there is no stencil buffer, 3DSTATE_STENCIL_BUFFER must be
emitted; failure to do so causes a hang on gen5 and a stall on gen6.
This also fixes a silly, obvious segfault that occured when a hiz buffer
xor separate stencil buffer existed.
Fixes the piglit tests below on Gen5 when hiz and separate stencil are
manually enabled:
fbo-alphatest-nocolor
fbo-depth-sample-compare
fbo
hiz-depth-read-fbo-d24-s0
hiz-depth-stencil-test-fbo-d24-s0
hiz-depth-test-fbo-d24-s0
hiz-stencil-read-fbo-d0-s8
hiz-stencil-test-fbo-d0-s8
fbo-missing-attachment-clear
fbo-clear-formats
fbo-depth-*
Changes piglit test result from crash to fail:
hiz-depth-stencil-test-fbo-d0-s8
Signed-off-by: Chad Versace <chad@chad-versace.us>
[airlied: final chunk of Mike's patch from bug 37476
this uses a loop to emit the GRADIENTS and does a check to
see if we need to fetch to a temporary register. It also
increases the context src gpr to 4 which is needed here.]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches (and don't forget
to re-run "make builtins" after cherry-picking.)
Mike had actually done a lot of the TXD support in a patch in bug
37476 which I see now, I'll add the bits of his work that I didn't think
to add to my work.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This at least passes the piglit arb_shader_texture_lod-texgrad test,
the AMD shader analyzer seems to multiply the V component by an unspecified
constant value no idea why.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This sets the base level as the zero level, which fixes
piglit/texturing/tex-miplevel-selection*.
The r600 hardware ignores the BASE_LEVEL field in some cases, so we can't
use it.
Evergreen might need this too.
Commit 56ef62d988
"glsl: Generate readable unique names at print time."
changed ir_print_visitor to not generate @0x1234567 suffixes except
where necessary. So there's no need to manually remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change to _glapi_create_table_from_handle causes it to fill the dispatch
table with NoOps for unimplemented functionality. This matches what is done
in indirect_init.c and also allows us to enable logging (when built with
-DDEBUG and the MESA_DEBUG or LIBGL_DEBUG environment variables are set) to
catch cases where clients are trying to use these unimplemented extentions.
Additionally, this fixes some gcc -pedantic warnings.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Check that the difference in array pointers/offsets from the 0th
array are less than the stride, for both VBOs and user-space arrays.
Previously, we were only doing this for the later.
This tightens up the interleaved array test and fixes a problem with
the llvmpipe driver where we were creating way too many vertex fetch
variants only because the pipe_vertex_element::src_offset values were
changing frequently. This change results in a 5x speed-up for one of
the viewperf tests.
Also, clean up the function to make it easier to understand.
The code was playing fast and loose with rowstrides, which meant that
if a driver chose anything different for its alignment requirements,
the generated mipmaps came out garbage. Unlike the uncompressed case,
we can't generate mipmaps directly into image->Data, so by using
TexImage2D we cut out most of the weird logic that existed to generate
in-place into ->Data. The up/downside is that the driver recovery
code for the fact that _mesa_generate_mipmaps whacked ->Data has to be
turned off for compressed now.
Fixes 6 piglit tests about compressed mipmap gen.
The path taken is wildly different based on this (do we generate from
a temporary image, or from level-1's data), and we appear to have
stride bugs in the compressed case that are tough to disentangle.
This just duplicates the code for the moment, the followon commit will
do the actual changes. Only real code change here is handling
maxLevel in one common place.
This is effectively just "round up when dividing by 4" compared to the
previous code. Fixes the broken stripe at the top of
fbo-generatemipmap-formats GL_EXT_texture_compression_rgtc.
We don't care just about the internalFormat/cpp/compressed, but about
the specific format chosen. We have no support for format
translations as part of texture validation, and furthermore it has
restrictions in the GL specification. However, we should be making
consistent decisions for this check anyway.
Generally image uploads to a the region occur at TexImage time, but
that's not the case for fallback _mesa_generate_mipmap(), and in this
path we were forgetting to align the width when dividing height. We
were just leaving out parts of the compressed block at 2x2 and 1x1
levels.
Fixes gen-compressed-teximage.
Copy-and-paste from the bgra cases. The C paths attempt to avoid
copying the 'x' channel, but it's harmless, you might as well. Good for
about 5% in glxgears (740 to 780 fps).
Signed-off-by: Adam Jackson <ajax@redhat.com>
glReadPixels() was performing RGB -> L conversion differently from the
glTexImage() style conversion appropriate for glCopyTexImage().
Fixes gles2conform copy_texture.
We were mapping the renderbuffer once, then walking over all the
buffers to map just the texture ones using the other texture mapping
function that handled the x/y offset to the image in the region. But
then we would go and overwrite *those* mappings with the original
mappings for depth/stencil, which was wrong.
Instead, just walk over the attachments once and map the attachments.
Wasn't that easy?
This is already pointing at 0 or Height - 1 and with an appropriate
pitch, so no need to recompute those values per customization of the
spans code. Cuts 3 out of 21kb of the compiled size.
Reviewed-by: Chad Versace <chad@chad-versace.us>
The "newImage" isn't particularly new -- it might be the same texture
that was attached to the same attachment point before. This function
also gets called when just rebinding back to an FBO with a texture
attachment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
It was originally located in the region because the tracking of
depth/color buffers was on the regions, and getting back to the irb
would have been tricky. Now, we're keying off of the renderbuffer in
more places, which means we can move these fields where they belong.
This could fix potential rendering failure with a single texture
having multiple images attached to different renderbuffers across
shareCtx (as far as I can tell, this was the only failure we could
cause, since anything else should trigger intel_render_texture in
between, for example a BindFramebuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
gc->vtable->destroy is always set and is used unconditionally
in other places, so don't bother checking for it first.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
This stuff is really for software rendering, it's not core Mesa.
A small step toward pushing the FetchTexel() stuff down into swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
x86_64_entry_start needs to be declared static in the C code,
in order to have the correct address in entry_get_public
(seems not to be needed on x86).
The compiler needs to lookup a local not a global object.
Otherwise addresses needed for _glapi_proc_address will be computed
from some random offset (0x6400229a61058b48 in my case).
This sadly requires work in the VS to rescale them, because the
hardware doesn't support this format natively.
Fixes arb_es2_compatibility-fixed-type and gtf/fixed_data_type.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The function was named "find_unconditional_discard", but didn't
actually check that the discard statement found was unconditional.
Fixes piglit glsl-fs-discard-04.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A trivial fix for error: format not a string literal and no format
arguments with compiling with -Werror=format-security flags.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If either depth or stencil buffer has packed depth/stencil format, then do
not use separate stencil.
Before this commit, emit_depthbuffer() incorrectly assumed that the
texture's stencil renderbuffer wrapper was a *separate* stencil buffer,
because the depth and stencil renderbuffer wrappers are distinct for
depth/stencil textures (that is, depth_irb != stencil_irb).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38134
Signed-off-by: Chad Versace <chad@chad-versace.us>
CONFIG regs (byte offsets 0x8000-0xac00) are single state and the pipeline
must be flushed and hw idle when they are changed. Border color regs
are in the CONFIG range and this is why a flush is required when changing
them. CONTEXT regs (byte offset 0x28000+) are multi-state and those do
not require flushes when changing them.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This is just like PointSprite overrides, but it's always on for that
attribute.
Fixes glsl-fs-pointcoord, gtf/point_sprites.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
If the wrap R (3rd) mode is set to CLAMP or CLAMP_TO_BORDER and the texture
isn't 3D, r300 always samples the border color regardless of texture
coordinates.
I HATE THIS HARDWARE.
NOTE: This is a candidate for the 7.10 branch.
Ideally we'd have a compiler and register spilling and all that
but this is good enough for now to avoid the gpu hang in piglit,
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined
on r600/r700 cards.
based on r600c patch
Andre Maasikas <amaasikas@gmail.com>
r600c: bump sq gpr resources if a shader needs more than default
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to vol2a.07, it only applies from Cantiga to Sandybridge.
I found this in my ringbuffers while investigating various GPU hangs.
While it may not have been the cause, it seemed wise to remove it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Thanks to Chad's hard work implementing separate stencil and HiZ
support, this is entirely straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to call add_validated_bo to do proper aperture space accounting.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen7 doesn't support packed depth/stencil, the stencil buffer
can't possibly be relevant for determining the depth format.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes mesa more consistent with glibtool and XCode where the
generated file matches the dylib id rather using an extra symlink
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
When it is sensible to do so,
1) intelCreateBuffer() now attaches separate depth and stencil
buffers
to the framebuffer it creates.
2) intel_update_renderbuffers() requests for the framebuffer
a separate stencil buffer (DRI2BufferStencil).
The criteria for "sensible" is:
- The GLX config has nonzero depth and stencil bits.
- The hardware supports separate stencil.
- The X driver supports separate stencil, or its support has not yet
been determined.
If the hardware supports hiz too, then intel_update_renderbuffers()
also requests DRI2BufferHiz.
If after requesting DRI2BufferStencil we determine that X driver did not
actually support separate stencil, we clean up the mistake and never ask
for DRI2BufferStencil again.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Assert that the GLX config has an expected depth/stencil bit combination:
one of d24/s8, d16/s0, d0/s0. These are the only depth/stencil
configurations that we advertise.
Remove the check for software stencil, because given the assertions'
constraints the check always fails.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Extract the code that queries DRI2 to obtain the DRIdrawable's buffers
into intel_query_dri2_buffers_no_separate_stencil().
Extract the code that assigns the DRI buffer's DRM region to the
corresponding renderbuffer into
intel_process_dri2_buffer_no_separate_stencil().
Rationale
---------
The next commit enables intel_update_renderbuffers() to query for separate
stencil and hiz buffers. Without separating the separate-stencil and
no-separate-stencil paths, intel_update_renderbuffers() degenerates into
an impenetrable labyrinth of if-trees.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields below to intel_screen. The expression in parens is the
value to which intelInitScreen2() currently sets the field.
GLboolean hw_has_separate_stencil (true iff gen >= 7)
GLboolean hw_must_use_separate_stencil (true iff gen >= 7)
GLboolean hw_has_hiz (always false)
enum intel_dri2_has_hiz dri2_has_hiz (INTEL_DRI2_HAS_HIZ_UNKNOWN)
The analogous fields in intel_context now inherit their values from
intel_screen.
When hiz and separate stencil become completely implemented for a given
chipset, then the respective fields need to be enabled.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... which indicates if the X driver supports DRI2BufferHiz and
DRI2BufferStencil.
I'm placing this in its own commit due to the large comment block.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since the stencil buffer is interleaved, the generic Mesa renderbuffer
accessors do not suffice. Custom span functions are necessary.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When emitting 3DSTATE_DEPTH_BUFFER, also emit 3DSTATE_HIER_DEPTH_BUFFER if
there is a hiz buffer. Ditto for 3DSTATE_STENCIL_BUFFER and a separate
stencil buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
glapidispatch.h was located in glapi and shared with mesa core. Because
the way it was shared, mesa core must include it indirectly via
main/dispatch.h.
Now that it is no longer needed by glapi and is located in core mesa,
merging it with main/dispatch.h to avoid wrong uses.
Generate different glapidispatch.h's for GL and GLES. For GLES, we want
a local remap table.
This reverts commit 5af46e8360. The
commit will break GL remap table setup when main/glapidispatch.h is
regenerated.
See piglit dlist-fdo31590.c test and
http://bugs.freedesktop.org/show_bug.cgi?id=31590
In this case we had node->prim_count=1 but node->count==0 because the
display list started with glBegin() but had no vertices. The call to
glEvalCoord1f() triggered the DO_FALLBACK() path. When replaying the
display list, the old condition basically no-op'd the call to
vbo_save_playback_vertex_list call(). That led to the invalid operation
error being raised in glEnd().
NOTE: This is a candidate for the 7.10 branch.
Previously, we were errantly drawing some interior edges of clipped
polygons and quads. Also, we were introducing extra edges where
polygons intersected the view frustum clip planes.
The main problem was that we were ignoring the edgeflags encoded in
the primitive header's 'flags' field which are set during polygon/quad
->tri decomposition. We need to observe those during clipping. Since
we can't modify the existing vert's edgeflag fields, we need to store
them in a parallel array.
Edge flags also need to be handled differently for view frustum planes
vs. user-defined clip planes. In the former case we don't want to draw
new clip edges but in the later case we do. This matches NVIDIA's
behaviour and it just looks right.
Finally, note that the LLVM draw code does not properly set vertex
edge flags. It's OK on the regular software path though.
When GLX_INDIRECT_RENDERING is defined, some symbols are used in
libglapi.a but are not defined. Define them through the help of
glapitemp.h.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This updates the apple dispatch table to match the current glapi.
Aliases are still not handled very well.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
With this change, Apple's libGL is now using glapi rather than implementing
its own dispatch. In this implementation, two dispatch tables are created:
__ogl_framework_api always points into OpenGL.framework.
__applegl_api is the vtable that is used. It points into OpenGL.framework
or to local implementations that override / interpose this in OpenGL.framework
The initialization for __ogl_framework_api was copied from XQuartz with some
modifications and probably still needs further edits to better deal with
aliases.
This is a good step towards supporting both indirect and direct rendering
on darwin.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In starting the migration to using mapi, rename __gl_api to
__ogl_framework_api since it is a vtable for OpenGL.framework
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
So only with kernel version 2.7 can this work, thanks to Alex
for pointing that out. Also add a workaround for a hw bug.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Evergreen can do this as well as cayman, so we should enable it.
This fixes a gpu lockup with
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test
I need to add a better workaround for r600/r700.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We weren't emitting the SQ setup regs at all which really is
fail.
When a state is always enabled we need to add it to the dirty list
as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since resources don't generally vary in size, this splits
the emit path, it also takes into a/c that texture and vertex resources
have different number of relocs, and avoids emitting the extra
reloc for vertex resources.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Exit this loop early to avoid pointless iterations later.
Move the resource bos to the first two regs, it actually
doesn't matter which regs we use for this in resource land.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_framebuffer_object spec (and later specs) say:
"If a buffer is specified in <mask> and does not exist in both
the read and draw framebuffers, the corresponding bit is silently
ignored."
Check for color, depth, and stencil that the source and destination
FBOs have the specified buffers. If the buffer is missing, remove the
bit from the blit request mask and continue.
Fixes the crash in piglit test 'fbo-missing-attachment-blit from', and
fixes 'fbo-missing-attachment-blit es2 from'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
In an ES2 context (or if GL_ARB_ES2_compatibility) is supported, the
framebuffer can be complete with some attachments be missing. In this
case the _ColorDrawBuffers pointer will be NULL.
Fixes the crash in piglit test fbo-missing-attachment-clear.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
query->num_results already has the size in dwords of the query
buffer. There no need to multiply again. We were reading past
the end of the buffer, resulting in reading garbage.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=37028
agd5f: clarify the comment.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
According to the hw documentation, the driver needs to:
- allocate 128 bits for each possible DB
- clear the 128 bits for each possible DB
- write 1 to bits 127 and 63 for upper DBs that don't
exist on a particular asic
Previously we were only doing these steps if the
asic had less than the max possible DBs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
With complex shaders there are often "holes" in the fs inputs, and we only
have 8 tex coorsd to map those to. To fix this, we remap fs inputs to [0..8].
This lets us to run many more GLSL programs.
At the end of flushing we were scanning over 450 blocks
with generally about 50 enabled. This reduces the scanning
to just the list of enabled blocks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There isn't much point taking the overhead of range/block lookups on resources
we aren't going to be getting resource registers at wierd offsets.
Signed-off-by: Dave Airlie <airlied@redhat.com>
resource setting could be a fair bit more lightweight,
this patch just separates the resource structs from the standard
reg tracking structs in the driver, later patches will improve
the winsys.
Signed-off-by: Dave Airlie <airlied@redhat.com>
we don't need to loop over all the registers unless we have
some bos in the block, also avoid setting the ctx flags,
and move the optional stuff down below this chunk.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reference implementation which produces high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
Signed-off-by: Brian Paul <brianp@vmware.com>
We implement line stipples, just not *quite* correctly. We have a
piglit testcase to use when we want to fix it, if we do. Until then,
don't lie to our test suites.
We do have hardware antialised lines. If we care, we should actually
fix them to be conformant (or as close as possible) instead of using
this knob to fool testcases using swrast.
For some interesting reading on the state of GL_*_SMOOTH across
several drivers, see:
http://homepage.mac.com/arekkusu/bugs/invariance/HWAA.html
From my reading of the GL 2.1 spec, no antialiasing is strictly
conformant for polygon smoothing. Yes, it's absurd, but then,
hardware doesn't support this so maybe it's not so absurd.
ir_print_visitor::visit(ir_constant *) was failing to index properly
into ir->type->fields.structure, so the first field name was being
reprinted for every field in the structure.
Signed-off-by: Brian Paul <brianp@vmware.com>
ast_expression::print() had an incorrect index into the subexpressions
array, so (a ? b : c) was being incorrectly rendered as (a ? b : b).
Signed-off-by: Brian Paul <brianp@vmware.com>
This makes this function not be an always miss for the branch predictor.
Noticed using cachegrind, makes a minor difference to gears numbers on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are handled separately in the winsys, so don't need the calculations
done at this point. this manifested as a crash in point-sprite,
Thanks to XoD on #radeon for pointing it out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The flush function, when asked for, should not return a NULL fence.
NULL can only be returned if fences are not implemented, and st/mesa
doesn't call any of the fence functions if it receives a NULL fence
(because some drivers don't even set the fence hooks).
ARB_sync is exposed if fence_finish is set.
Mesa now limits, by default, the max number of texture levels to 15 so we
can now support the architectural maximum for gen4-6 of 14.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This modifies the VGT state and move the SPI setup to its own discrete state.
It then just sets the SPI state up and the VGT state up once and modifies
them thereafter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This splits the initialisation and the setting of values in the resource
buffers. We only should end up initialising once and updateing with new values
when needed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the overhead of working out the range/block to state build time,
it also allows the compiler to use constants for a lot of things instead
of working them out each time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the functions down the file, and also adds a ctx parameter.
This is precursor patch just moving stuff around and getting it ready.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drop the r600_draw_vbo CPU usage on a run of nexuiz from 1.40% to 0.72%
in sysprof for me on my Fusion APU.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This range was 76 dwords long, the 75th dword changes, the first 60 or so
don't. split the block so it emits less often.
Signed-off-by: Dave Airlie <airlied@redhat.com>
glx code hasn't lived under xserver/GL for a long time now.
Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
OpenGL 4.0 Compatibility, page 449:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, no
framebuffer is bound to target. In this case querying pname FRAMEBUFFER_-
ATTACHMENT_OBJECT_NAME will return zero, and all other queries will generate
an INVALID_OPERATION error.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This avoids the extra CMP and the predication on SEL, so in addition
to one less instruction, it makes scheduling less constrained.
Improves glbenchmark Egypt performance 0.6% +/- 0.2% (n=3). Reduces
FS instruction count across affected shaders in shader-db by 1.3%
without regressing any.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces compiled size of brw_wm_surface_state.o another 1.9%.
Overall, this brw_wm_surface_state reduction series cuts
firefox-talos-gfx runtime by 0.68% +/- 0.42% (n=6).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It turns out that gcc is just awful at generating code for
brw_structs.h style state setup, and using bitshifting on u32s
generates better code while being similarly readable (and more
verifiable compared to the specs, using the INTEL_MASK macro).
It's only used in the old fragment program path, to avoid projection
when w is always 1. We do want to do this in the new path pre-gen6
too, but we'll probably do it through the ir.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Oddly, this increases compiled code size. (marking the 'if' as likely
also increases code size, but not as much).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Interestingly, the compiler wasn't doing this for us at -O2, so we
were doing the computation for every non-_ReallyEnabled unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- all asics need to emit CONTEXT_CONTROL
- all r6xx asics need to emit 3D_START_CMDBUF
The ddx and r600c already do this. r600g should as well.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
We are getting inconsistent methods for endian detection (same answer when
it works, just doesn't work on some platforms) depending on whether __GLIBC__
is defined, which of course depends on include ordering before p_config.h
Just make p_config.h include limits.h to solve this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
On my original R600 card this at least lets gnome shell run for a while longer
and the piglit r300-readcache test case works a lot more reliably.
Still a few more stability issues running a piglit test run though.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec doesn't state it should be an error, but. We have this piglit test
useprogram-inside-begin that passes with this commit. No idea what's correct.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
The conditional rendering should be able to kill CopyPixels.
I assume the render condition has no effect on resource_copy_region.
This fixes piglit:
- NV_conditional_render/copypixels
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Always default to DEFAULT_*_FORMATS for mandatory GL formats.
(st_choose_format must not fail for those)
Use DEFAULT_RGBA when alpha is required instead of RGB.
Use DEFAULT_RGB otherwise.
These are more or less the remaining differences between the old code and
the new one.
Reviewed-by: Brian Paul <brianp@vmware.com>
The problem is: The second time the function is called with a new
internal format, strb->format is usually not PIPE_FORMAT_NONE.
RenderbufferStorage(... GL_RGBA8 ...);
RenderbufferStorage(... GL_RGBA16 ...); // had no effect on the format
Broken with: fd6f2d6e57
Test: piglit/fbo-storage-completeness
NOTE: This is a candidate for the 7.10 branch.
(if fd6f2d6e57 is cherry-picked as well)
Reviewed-by: Brian Paul <brianp@vmware.com>
Lowered indirect addressing can create lots of immediates.
Fixes piglit/glsl-fs-uniform-array-7 on r300g.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
From now on, depth test is always enabled in hardware.
If depth test is disabled in Gallium, the hardware Z function is set to ALWAYS.
If there is no zbuffer set, the colorbuffer0 memory is set as a zbuffer
to silence the CS checker.
This fixes piglit:
- occlusion-query-discard
- NV_conditional_render/bitmap
- NV_conditional_render/drawpixels
- NV_conditional_render/vertex_array
We want to check for Success, otherwise it will fail even with the right visual.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
When using _mesa_layout_parameters, all params copied in the 'layout'
output in the PASS 1 don't modify StateFlags (because they are simply
memcpy'ed).
This patch fixes the problem, assuring output gl_prog_param_list
StateFlags field is the same as the input one.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
At glLinkShaders time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
We now have:
brw_fs.cpp handles calling out to everything and optimization.
brw_fs_visitor.cpp handles translating to our LIR.
brw_fs_emit.cpp handles emitting from our LIR to native code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's an assumption here that fixed GRFs will never intersect with
the allocated GRFs. That's true today, though it might change some
day if we decide to register-allocate the regs containing push
constants once they're dead.
This fixes a regression in 0f7325b890 in
Lightsmark from the texture instructions now containing g0 references
instead of having that be implied. Performance is improved 15.2% +/-
3.6% (n=3).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968
emit_add_a16 was using the incorrect source.
This caused adds in the form of:
add u16 $a0 s32 $a1 u32 0x00000200
to have a source AREG of $a0 instead of $a1.
Fixes World of Warcraft in OpenGL and D3D without GLSL.
They were occupying whole 32-bit words, despite being only 10 or so
bits. Reduces code size slightly (80/3300 bytes).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the GL 2.1 spec:
"Required perspective-correct interpolation for all fragment
attributes except depth in sections 3.4.1 and 3.5.1, effectively
making GL PERSPECTIVE CORRECT HINT a no-op."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
First, FBO read/draw == NULL validation happens in mesa core not
intelReadBuffers -> intel_draw_buffers. Second, that condition is no
longer tested for in our driver since ARB_ES2_compatibility was added.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, the driver is likely to draw the flushed vertices to the
new drawbuffer instead of the old one, missing the point of the flush.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the ARB_ES2_compatibility spec:
"(8) How should we handle draw buffer completeness?
RESOLVED: Remove draw/readbuffer completeness checks, and treat
drawbuffers referring to missing attachments as if they were NONE."
Fixes arb_es2_compatibility-drawbuffers when the short-circuit for
ARB_ES2_compatibility in the previous commit is dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
glDrawBuffers pointing at an unattached buffer is supposed to be
incomplete without ARB_ES2_compatibility. The testcase to catch the
bug of not implementing that bit of the spec was tricked by this
missing piece of state update.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we use FBOs to access mipmap levels with glRead/Draw/CopyPixels()
we need to be sure to access the correct mipmap level/face/slice.
Before, we were just passing zero in quite a few places.
This fixes the new piglit fbo-mipmap-copypix test.
NOTE: This is a candidate for the 7.10 branch.
V_SQ_CF_WORD1_SQ_CF_INST_HALT is 0x1f on both
evergreen and cayman.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The logic of intel_draw_buffers() expected that stencil buffers were
always combined depth/stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When a texture is attached to multiple FBO's, a separate renderbuffer
wrapper is created for each attachment. This necessitates storing the hiz
region for these renderbuffers in the texture itself instead of the
renderbuffer wrapper.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Before this commit, the renderbuffer's region was updated in
intel_renderbuffer_texture(). This commit moves the update into
intel_update_wrapper(), which is a more logical location for updates.
This is in preparation for the next commit, which allocates and
updates the texture's hiz region in intel_update_wrapper(). Having the two
region updates located in the same function makes good form.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A hiz surface must be supplied to the hardware when rendering to a depth
buffer with hiz. There are three potential places to store that surface:
1. Allocate a larger intel_region for the depthbuffer, and let the
region's tail be the hiz surface.
2. Allocate a separate intel_region for hiz, and store it as
brw_context state.
3. Allocate a separate intel_region for hiz, and store it in
intel_renderbuffer.
We choose method 3.
Method 1 has not been chosen due to future complications it might cause
when requesting a DRI drawable's depth buffer attachment from X.
Method 2 has not been chosen because storing the hiz region apart from
the depth region makes lazy hiz/depth resolves difficult to implement.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Given a format, is_hiz_depth_format() indicates if HiZ can be enabled on
a depthbuffer of that format.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... in intel_alloc_renderbuffer_storage(). The stencil buffer has quirky
pitch requirements, so its region allocation is a special case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When hardware supports separate stencil, enable support for separate
depth/stencil texture formats in the table
intel_context.ctx.TextureFormatsSupported. If the hardware must use
separate stencil, then disable support for combined depth/stencil formats.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Prefer MESA_FORMAT_X8_Z24 over MESA_FORMAT_S8_Z24 for textures with
internal format GL_DEPTH_COMPONENT*.
i965 needs MESA_FORMAT_X8_Z24 for HiZ and separate stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following flags:
intel_context.has_separate_stencil
intel_context.must_use_separate_stencil
intel_context.has_hiz
The flags are currently set to false, and will be enabled for a given
chipset once the feature is completely implemented.
Since it may be some time before these features are completed, their
values can be overridden with environment variables INTEL_HIZ and
INTEL_SEPARATE_STENCIL. Valid values for these environment variables are
"0" and "1".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because that's not a safe thing to do. The request buffer is shared
storage among all threads, and after UnlockDisplay the 'req' pointer may
point into someone else's request.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Cayman is the RadeonHD 69xx series of GPUs. This adds support for
3D acceleration to the r600g driver.
Major changes:
Some context registers moved around - mainly MSAA and clipping/guardband related.
GPR allocation is all dynamic
no vertex cache - all unified in texture cache.
5-wide to 4-wide shader engines (no scalar or trans slot)
- some changes to how instructions are placed into slots
- removal of END_OF_PROGRAM bit in favour of END flow control clause
- no vertex fetch clause - TC accepts vertex or texture
Signed-off-by: Dave Airlie <airlied@redhat.com>
These don't need one, and I was seeing 0xff being returned and set in
the GPU registers with some tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of using a giant switch statement with lots of code, use a
table to convert GL format enums to pipe formats.
Tested by running the old code next to the new and asserting that
the return value was the same for piglit tests.
We're doing a linear search, but if that ever appears to be too slow
the table could easily be sorted or hashed.
Certain applications (e.g., Bernina My Label, and the Windows
implementation of Processing language) destroy the device context used when
creating the frame-buffer, causing presents to fail because we were still
referring to the old device context internally.
This change ensures we always use the same HDC passed to the ICD
entry-points when available, or our own HDC when not available (necessary
only when flushing on single buffered visuals).
Since the SET_xxx and GET_xxx macros used to initialize the remap_table
have been replaced by inline functions, the missing late macro expansion
leads to driDispatchRemapTable not being redefined to remap_table, which
in turn causes the remap_table not to be setup properly.
This commit fixes the issue by moving the table redefinition after the
definition of driDispatchRemapTable but in front of the inline function
definitions.
Despite that negative values aren't sensible here, making this unsigned
is dangerous. Consider get_pointer_generic, which computes a value of
the form:
void *base + (int x * int stride + int y) * unsigned bpp
The usual arithmetic conversions will coerce the (x*stride + y)
subexpression to unsigned. Since stride can be negative, this is
disastrous.
Fixes at least the following piglit tests on Ironlake:
fbo/fbo-blit-d24s8
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
According to OpenGL 3.1 chapter 2.1.5 the representation without zero
should only be used for vertex attribute values, but not for textures
or frame-buffers.
The coordinate offsets set in the m1 header are for textureOffset;
they have nothing to do with textureGrad (TXD).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The same as 3e43adef95 but for Gen7.
This doesn't quite fix GL_ARB_depth_texture/fbo-clear-formats; there's
still a 1 pixel wide black line on the right edge of the smaller squares.
The results were entirely wrong before, and are at least close now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Since wayland 4bde293ff8109d55eeaee8732f5a6ee0c8cd4bd9 we cant
lookup visuals, as we dont receive the visual token events.
The format for pixmap-images thus has to default to argb for now.
GLES uses GL_APIENTRYP instead of GLAPIENTRYP, which breaks with the
latest API table generation code. This fixes the issue by emitting a
definition for GL_APIENTRYP when generating the GLES files.
This prevents the error
prog: for the -disable-mmx option: may only occur zero or one times!
when creating a new context after XCloseDisplay with DRI drivers linked
with a shared LLVM 2.8 library.
In particular, this fixes the case where a vertex shader only uses
generic vertex attributes (non-0th). Before, we were no-op'ing the
glDrawArrays/Elements().
This fixes the new piglit pos-array test.
NOTE: This is a candidate for the 7.10 branch.
Previously, always did unorm8->float/nonlinear-to-linear conversion (using
lookup table), then convert back to nonlinear (using the expensive math
func pow among others), and finally convert back to int (assuming caller
wants unorm8), because the float texture fetch function is used for getting
the actual texel values. This should probably all be changed at some point,
but for now simply enable the memcpy path also for srgb formats (but if for
instance swizzling is required, still the whole conversion will be done).
Clip distance is calculated each time vertex position is written
which is suboptiomal is some cases but very safe.
User clip planes are an obsolete feature anyway.
Every time number of clip planes increases, the vertex program
is recompiled.
That ensures no overhead in normal case (no user clip planes)
and reasonable overhead otherwise.
Fixes 3D windows in compiz, and reflection effect in neverball.
Also fixes compiz expo plugin when windows were dragged and each
window shown 3 times.
This was going to get in the way of separate depth/stencil (which
wants to know about both, and whether they are the same rb), and also
wasn't a sufficient flag for the fix in the following commit.
In the 16-wide rework, I missed that we were setting some things to be
SIMD16 mode (corresponding to their setup in emit_texture_gen4()).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These fields are documented to be in the payload, and though the FB
write docs say they *aren't* in the payload, for all other fields the
payload and header is structured so that no overwriting is required
except for non-default options.
It turns out there's nothing in the hardware preventing this. It
appears that it ought to work on pre-gen6 as well, but just produces
GPU hangs.
Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by
2.6% +/- 0.6% (n=3).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of gen6, alt-mode (which we use) MOVs of floats are not raw --
they'll modify infs/nans. This broke discard and alpha test in
16-wide, where apparently the upper 8 bits of the pixel enables being
set were causing the whole value to get trashed upon being moved.
Treating the values as UD instead of float makes sure they get
preserved. While I'm here, replace the two 8-wide moves of the halves
of the header with a single compressed move.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36648
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is part of fixing fbo-alphatest-nocolor -- a regression in
35e8fe5c99 after the initial regression,
that had us using a garbage BLEND_STATE[0] (in particular, the alpha
test enable) if no color buffer was bound.
I thought I was thwarted initially when I couldn't do conditional mod
on a MOV, and couldn't use two immediate constants in one instruction.
But g0 != g0 is also a way to produce a failing comparison.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anisotropic filtering extension for swrast intended to be used by osmesa
to create high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
A 2nd implementation using footprint assembly is also provided.
Signed-off-by: Brian Paul <brianp@vmware.com>
Correctly links against selinux library when MESA is built with --enable-selinux option.
Fixes bug #36333 in Freedesktop bugzilla
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function was taking a lot more CPU than required due to it memsetting
a bunch of memory that didn't require it from what I can see.
We should only memset here when we are about to fill out the sampler,
otherwise we end up doing a bunch of memsets for everytime this function
is called, basically setting 0 memory to 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The change for GPU hanging in 13bab58f04
fell back even when rb == NULL, which is wrong for GLES2 and caused
segfaulting in GLES2 conformance. For the GPU hang case (where the
broken 2D driver failed to allocate a BO for the window system
renderbuffer), it also would assertion fail/segfault immediately after
the fallback setup when the renderbuffer map failed.
Fixes GLES2 conformance packed_depth_stencil.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Texture LOD Bias is now S4.8 instead of S4.6;
Min LOD, and Max LOD are now U4.8 instead of U4.6.
Fixes piglit test tex-miplevel-selection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The data port messages for this are rather different. For now, fail to
compile rather than hanging the GPU.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On gen4/5, the RNDZ and RNDE instructions return floor(x), but set special
"round increment bits" in the flag register; a predicated ADD (+1) fixes
the result.
The documentation still lists '.r' as existing, and says that the
predicated add is necessary, but it apparently lies. According to the
simulator, BRW_CONDITIONAL_R (7) is not a valid conditional modifier
and the RNDZ and RNDE instructions simply produce the correct value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The MATH instruction cannot handle source modifiers, even on Gen7.
So, apply this workaround for Sandybridge on Ivybridge as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglit test glsl-fs-loop-continue.shader_test on Ivybridge.
According to the documentation, the CONT instruction's UIP field should
point to the WHILE instruction on both Sandybridge and Ivybridge.
The previous code made UIP point to the implicit DO instruction, which
seems incorrect. I'm not sure how it could have worked on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's IF instruction doesn't support conditional modifiers.
It also introduces UIP, which must point to the ENDIF instruction.
ELSE and ENDIF remain the same except that JIP moves from dst to src1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge puts the shadow comparator first, then lod/bias, and finally
the coordinate---unlike previous generations which always reserved four
slots for the coordinate at the beginning.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Most of this code copied from brw_wm_sampler_state.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm still not happy with the amount of code duplication here, but it
will have to do for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, Ivybridge seems to ignore the newly supplied data, giving us
rubbish for vertices.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This shouldn't be done using MRFs, but until I have a proper solution
for dealing with MRFs, this allows my hack to keep working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The message header is still incorrect, but this is a start.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's SEND instruction uses GRFs instead of MRFs. Unfortunately,
a lot of our code explicitly uses MRFs, and rewriting it would take a
fair bit of effort. In the meantime, use a hack:
- Change brw_set_dest, brw_set_src0, and brw_set_src1 to implicitly
convert any MRFs into the top 16 GRFs.
- Enable gen6_resolve_implied_move on Ivybridge: Moving g0 to m0
actually moves it to g111 thanks to the previous hack.
It remains to officially reserve these registers so the allocator
doesn't try to reuse them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This also disables the HiZ and separate stencil buffers. We still need
to implement stencil.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we currently only support sampling in the fragment shader, we only
bother to emit the PS variant. In the future we'll need to emit others.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This may not be necessary, but it seems like a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge can update each stage's binding table pointer independently,
so we want separate dirty bits. Previous generations can simply
subscribe to all three dirty bits and emit as usual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_vs_state.c; reuses create_vs_constant_bo from there.
The 3DSTATE_VS command is identical but 3DSTATE_CONSTANT_VS is not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
SF and CLIP viewport state has been combined into SF_CLIP_VIEWPORT;
SF_CLIP and CC state pointers can now be uploaded independently.
Some portions of the hardware documentation refer to separate upload
commands for SF and CLIP; these are outdated and incorrect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_clip_state.c.
This enables early culling and sets the necessary fields. Otherwise, it
is entirely the same, so I doubt this patch is strictly necessary for a
functional driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The state itself still seems to be the same; the only change is that
each part (CC, BLEND, DEPTH_STENCIL) can now be uploaded independently.
Thus, we still rely on the code in gen6_cc.c to set up the state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_wm_state.c.
The main change from Sandybridge seems to be that 3DSTATE_WM was split
into two separate state packet commands: 3DSTATE_WM and 3DSTATE_PS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_sf_state.c.
The main change from Sandybridge seems to be that 3DSTATE_SF was split
into two separate state packet commands: 3DSTATE_SF and 3DSTATE_SBE
("setup backend"). The bit-offsets are even the same - only the DWords
numbers have shuffled around a bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently this always reserves 16kB for push constants, regardless of
how much space is needed, and partitions it evenly betwen the VS and FS.
This is probably not ideal, but is straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, gen7_atoms is a verbatim copy of gen6_atoms; future commits
will update it to contain gen7-specific state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, IS_GEN7, IS_IVYBRIDGE, IS_IVB_GT1, and IS_IVB_GT2 all return
false. This allows me to write the code for them before actually adding
the PCI IDs and thus enabling the hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation uses the term "vertex URB entries", the code talks
about "entry size", and so on. Also, handles are just "pointers" to
entries (actually small integers).
Also rename max_gs_handles to max_gs_entries.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary motivation for this is to better support Ivybridge control
flow. Ivybridge IF instructions need to point to the first instruction
of the ELSE block -and- the ENDIF instruction; the existing code only
supported back-patching one instruction ago.
A second goal is to simplify and centralize the back-patching, hopefully
clarifying the code somewhat.
Previously, brw_ELSE back-patched the IF instruction, and brw_ENDIF
back-patched the previous instruction (IF or ELSE). With this patch,
brw_ENDIF is responsible for patching both the IF and (optional) ELSE.
To support this, the control flow stack (if_stack) maintains pointers to
both the IF and ELSE instructions. Unfortunately, in single program
flow (SPF) mode, both were emitted as ADD instructions, and thus
indistinguishable.
To remedy this, this patch simply emits IF and ELSE, rather than ADDs;
brw_ENDIF will convert them to ADDs (the SPF version of back-patching).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This hides the IF stack and back-patching of IF/ELSE instructions from
each of the code generators, greatly simplifying the interface.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This would be so much easier if we were using C++; we could simply use
constructors and destructors. Instead, we have to update all the
callers.
While we're at it, ralloc various brw_wm_compile fields rather than
explicitly calloc/free'ing them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Sandybridge, we don't need to break down primitives. There's no need
to bother setting up brw_compile and such if it's not going to be used;
bail as early as possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
ctx->Light.ProvokingVertex depends on _NEW_LIGHT.
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it symmetric with brw_set_dest, which is convenient, and will
also allow for assertions to be made based off of intel->gen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglits fragment-and-vertex-texturing test on llvmpipe for me.
I've no idea if someone had another plan for this that is smarter than what
I've done here, but what I've basically done is
split fragment and vertex sampler and sampler_view setup function, factor
out the common chunks of both.
side-cleanups:
drop st->state.sampler_list - unused
don't update border color if we have no border color.
should fix https://bugs.freedesktop.org/show_bug.cgi?id=35849
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm hard pressed to think of any reason a gallium thread would want to
receive a signal, especially considering its probably loaded as a library
and you don't want the threads interfering with the main threads signal
handling.
This solves a problem loading llvmpipe into the X server for AIGLX,
where the X server relies on the SIGIO signal going to the main thread,
but once llvmpipe loads the SIGIO can end up in any of its threads.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nothing special, just changing conditions for when HiZ can be enabled and
when HiZ memory becomes invalid.
I was thinking about it again and realized it had not been quite right.
This is actually just the message descriptor for Gen6+ dataport access;
it has nothing to do with the render cache. Access to the sampler cache
and constant cache also would use this struct; rename for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are documented on page 245 of IHD_OS_Vol4_Part2.pdf (the public
Sandybridge documentation/SEND instruction description).
Somebody had the bright idea to reuse gen4/5 defines labelled READ/WRITE
which just happened to be the same values as Render Cache/Sampler Cache.
It turns out that this field has nothing to do with READ/WRITE on
Sandybridge, but rather represents which data port to direct it to.
This was especially confusing in brw_set_dp_read_message, which
used "BRW_MESSAGE_TARGET_DATAPORT_WRITE." In a read function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For example, an indirect load like "ld b128 $r0q c0[$r0]" seems to
overwrite the address register before finishing the load, but only
if there are a lot of threads running.
Visible as displaced geoemtry in Unigine Heaven.
According to my documentation this is actually "Media Block Write" on
Gen4-5; there has never been a "DWord Block Write."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
only allocate the blocks ptr in the range if we ever have one,
otherwise don't bother wasting the memory.
valgrind glxinfo
before:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
after:
==5227== in use at exit: 419,754 bytes in 706 blocks
==5227== total heap usage: 3,452 allocs, 2,746 frees, 3,140,531 bytes allocate
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops 6k of the text segment, a minor drop in the ocean, however
it also makes the code a lot cleaner and removes a lot of duplicated
information, hopefully making it more maintainable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This table covered a large range unnecessarily, reduce the address
range covered, use the fact that the bottom two bits aren't significant,
and remove unused fields from the range struct. It also drops the hash_size/shift in context in favour of a define, which should make doing the math
a bit less CPU intensive.
valgrind glxinfo
Before:
==320== in use at exit: 419,754 bytes in 706 blocks
==320== total heap usage: 3,691 allocs, 2,985 frees, 7,272,467 bytes allocated
After:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently r600g always maps every bo, this is quite pointless as it wastes
VM and on 32-bit with wine running VM space is quite useful.
So with this patch we don't create the mappings until first use, without
tiling enabled this probably won't make a major difference on its own,
but with tiled staged uploads it should avoid keeping maps for most of the
textures unnecessarily.
v2: add bo data ptr check
Signed-off-by: Dave Airlie <airlied@redhat.com>
typedef void (GLAPIENTRYP _GLUfuncptr)(); causes the following warning:
function declaration isn't a prototype.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
GetVertexAttrib*{,ARB} is no longer aliased to the NV calls.
This fixes tracing yofrankie with apitrace, given it requires accurate
results from GetVertexAttribiv*.
NOTE: This is a candidate for the stable branches.
Mesa already supports this because of NV_fragment_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
If state tracker asked us to map resource directly and we can't
do it (because of tiling), return NULL instead of doing full transfer
- state tracker should handle it and fallback to some other method
or repeat transfer without PIPE_TRANSFER_MAP_DIRECTLY.
It greatly improves performance of xorg state tracker on nv50+,
because its fallback (DFS/UTS) is much faster than full transfer.
Eliminates unaligned accesses on strict architectures. Spotted by Jay
Estabrook.
Signed-off-by: Matt Turner <mattst88@gmail.com>
NOTE: This is a candidate for the 7.10 branch.
GLSL stopped using:
BRA, EXP, LOG, LRP, NRM3, NRM4, XPD.
GLSL started using:
KIL, SCS, SSG, SWZ.
(omg why SWZ? isn't proc_src_register flexible enough?)
GLSL doesn't use these opcodes some Radeons do support:
ARR, DP2A, DST, LRP, XPD.
These opcodes are now unused:
AND, NOT, NRM3, NRM4, OR, XOR.
(plus maybe the NV extensions which are unused by Gallium)
In addition to that, we don't use two-dimensional indirect addressing,
which the Mesa IR can do.
PIPE_ARCH_UNKNOWN_ENDIAN is used no where else. All #else branches of
ifdef PIPE_ARCH_LITTLE assume big-endian. Not #error'ing out here
only serves to allow bad things to happen.
Signed-off-by: Matt Turner <mattst88@gmail.com>
1/ln(2) is equivalent to log2(e), so define it as such.
log2(e) = ln(e)/ln(2) = 1/ln(2)
Worst of all, the definitions for M_LOG2E and ONE_DIV_LN2
(right beside each other!) weren't the same.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Setting SOURCE_FORMAT to EXPORT_NORM is an optimization.
Leaving SOURCE_FORMAT at 0 will work in all cases, but is less
efficient. The conditions for the setting the EXPORT_NORM
optimization are as follows:
R600/RV6xx:
BLEND_CLAMP is enabled
BLEND_FLOAT32 is disabled
11-bit or smaller UNORM/SNORM/SRGB
R7xx/evergreen:
11-bit or smaller UNORM/SNORM/SRGB
16-bit or smaller FLOAT
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Hopefully we can find out the proper fix for this, but for now
this makes the fbo mipmap tests pass on my rv670 (x2 card).
Signed-off-by: Dave Airlie <airlied@redhat.com>
r6xx asics have some problems with the surface
sync logic for the CB and DB. It's recommended
to use the event write interface for flushing
the DB/CB caches rather than the sync packets.
A single event write flush flushes all dst
caches, so we only need one for all CBs and DB.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=35312
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems more in line with what the documentation suggests we should be
doing. It doesn't fix the rv635 regression, though I thought it might,
so it means I've no idea whats actually going wrong there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
We only handle a 32 bit swap count, so use the new structure definitions.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After sending the GLXChangeDrawableAttributes request, we also set a
local set of attributes on the DRI drawable. But in the indirect case
this array won't be present, so skip the setting in that case to avoid a
crash.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We use a hidden window for pbuffer contexts, but Windows limits window
sizes to the desktop size by default. This means that creating a big
pbuffer on a small resolution single monitor would truncate the pbuffer
size to the desktop.
This change overrides the windows maximum size, allow to create windows
arbitrarily large.
Pointed out by clang:
src/gallium/auxiliary/draw/draw_context.h:251:41: warning: implicit conversion
from enumeration type 'enum pipe_cap' to different enumeration type
'enum pipe_shader_cap' [-Wconversion]
return tgsi_exec_get_shader_param(param);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~
Otherwise there would be no way to know whether the state has been changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts commit 1dc204d145.
MC_COORD_TRUNCATE is for MPEG and produces quite an interesting behavior
on regular textures. Anyway that commit broke filtering in demos/cubemap.
The new allocator uses ra and does swizzle packing.
Also, a data structure (struct rc_variable) and associated functions have
been added for generating UD and DU chains.
This function can be used to avoid creating single register classes for
input/payload registers. This makes optimistic coloring less likely
to fail.
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction scheduler will sometimes leave orphaned sources when
converting instructions from RGB to Alpha. If one of these orphaned
sources has an index greater than the maximum temporary register index,
then the compiler will incorrectly report "Too many hardware temporaries
used". The dead sources pass cleans up these orphaned sources.
GL_FIXED should not be accepted in the other gl*Pointer calls in OpenGL.
There is a new piglit for this: arb_es2_compatibility-fixed-type.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were accidentally leaving blending enabled for LogicOp GL_COPY,
which ARB_color_buffer_float/GL_RGBA32F-render (and friends) caught.
Additionally, the GL spec says that no LogicOp should be done to
floating-point targets, and the GPU gets really angry even if you say
to LogicOp GL_COPY to float.
As we expanded the usage of the state cache, it grew extra
functionality. However, with the recent state streaming rework, we're
back to the state cache being used only for shader kernels, which is
the piece of GPU state that's actually expensive to compute again from
scratch, since it involves compiling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the dynamic state is streamed through the top of the
batchbuffer, we can cut out many of our relocations to that state by
using the base address.
Improves 3DMMES taiji performance 3.3% +/- 0.4% (n=15).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Overall, across this series since the last set of numbers, gen6 3DMMES
taiji performance has dropped 0.8% +/- 0.3% (n=15), probably due to
the increased reissuing of state from some of the state objects that
otherwise never changed, and increased occurrence of the per-batch
overhead as we've increased how much we put in the batch BO without
increasing the batch BO's size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplers are about to become streamed for gen6 performance, which
would cause this unit to blow out the state cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is in a way a revert of f5bb775fd1.
The tiny win that had will be overwhelmed by the win of using the gen6
dynamic state base address.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves 3DMMES taiji demo performance by 5.1% +/- 1.9% (n=15), by
reducing CPU time spent thrashing around those tiny little constant BOs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The payload regs can go all the way up to register 60+, so just give
them 8 bits to be addressed by instead of 3-4 (which made source_w_reg
of 8 end up 0). There's no reason to aggressively pack these fields,
as they are just used as compiler information, where being easier to
access is probably more important than shaving a byte or two off of
the structure.
Fixes piglit fragcoord_w.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36649
I was promoting to float for ARB_color_buffer_float unclamped, which
failed when ARB_texture_float wasn't present. Since the metaops don't
need results outside of [0,1] when not drawing to a floating point
destination, they can just use a fixed point texture when floating
point destinations are impossible.
Fixes regression in fdo23670-depth_test when --enable-texture-float is
not present.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36473
Since commit de579a1 "Include GIT SHA1 in GL version string"
$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/mesa/main/git_sha1.h
nothing added to commit but untracked files present (use "git add" to track)
Add git_sha1.h to .gitignore so git knows not to warn it is present but untracked
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Also use MAX3 and incorporate Ian's suggestion in texformat.c.
I don't think wrapping u_format_rgb9e5.h in another header and thus making it
more complicated is worth it.
swrast support done.
There is no renderbuffer support in swrast, because it's not required
by the extension.
Reviewed-by: Brian Paul <brianp@vmware.com>
I was wondering why I had been getting GL_RGBA for GL_RGB9_E5.
Instead of setting GL_RGBA and CHAN_TYPE for most types,
use the helper functions to obtain the info.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we run out of bin memory and do an early return from
lp_setup_begin_query() we'd omit setting the setup->active_query
pointer. Then, when lp_setup_end_query() was later called, the
assertion for setup->active_query == pq would fail. Moving the
assigment in lp_setup_begin_query() avoids that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Including windows.h was ineffective on MSVC because we define the NOGDI macro,
which skips the wingdi.h include.
Unsetting NOGDI is also a bad idea because it causes all sort of symbol
clashes with SGI code.
The real problem is that WINGDAPI was not being defined, also due to NOGDI,
so simply define it to blank if not done already. This seems to make
everybody happy.
The default value is 64 but drivers usually advertise more, like 4096.
Allows ARB vp/fp programs to use more parameters.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This reverts commit 50ade6ea69.
Fixes jerky rendering again on apps that don't block on the GPU per
frame and are GPU bound (e.g. 3DMMES on Ironlake). The whole point of
this complicated throttle scheme is to wait on frame n-1 to have
started rendering before starting frame n's rendering. Otherwise, the
GPU-bound app will race ahead and call the GL to draw many
nearly-identical frames, then >0ms later get stuck waiting for them
(all dispatched at about the same time) to retire, then render a new
batch of nearly-identical frames.
If GL_RGB16F or GL_RGB32F is specified let's try the 3-component float
texture formats before trying the 4-component ones. Before this,
GL_RGB16/32F were treated the same as GL_RGBA16/32F.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The actual code that needs this include is just using
"if defined (PIPE_OS_UNIX)", and the two conditions should match.
This should also make the file compile under Hurd.
This is more painful than instruction scheduling, as we have to
compare two MRF writes to see if they coincide, and have to handle
partial GRF writes before that (for example, the result of a math
instruction written to color).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that needed fixing was skipping the newly-possible
uncompressed/sechalf partial GRF constant writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the work of the scheduler is agnostic to wide dispatch. It
operates on our virtual GRF file, which means instructions are
generally referring to 8 or 16 wide naturally. For the MRF file
management we're trying to track the actual hardware MRF file, so we
need to watch if an instruction writes multiple MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is glued in in a bit of an ugly way -- we rely on the uniforms
having been set up by 8-wide dispatch, and we just reuse them without
the ability to add new uniforms for any reason, since the 8-wide
compile is already completed. Today, this all works out because our
optimization passes are effectively the same for both and even if they
weren't, we don't reduce the set of uniforms pushed after
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this, consumers often have to keep linked lists of the
entries, at additional malloc cost.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These reduce an emitted (not decoded) instruction per shader on
g4x/gen5, but may allow for additional register coalescing as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At this point it doesn't do uniforms, which have to be laid out the
same between 8 and 16. Other than that, it supports everything but
flow control, which was the thing that forced us to choose 8-wide for
general GLSL support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that the virtual grfs are in increments of the dispatch_width,
not hardware registers -- this makes the 16-wide emit and 8-wide emit
mostly the same.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I hit this when testing RV350, which lacks RGB10_A2 render target
support. It had been missed when implementing the format and probably
unused by anything else too.
Not applicable to 7.10.
Reviewed-by: Eric Anholt <eric@anholt.net>
"st/mesa: check image size before copy_image_data_to_texture()" caused
a regression in piglit fbo-generatemipmap-formats test on all gallium drivers.
Level 0 for NPOT textures will not match minified values, so don't do this
check for level 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the initial code if we had nothing in the vector slots r would
never get reset to 0, so we'd fail to compile shaders, after the previous
commit this would happen for the LIT tests. When I fixed that we did a lot
of unnecessary loops through all the vector states when we had no vector
slots filled. So this patch optimises thing for the scalar only state.
This fixes the 3 LIT piglit tests on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the R600 ISA document:
Section 4.7.5 Cycle restrictions for the ALU.trans states that
PV/PS have cycle restrictions wrt constants.
This is part of a fix for the LIT tests
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a bug in Trine where fragment.color would write
FRAG_RESULT_COLOR (which is interpreted by drivers as being the "write
this to all color buffers" option) instead of FRAG_RESULT_DATA0 (just
the first target).
Fixes piglit ATI_draw_buffers/arbfp-no-index.
This extension support consists of replacing
"gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->".
One instance of referencing the texture's base sampler remains in the
initial miptree allocation, where I'm not sure we have a clear
association with any texture unit.
Tested with piglit ARB_sampler_objects/sampler-objects.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we lack hardware support for it, this is a simple matter of
checking _mesa_check_conditional_render at the entrypoints, and
suppressing it for the metaops where it doesn't apply.
Reviewed-by: Brian Paul <brianp@vmware.com>
The NV_conditional_render spec calls out specific operations that
conditional rendering applies to, which doesn't include these.
Fixes NV_conditional_render/generatemipmap on swrast.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested with rgtc-teximage-0[12].
EXT_texture_compression_rgtc/fbo-generatemipmap-formats fails in NPOT
just like S3TC does.
Reviewed-by: Brian Paul <brianp@vmware.com>
This assertion doesn't make any sense to me -- the convertFormat is
already something valid (tested above), and the BaseFormat dictated by
convertFormat doesn't matter to the function about to be called (it's
the datatype/comps that were pulled out of convertFormat).
Fixes assertion failure in
GL_EXT_texture_compression_rgtc/fbo-generatemipmap-formats
(still has a rendering failure in NPOT like S3TC does).
Reviewed-by: Brian Paul <brianp@vmware.com>
We were falling through to the default R8 and RG88 formats instead of
compressing when possible. Noticed by swrast fbo-blending-formats
actually doing rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were totally broken for several releases.
scons now builds everything the project files built and more, and can be
kept up-to-date with little effort.
Need to reset the point/line/tri functions to point to the "first"
versions whenever we flush vertices. Fixes unfilled polygon rendering
errors seen in demos/samples/logo.c. See comments for more info.
NOTE: This is a candidate for the 7.10 branch.
Broken with e5c6a92a12. (ARB_color_buffer_float)
Clamping should occur if type != float, otherwise the MSBs of the resulting
pixels are killed off. For example, reading back LUMINANCE = R+G+B can be
greater than 0xff, but the result is naturally masked by 0xff
for UNSIGNED_BYTE, leading to bogus results.
The following bug report seems to want clamping to occur if type == half_float
too. Not sure what's correct.
Bug: [bisected pineview] oglc case pxconv-read failed
https://bugs.freedesktop.org/show_bug.cgi?id=35852
Tested by: Fang Xun <xunx.fang@intel.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
None of this ever gets used. Fog is always calculated by a fragment
program. Even though the fixed-function fog unit is never used, state
updates are still sent to the hardware. Removing those spurious state
updates can't hurt performance.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Fragment programs are generated by core Mesa for fixed-function.
Because of this, there's no reason to handle cases where there is no
fragment program for fog.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
All drivers expect this to always be GL_NONE. Don't let there be any
opportunity for a bad value to leak out and infect some unsuspecting
driver. If any driver for hardware that had fixed-function
per-fragment fog (i915 and perhaps some r300-ish) was ever going to
add support, it would have done it by now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
This patch fixes two bugs related to fog in the fixed-function
fragment shader generation code.
Fog was only lowered to instructions if MRTs were used. The fragment
shader assembler always lowers "fog option" code to instructions, and
many drivers (e.g., r300) expect this.
When fog lowering did happen, it was after the instruction count was
checked against implementation limits. Since fog lowering may add up
to 5 instructions, a program that was below the limits before lowering
may exceed the limits after lowering.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
We should only copy images into the dest texture if the size is correct.
This fixes a failed assertion when finalizing a texture with mis-defined
mipmap levels such as:
level 0: 32x32
level 1: 8x8
Also, fix incorrect mipmap level used in assertion at the top of
copy_image_data_to_texture().
NOTE: This is a candidate for the 7.10 branch.
Lots of code (deleted by this patch) tried to make type == result->type,
but not all cases did. Don't pretend; just use result->type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Things definitely remaining todo: switch statements, clip distances.
On 965, we also need real integers in the VS, and implementations of
some things like isinf/isnan.
Reviewed-by: Brian Paul <brianp@vmware.com>
For 1 and 2-channel formats the hardware only supports rendering to R
and RG. To do I and L render targets we just call them R and
everything works out. For A, we would need to rewrite the CC to do
the alpha channel's blending on color instead, and send the fragment
alpha down the red channel. For LA, there doesn't seem to be any
hope, because we can't do independent color/alpha blending while
treating the LA surface as RG.
Reviewed-by: Brian Paul <brianp@vmware.com>
The blitter only does up 32bpp at a time, so we handle it by mangling
coordinates and calling the surface 32bpp.
Fixes ARB_texture_rg/fbo-generatemipmap-formats-float with ARB_texture_float.
Reviewed-by: Brian Paul <brianp@vmware.com>
Of these, intel will be using I and L initially, and A once we rewrite
fragment shaders and the CC for rendering to it as R.
Reviewed-by: Brian Paul <brianp@vmware.com>
Keep track of when the caches are dirty, and only flush them when
the framebuffer state is set and when the context is flushed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes piglit's draw-instanced-divisor test for softpipe on both
the generic and SSE paths. This is temporary until we have the
correct per-array max_index information.
this needs revisiting, we really don't want to be flushing all 32 of these,
but currently we don't flush any of them, and it seems to have caused a regression
as reported on irc with doom3 on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Writes within ELSE blocks were being ignored which prevented us from
discovering all possible writers for some register values.
Fixes piglit glsl-fs-raytrace-bug27060
This gets me from 2200 to 1978 dwords for a gears frame.
This is due to us having some 32-dwords blocks in the SPI, that we only
modify the first dwords off.
v2: fix dirty reg count from Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a first step to decreasing the CPU usage, by decreasing how much
stuff we pass to the GPU and hence to the kernel CS checker.
This adds a check to see if the values we need to write are actually dirty,
and avoids writing if they are. However certain register need to always
be written so we add a new flag to say which ones should be always written
if used. (Note this could probably be done cleaner with a larger refactoring,
since I think the CONST_BUFFER_SIZE_PS/VS and CONST_CACHE_PS/VS might
be better off as a special state).
It also moves the need_bo to be a flags on the register now.
With this, a frame of gears goes from emitting 3k dwords to emitting 2k dwords,
and I'm sure it could get a lot smaller.
v2: fix some evergreen dirty bits.
Original patch from: Bas Nieuwenhuizen, I NIHed nearly the same thing
before seeing his patch on the list, oops.
Reviewed-by: Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of the newer portions of the code use OUT_BATCH style. I prefer
this style because it offers a clear distinction between a) hardware
messages/structures with a mandatory format, and b) data structures for
our own internal use that we can format however we want.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we never enable the GS on Sandybridge, there's no need to allocate
it any URB space.
Furthermore, the previous calculation was incorrect: it neglected to
multiply by nr_vs_entries, instead comparing whether twice the size of
a single VS URB entry was bigger than the entire URB space. It also
neglected to take into account that vs_size is in units of 128 byte
blocks, while urb_size is in bytes.
Despite the above problems, the calculations resulted in an acceptable
programming of the URB in most cases, at least on GT2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The expression
x = y, 5, 3;
will generate
0:7(9): warning: left-hand operand of comma expression has no effect
The warning is only emitted for the left-hand operands, becuase the
right-most operand is the result of the expression. This could be
used in an assignment, etc.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts what remains of commit
28bab24e16. It was garbage, trying to
use a MESA_FORMAT enum as a preprocessor token, and I don't know how I
thought it was even tested.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RED and GL_RG were tricking this code into executing, but it's
totally unprepared for a 16-bit channel and just rescaled the values
down to 0. We don't have anything with <8bit channels alongside >8bit
channels, so disabling it should be safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the current (broken by trying to use an enum in the
preprocessor) spantmp2.h support I wrote for the intel driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we're using GTT mappings now (no manual detiling), there's
really nothing special to accessing these buffers, other than needing
the new RowStride field of gl_renderbuffer to accomodate padding.
Reduces the driver size by 2.7kb, and improves glean depthStencil
performance 3-10x (!)
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow some drivers to reuse the core renderbuffer.c get/put
row functions in place of using the spantmp.h macros. Note that
unlike textures, we use a signed integer here to allow for handling
FBO orientation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Everything appears to already be in place for this. Fixes aborts in:
ARB_texture_rg/fbo-alphatest-formats-float
ARB_texture_rg/fbo-blending-formats-float.
Reviewed-by: Brian Paul <brianp@vmware.com>
The _mesa_base_fbo_format variant doesn't handle some texture
internalformats, such as "3".
Fixes:
fbo-blending-formats.
fbo-alphatest-formats
EXT_texture_sRGB/fbo-alphatest-formats
Reviewed-by: Brian Paul <brianp@vmware.com>
The very presence of this extension breaks things.
This should bring us closer to being able to run Unigine Heaven.
The extension will be re-enabled once gl_InstanceID is implemented.
This was copy-and-paste from originally trying to get DP read/write
working reliably, and notably for other common messages (URB, sampler)
we weren't doing this.
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
It was accepting only GL_DUDV_ATI and not the specific sized format
GL_DU8DV8_ATI. Fixes assertion failure at startup in Shadowgrounds.
Reviewed-by: Brian Paul <brianp@vmware.com>
The 095-recursive-define test case was triggering infinite recursion
with the following test case:
#define A(a, b) B(a, b)
#define C A(0, C)
C
Here's what was happening:
1. "C" was pushed onto the active list to expand the C node
2. While expanding the "0" argument, the active list would be
emptied by the code at the end of _glcpp_parser_expand_token_list
3. When expanding the "C" argument, the active list was now empty,
so lather, rinse, repeat.
We fix this by adjusting the final popping at the end of
_glcpp_parser_expand_token_list to never pop more nodes then this
particular invocation had pushed itself. This is as simple as saving
the original state of the active list, and then interrupting the
popping when we reach this same state.
With this fix, all of the glcpp-test tests now pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32835
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Without it gcc complains:
nv50_screen.c: In function ‘nv50_screen_is_format_supported’:
nv50_screen.c:48: warning: implicit declaration of function ‘util_format_is_supported’
and handles it wrongly - util_format_is_supported returns boolean, which is typedef'ed
to uchar, but function without prototype is assumed to return int.
For me nv50_screen_is_format_supported was returning true for float formats without
--enable-texture-float...
Sounds very unlikely, but I don't have a better explanation at the
moment.
The GPU throws page faults at the first page after the code buffer
quite frequently on startup, and traces don't show us overflowing.
This pass coverts CMP T0, T1 T2 T0 -> MOV T0, T2 when the CMP
instruction is the first instruction to write to register T0.
This pass is useful for hardware that requires a lot of lowering passes
that generate many CMP instructions.
So --enable-texture-float it is.
Hardware drivers (including the Gallium ones) should
use #ifdef TEXTURE_FLOAT_ENABLED to hide any code that may
expose floating-point renderbuffers via any interface,
public or private.
v2: Print a warning when using --enable-texture-float.
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: handle floating-point formats in _mesa_base_fbo_format
mesa: add ARB/ATI_texture_float, remove MESAX_texture_float
commit 123bb110852739dffadcc81ad80b005b1c4f586d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 01:35:42 2010 +0200
mesa: compute floatMode for FBOs and return it on RGBA_FLOAT_MODE
It's clear enough that the current segmentation fault isn't what we
want. And it's also very easy to know what we do want here, (just
check with any functional C preprocessor such as "gcc -E").
Add the desired output as an expected file so that the test suite
gives useful output, (showing the omitted output and the segfault),
rather than just reporting "No such file" for the expected file.
These were all written as generic list functions, (accepting and returning
a list to act upon). But they were only ever used with parser->active as
the list. By simply accepting the parser itself, these functions can update
parser->active and now return nothing at all. This makes the code a bit
more compact.
And hopefully the code is no less readable since the functions are also
now renamed to have "_parser_active" in the name for better correlation
with nearby tests of the parser->active field.
The common case for this test suite is to quickly test that everything
returns the correct results. In this case, the second run of the test
suite under valgrind was just annoying, (and the user would often
interrupt it).
Now, do what is wanted in the common case by default (just run the
test suite), and require a run with "glcpp-test --valgrind" in order
to test with valgrind.
The expected file here captures the current behavior of glcpp (which
is to generate an obscure "syntax error, unexpected $end" diagnostic
for this case).
It would certainly be better for glcpp to generate a nicer diagnostic,
(such as "missing closing parenthesis in function-like macro
definition" or so), but the current behavior is at least correct, and
expected. So we can make the test suite more useful by marking the
current behavior as expected.
The expected file here captures the current behavior of glcpp (which
is to generate a division-by-zero error) for this case.
It's easy to argue that it should be short-circuiting the evaluation
and not generating the diagnostic (which happens to be what gcc does).
But it doesn't seem like we should force this behavior on our
pre-processor, (and, as always, the GLSL specification of the
pre-processor is too vague on this point).
This test is behaving just fine already---it's generating an informative
diagnostic, ("error: division by 0 in preprocessor directive"), so adding
this in the expected file makes things pass.
We could actually try to do an early return both for gallium textures and
malloc memory textures, but I'm not sure exactly which situations
stImage->pt is NULL, and whether texImage->Data == NULL would be acceptible
or not.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the same as ARB_draw_buffers (which derived from it), except
for s/ARB/ATI/. The glapi bits were already in place, and what was
missing was just the ARB_fp part. The new Humble Bundle game "trine"
tries to use this extension without checking that it's exposed, which
this works around.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36182
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like what we do for add/mul, but we have to invert the
predicate to choose the other source instead.
This removes 5 extra moves of constants in nexuiz shaders. No
statistically significant performance difference on my Sandybridge
laptop (n=5).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were letting any old operand through, which generally resulted in
assertion failures later.
Fixes array-logical-xor.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant false case.
Fixes glslparsertest/glsl2/logic-02.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant true case.
Fixes glslparsertest/glsl2/logic-01.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By always using a boolean, we should generally avoid further
complaints. The failure case I see is logic_not, where the user might
understandably make the mistake of using `!' on a boolean vector (like
a piglit case did recently!), and then get a further complaint that
the new boolean type doesn't match the bvec it gets assigned to.
Fixes invalid-logic-not-06.vert (assertion failure when the bad type
ends up in an expression and ir_constant_expression gets angry).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33314
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A few GLES2 tests tripped over this when using array dereferences to
hit channels on the LHS (see piglit test
glsl-copy-propagation-vector-indexing). We wouldn't find the
ir_dereference_variable, and assume that that meant that it wasn't an
assignment to a scalar/vector, and thus not notice that the variable
had been changed.
Release the old depth region and reference the new one *only* if it has
changed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@intel.com>
Prior to Gen6, we use the GS for breaking down quads, quad-strips,
and line loops. On Gen6, earlier stages already take care of this,
so we never need the GS.
Since this code is likely completely untested, remove it for now.
We can write new code when enabling real geometry shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit b4cbd2b312.
It looked like a safe sanity check. It missed the issue of the start of
the buffer not being at 0, but even that was not enough to explain why
setting the max vertex index caused glyphs to be dropped from the game
'Achron'.
Instead, the issue appears to be related to the use of the vertex bias
and so we would need to re-emit the max-index every time we adjusted the
bias, so re-emitting the relocations and defeating the original
optimisation.
Reported-and-tested-by: Thomas Jones <thomas.jones@utoronto.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35163
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When the window is minimized GetClientRect will return zeros.
Instead of creating a 1x1 framebuffer, simply preserve the current window
size, until the window is restored or maximized again.
The earlier change to ensure rendertargets and textures are always
rebound at every command buffer start had the downside of making
successive flushes no longer no-ops, as a command buffer with merely
the rebinding commands were being unnecessarily sent to the vGPU.
This change only re-emits the bindings when necessary, by keeping track
of the need to rebind outside of the dirty state update mechanism.
According to https://bugs.freedesktop.org/show_bug.cgi?id=34280
commit 5d1387b2da causes the font corruption
problems people have been seeing under various apps and gnome-shell on r200
cards.
This commit changed (loosened) the check for using the memcpy path in the
former al88 / al1616 texstore functions, which are now also used to
store rg texures. This patch restores the old strict check in case of
al textures. I've no idea why this fixes things, since I don't know the
code in question at all. But after seeing the bisect in bfdo34280 point
to this commit, I gave this fix a try and it fixes the font issues seen on
r200 cards.
[airlied:
r200 has no native working A8, so it does an internal storage format of AL88
however srcFormat == internalFormat == ALPHA when we get to this point,
so it copies, but it wants to store into an AL88 not ALPHA so fails,
I'll also push a piglit test for this on r200].
Many thanks to Nicolas Kaiser who did all the hard work of tracking this down!
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously the macro would (ALIGN(value - alignment - 1, alignment)).
At the very least, this was missing parenthesis around "alignment -
1". As a result, if value was already aligned, it would be reduced by
alignment. Condisder:
x = ROUND_DOWN_TO(256, 128);
This becomes:
x = ALIGN(256 - 128 - 1, 128);
Or:
x = ALIGN(127, 128);
Which becomes:
x = 128;
This macro is currently only used in brw_state_batch
(brw_state_batch.c). It looks like the original version of this macro
would just use too much space in the batch buffer. It's possible, but
not at all clear to me from the code, that the original behavior is
actually desired.
In any case, this patch does not cause any piglit regressions on my
Ironlake system.
I also think that ALIGN_FLOOR would be a better name for this macro,
but ROUND_DOWN_TO matches rounddown in the Linux kernel.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Whitwell <keithw@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also make the GL_ARB_draw_instanced block follow the same pattern as
the other blocks.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a 49.6% +/- 2.0% (n=9, IPS outlier removed) performance
improvement for the hacked-up-for-cache-misses scissor-many, and no
statistically significant performance difference for the
hacked-up-for-cache-hits version (n=9, IPS outlier removed). No
statistically significant performance difference from ETQW (n=5) from
these last two commits.
This is a 28.1% +/- 1.4% (n=10) performance improvement for the
hacked-up-for-cache-misses scissor-many (n=10), and no statistically
significant wall-time performance difference for the
hacked-up-for-cache-hits version (n=9, first outlier in each removed
since IPS was warming up. User time increased by about 4.7%, but
kernel time decreased equivalently).
Build the new sources, plug the new functions into the dispatch table,
implement display list support. And enable extension in the gallium
state tracker.
gl_texture_object contains an instance of this type for the regular
texture object sampling state. glGenSamplers() generates new instances
of gl_sampler_object which can override that state with glBindSampler().
Fixes issues in e.g. nexuiz (desertfactory) or supertuxkart that
look like lighting bugs.
They're not visible with the software rasterizers because their
notion of linear interpolation seems to be different from that
of nv50/nvc0.
This reverts commit 437c748bf5.
The commit is wrong for several reasons. One of them is when we grab
a new buffer, we should update all the states it is bound in,
including all parallel contexts. I don't think this is even doable.
The correct solution would be upload data via a temporary buffer and
do resource_copy_region to the original one.
https://bugs.freedesktop.org/show_bug.cgi?id=36088
Add Cygwin platform-specific settings and drivers to build for dri driver:
- by default, disable direct rendering.
- if direct rendering is enabled, the swrast dridriver is the only one it's
sensible to try to build (this doesn't work at the moment as additional patches
are required to build a libGL which can load just swrast without the DRM headers,
even though there's no actual functional dependency)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Fix build when configured --with-driver=dri --disable-driglx-direct on targets
without drm e.g. GNU/Hurd and Cygwin
Based on the Debian patch file '05_hurd-ftbfs.diff' by Samuel Thibault.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Reviewed-By: Jakob Bornecrantz <wallbraker@gmail.com>
The theory here was to detect a temporary variable used within a loop,
and avoid considering it live across the entire loop. However, it was
overeager and failed when the first definition of the variable
appeared within the loop but was only conditionally defined.
Fixes glsl-fs-loop-redundant-condition.
Otherwise min_lod can potentially be larger than the clamped max_lod. The
code that follows will swap min_lod and max_lod in that case, resulting in a
max_lod larger than MAX_LEVEL.
Signed-off-by: Brian Paul <brianp@vmware.com>
Base level and min LOD aren't equivalent. In particular, min LOD has no
effect on image array selection for magnification and non-mipmapped
minification.
Signed-off-by: Brian Paul <brianp@vmware.com>
Although for GL a zero stride means tightly packed elements, Mesa
internally uses zero strides for constant arrays.
Therefore user buffers need to be defined from
buffer_offset + src_offset + min_index*stride
to
buffer_offset + src_offset + max_index*stride + elem_size
Simplifying the later with (max_index + 1)*stride will give zero
sized buffers.
This change also aggregates the st_context's info about user buffers
into a single array.
We adjust 'end' to fit into _MaxElement, but that may result into a 'start'
value bigger than 'end' being passed downstream, causing havoc.
This could be seen with arb_robustness_draw-vbo-bounds, due to an
application bug.
Because:
- bindings are not fully automatic, and they are broken most of the time
- unit tests/samples can be written in C on top of graw
- tracing/retracing is more useful at API levels with stable ABIs such as
GL, producing traces that cover more layers of the driver stack and and
can be used for regression testing
There's really no need for a prefix on member functions, and overloading
takes care of the _op1/_op2 distinction quite nicely. Eric already made
a similar change in the i965 FS backend.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rather than ir_to_mesa_dst_reg_from_src and ir_to_mesa_src_reg_from_dst.
The new constructors are marked 'explicit' so that the compiler can
catch cases where source and destination registers were accidentally
interchanged.
This also necessitated using constructors to initialize the undef and
address registers, as well as adding a default constructor.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both classes are completely private to ir_to_mesa.cpp, so there won't be
any name conflicts with other parts of Mesa. The prefix simply makes it
harder to read.
Also, use a class rather than typedef structs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is in preparation from removing the "ir_to_mesa_" prefix on the
src_reg and dst_reg types, which would cause a naming conflict.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code would previously handle the projection, then swizzle the
shadow comparitor into place. However, when the projection is done
"by hand," as in the TXB case, the unprojected shadow comparitor would
over-write the projected shadow comparitor.
Shadow comparison with projection and LOD is an extremely rare case in
real application code, so it shouldn't matter that we don't handle
that case with the greatest efficiency.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=32395
This matches the behaviour below when numSamples is compared.
At least with the gallium state tracker this can actually occur if st_render_texture fails.
Signed-off-by: Brian Paul <brianp@vmware.com>
This is always the way with real hardware and desktop OpenGL. Some
hardware can't do some formats natively. The alpha-only, luminance,
and intensity formats are usually the most problematic. Some sized
formats can also be problematic. This patch provides fall-back
formats for those that are not natively supported.
At some point it would be interesting to try providing
device-independent conversions using EXT_texture_swizzle. The drivers
that support EXT_texture_swizzle could, for example, see
GL_LUMINANCE16_SNORM as MESA_FORMAT_SIGNED_R16 with a { r, r, r, 1 }
swizzle. Care would need to be taken to prevent issues with using
those textures for FBO rendering.
This is the rest of the fix for glean's pixelFormats test on i965.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This class of hardware can natively sample all of the snorm surface
formats that DX10 requires, but it can't do some of the legacy GL
formats. In particular, all of the alpha, luminance, and intensity
formats are unsupported.
This partially fixes the breakage in glean's pixelFormats test since
GL_EXT_texture_snorm support was added to Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We use this format to represent the accum buffer. No snorm texture
sampling or rendering takes place.
Fixes failed assertion with swrast and any app using the accum buffer
(and glxinfo).
Piglit tests:
- glsl-fs-shadow2d-01
- glsl-fs-shadow2d-02
- glsl-fs-shadow2d-03
- fs-shadow2d-red-01
- fs-shadow2d-red-02
- fs-shadow2d-red-03
NOTE: This is a candidate for the stable branches.
Various documentation mentions that "W" is handed to the WM stage,
but further digging seems to indicate that they really mean 1/W.
The code here is still unclear, but changing this fixes piglit
test "fragcoord_w" on Sandybridge as well as a Khronos ES2 conformance
test. I also tested 3DMarkMobile ES2.0's taiji and hoverjet demos, as
well as Nexuiz, just to be safe.
NOTE: This is a candidate for the 7.10 branch.
Fixes regressions caused by commit 9a21bc6401, namely GPU hangs when
running gnome-shell or compiz (Mesa bugs #35820 and #35853).
I incorrectly refactored the case that dealt with ARF_NULL; even in that
case, the source register needs to be changed to the MRF.
NOTE: This is a candidate for the 7.10 branch (if 9a21bc6401 is
cherry-picked, take this one too).
Branch emulation and loop unrolling are done in the GLSL frontend.
Transforming loops is no longer needed for fragment shaders, but it is still
necessary for vertex shaders.
Registers that are used inside of loops need to be considered live
starting with the first instruction of the outermost loop.
https://bugs.freedesktop.org/show_bug.cgi?id=34370
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Oops, the mask was being used in the loop to determine whether to use
include the stencil || depth values. This began to fail when mask was
cleared at the beginning of the loop. So reorder the tests and do the
work up-front along with determining the depth_stencil value to use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35822
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(some little changes by Marek Olšák)
Squashed commit of the following:
commit 737c0c6b7d591ac0fc969a7590e1691eeef0ce5e
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Fri Aug 27 02:13:57 2010 +0200
draw: disable SSE and PPC paths (use LLVM instead)
These paths don't support vertex clamping, and are anyway
obsoleted by LLVM.
If you want to re-enable them, add vertex clamping and test that it
works with the ARB_color_buffer_float piglit tests.
commit fed3486a7ca0683b403913604a26ee49a3ef48c7
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:27:38 2010 +0200
draw_llvm: respect vertex color clamp
commit ef0efe9f3d1d0f9b40ebab78940491d2154277a9
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:26:43 2010 +0200
draw: respect vertex clamping in interpreter path
Macro can lead to hard to debug list bugs. For instance consider
the following :
LIST_ADD(item, list->prev)
3 instruction of the macro became :
(list->prev)->next->prev = item
which is equivalent to :
list->prev = item
Thus list prev field changes and next instruction in the macro
(list->prev)->next = item
became :
item->next = item
And you endup with list corruption, other case lead to similar
list corruption. Inline function are not affected by this short
coming
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When the condition
min_index == 0 && sizeof(ib[0]) == sizeof(draw_elts[0])
was true, we were wrongly ignoring istart and processing indices 0.
Reorder some statements to make the code easier to understand.
Now that we purposefully generate delta that point outside of the target
buffer, the assertion has outlived its usefulness.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once more! This time without the unwarranted conversion from
drm_intel_bo_alloc_tiled.
Signed-off-by: [a very embarrassed] Chris Wilson <chris@chris-wilson.co.uk>
v2: Allocate the fences from a single shared buffer object.
v3: Allocate the r600_fence structs in blocks of 16.
Spin a few times before calling sched_yield in r600_fence_finish().
This should be the last bit of infrastructure changes before
generating GLSL IR for assembly shaders.
This commit leaves some odd code formatting in ir_to_mesa and brw_fs.
This was done to minimize whitespace changes / reindentation in some
loops. The following commit will restore formatting sanity.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This array is going to be used in the main compiler soon. Leaving
them uniforms.c caused problems for building the stand-alone compiler.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The hardware should be set according to this table:
FORMAT -> R300 COLORFORMAT
-------------------------
X16 -> UV88
X16Y16 -> ARGB8888
X32 -> ARGB8888
X32Y32 -> ARGB16161616
US_OUT_FMT must contain the real format.
I wasn't able to make B3G3R2 and L4A4 work, but those aren't important.
The vertex color clamp control is a property of an API,
a lot like gl_rasterization_rules.
The state should be set according to the API being implemented, for example:
OpenGL Compatibility: enabled by default
OpenGL Core: disabled by default
D3D11: always disabled
This patch also changes the way ARB_color_buffer_float is advertised.
If no SNORM or FLOAT render target is supported, fragment color clamping
is not required.
BTW this changes the gallium interface.
Some rather cosmetic changes by Marek.
Squashed commit of the following:
commit 513b37d484f0318311e84bb86ed4c93cdff71f13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:54 2010 +0200
mesa/st: respect fragment clamping in st_DrawPixels
commit 546a31e42cad459d7a7a10ebf77fc5ffcf89e9b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:28 2010 +0200
mesa/st: support fragment and vertex color clamping
commit c406514a1fbee6891da4cf9ac3eebe4e4407ec13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:56:37 2010 +0200
mesa/st: expose ARB_color_buffer_float if unclamping is supported
commit d0c5ea11b6f75f3da2f4ca989115f150ebc7cf8d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:53:41 2010 +0200
mesa/st: use unclamped colors
This assumes that Gallium is to be interpreted as given drivers the
responsibility to clamp these colors if necessary.
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it. (see the removed XXX comment. -Marek)
TODO: did I get the set of operations mandating it right?
commit 76bdfcfe3ff4145a1818e6cb6e227b730a5f12d8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:18:25 2010 +0200
gallium: add color clamping to the interface
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: fix getteximage so that it doesn't clamp values
mesa: update the compute_version function
mesa: add display list support for ARB_color_buffer_float
mesa: fix glGet query with GL_ALPHA_TEST_REF and ARB_color_buffer_float
commit b2f6ddf907935b2594d2831ddab38cf57a1729ce
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 31 16:50:57 2010 +0200
mesa: document known possible deviations from ARB_color_buffer_float
commit 5458935be800c1b19d1c9d1569dc4fa30a97e8b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:54:56 2010 +0200
mesa: expose GL_ARB_color_buffer_float
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
(I'll squash the st/mesa part to a separate commit. -Marek)
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it.
TODO: did I get the set of operations mandating it right?
commit 3a9cb5e59b676b6148c50907ce6eef5441677e36
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:09:41 2010 +0200
mesa: respect color clamping in texenv programs (v2)
Changes in v2:
- Fix attributes other than vertex color sometimes getting clamped
commit de26f9e47e886e176aab6e5a2c3d4481efb64362
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:05:53 2010 +0200
mesa: restore color clamps on glPopAttrib
commit a55ac3c300c189616627c05d924c40a8b55bfafa
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:04:26 2010 +0200
mesa: clamp color queries if and only if fragment clamping is enabled
commit 9940a3e31c2fb76cc3d28b15ea78dde369825107
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 00:00:16 2010 +0200
mesa: introduce derived _ClampXxxColor state resolving FIXED_ONLY
To do this, we make ClampColor call FLUSH_VERTICES with the appropriate
_NEW flag.
We introduce _NEW_FRAG_CLAMP since fragment clamping has wide-ranging
effects, despite being in the Color attrib group.
This may be easily changed by s/_NEW_FRAG_CLAMP/_NEW_COLOR/g
commit 6244c446e3beed5473b4e811d10787e4019f59d6
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:58:24 2010 +0200
mesa: add unclamped color parameters
Fixes piglit test glsl-vs-arrays-3 on Sandybridge, as well as garbage
rendering in 3DMarkMobileES 2.0's Taiji demo and GLBenchmark 2.0's
Egypt and PRO demos.
NOTE: This a candidate for stable release branches. It depends on
commit 9a21bc6401.
This was open-coded in three different places, and more are necessary.
Extract this into a function so it can be reused.
Unfortunately, not all variations were the same: in particular, one set
compression control and checked that the source register was not
ARF_NULL. This seemed like a good idea, so all cases now do so.
Eventually the miptree refcounting interface should be cleaned up.
The assymmetry dramatically increases the probability of bugs like
this. It should be made to like like libdrm refcounting or the
refcounting style used in other parts of Mesa.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=33046
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifically, this ensures things like the front buffer actually exist. This
fixes piglt fbo/fbo-sys-blit and fd.o bug 35483.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
GLSL 1.30 states clearly that only float and int are allowed, while the
GLSL ES specification's issues section states that sampler types may
take precision qualifiers.
Fixes compilation failures in 3DMarkMobileES 2.0 and GLBenchmark 2.0.
NOTE: This is a candidate for stable release branches.
Civilization 4's shaders make heavy use of gl_Color and don't use
perspective interpolation. This resulted in rivers, units, trees, and
so on being rendered almost entirely white. This is a regression
compared to the old fragment shader backend.
Found by inspection (comparing the old and new FS backend code).
References: https://bugs.freedesktop.org/show_bug.cgi?id=32949
NOTE: This is a candidate for the 7.10 branch.
Since GLSL IR allows multiple ir_variables to share the same name, we
need to generate unique names when printing the IR. Previously, we
always used %s@%p, appending the ir_variable's memory address.
While this worked, it had two drawbacks:
- When there aren't duplicates, the extra "@0x669a3e88" is useless
and makes the code harder to read.
- Real duplicates were hard to tell apart:
channel_expressions@0x6699e3c8 vs. channel_expressions@0x6699ddd8
We now append @2, @3, @4, and so on, but only where necessary to
distinguish duplicates. Since we only do this at print time, any
performance impact is irrelevant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
While making some other changes in this area I was finding it annoying
each of these functions took mostly the same set of parameters in
differing orders.
'i' is already used for the outer loop. This caused some problems
while doing other work in this area. No bug exists here... until you
want to use the outer loop counter in the inner loop.
We were using typecasts because the functions pointers are polymorphic
in the second argument type, which.
Surprisingly the wrong calling convention didn't cause crashes on Windows,
but it was causing certain registers to be trashed in MSVC optimized
builds, when processing callists in the ClearView RC Flight Simulator.
I think the code ends up a lot more legible this way, though we've
still got the overloads in the fs_inst as well (even though there's
only one caller left currently).
If set to year X, only report extensions up to that year. This is a
work-around for games that try to copy the extensions string to a fixed
size buffer and overflow. If a game was released in year X, setting
MESA_EXTENSION_MAX_YEAR to that year will likely fix the problem.
We now use a 4-bit writemask for all instruction types, which makes it
easier to write generic helper functions to manipulte writemasks.
NOTE: This is a candidate for the 7.10 branch.
The code no longer supports otherwise -- it relies on buffers being
uploaded via u_upload_mgr -- so make this clear.
Also, there's no need to flush after draws from user buffers, given all
user content should have been copied by then.
Unsigned long is 32bit on several platforms (e.g., Windows), yielding
1UL << 32 to be zero.
Note that BITFIELD64_BIT result is often assigned to variables of type
GLbitfield, instead of GLbitfield64. That's probably wrong and should be
addressed in a later change.
2011-03-16 09:15:30 +00:00
1130 changed files with 84617 additions and 107365 deletions
Known issues in the ARB_color_buffer_float implementation:
- Rendering to multiple render targets, some fixed-point, some floating-point, with FIXED_ONLY fragment clamping and polygon smooth enabled may write incorrect values to the fixed point buffers (depends on spec interpretation)
- For fragment programs with ARB_fog_* options, colors are clamped before fog application regardless of the fragment clamping setting (this depends on spec interpretation)
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34370">Bug 34370</a> - [GLSL] "i<5 && i<4" in for loop fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34374">Bug 34374</a> - [GLSL] fail to redeclare an array using initializer</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=35073">Bug 35073</a> - [GM45] Alpha test is broken when rendering to FBO with no color attachment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=35483">Bug 35483</a> - util_blit_pixels_writemask: crash in line 322 of src/gallium/auxiliary/util/u_blit.c</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33512">Bug 33512</a> - [SNB] case ogles2conform/GL/gl_FragCoord/gl_FragCoord_xy_frag.test and gl_FragCoord_w_frag.test fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34280">Bug 34280</a> - r200 mesa-7.10 font distortion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34321">Bug 34321</a> - The ARB_fragment_program subset of ARB_draw_buffers not implemented</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36410">Bug 36410</a> - [SNB] Rendering errors in 3DMMES subtest taiji</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36527">Bug 36527</a> - [wine] Wolfenstein: Failed to translate rgb instruction.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36651">Bug 36651</a> - mesa requires bison and flex to build but configure does not check for them</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.