If we happened to allocate a texture result (or other vector) to the
highest hardware register slot, and we were in 16-wide, we would
under-count the registers used and potentially wrap around to g0 if
that allocation crossed a 16-register block boundary. Bad rendering
and hangs ensued.
Tested-by: Ian Romanick <idr@freedesktop.org>
When the fill mode is PIPE_POLYGON_MODE_LINE we were basically
converting the polygon into triangles, then drawing the outline of all
the triangles. But we really only want to draw the lines around the
perimeter of the polygon, not the interior lines.
NOTE: This is a candidate for the 7.10 branch.
In gen6 and above, clip distances 0-3 are written to message register
3's xyzw components, and 4-7 to message register 4's xyzw components.
Therefore when when writing the clip distances we need to examine the
lower 2 bits of the clip distance index to see which component to
write to.
emit_vertex_write() was examining the lower 3 bits, causing clip
distances 4-7 not to be written correctly.
Fixes piglit test vs-clip-vertex-01.shader_test
Evergreen+ don't support multi-writes so we need to emulate
it in the shader. Fixes the following piglit tests:
fbo-drawbuffers-fragcolor
ati_draw_buffers-arbfp-no-option
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
In intel_draw_buffer, there exists a workaround to prevent
_mesa_update_framebuffer from creating a swrast depth wrapper when
using separate stencil. This commit fixes the workaround, which was
incomplete for s8z24 texture renderbuffers.
Fixes fbo-blit-d24s8 on gen5 with separate stencil manually enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since all infrastructure is now in place to support packed
depth/stencil renderbuffers when using separate stencil, there is no
need for special cases when separate stencil is enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Also, in order to coerce intel_update_tex_wrapper_regions() to
allocate the hiz region, alter intel_update_tex_wrapper_regions() to
examine the renderbuffer format instead of the texture image format.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... and into new function intel_update_tex_wrapper_regions.
This prevents code duplication in the next commit.
Also add a note explaining that the hiz region is broken for mipmapped
depth textures.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... when using separate stencil.
Define function intel_tex_image_x8z24_create_renderbuffers and call it
in intelTexImage after the miptree has been created and filled with data.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because they will be needed by intel_tex_image_s8z24_create_renderbuffers.
Redeclared functions are:
intel_alloc_renderbuffer_storage
intel_renderbuffer_set_draw_offsets
Signed-off-by: Chad Versace <chad@chad-versace.us>
Redeclare as non-static because
intel_tex_image_s8z24_create_renderbuffers will use it.
Remove the 'wrapper' parameter, because there is no wrapper for
intel_texture_image.depth_rb and stencil_rb.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields depth_rb and stencil_rb, and put hooks in place to
release the renderbuffers in intelFreeTextureImageData and
intelTexImage.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Otherwise we can end up creating RGBA render targets (which are BGRA on the
hardware), and then we bind them as RGBA textures (which are RGBA on the
hardware). This generates software fallbacks every time we bind the frame as
a texture.
Commit 1a339b6c71 caused us to take
a different path through the glCopyTexSubImage() code. The
pipe_get_transfer() call neglected to pass the texture's level, face
and slice info. So we were always transferring from the 0th mipmap
level even when the source renderbuffer was a non-zero mipmap level
in a texture.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38649
NOTE: This is a candidate for the 7.10 branch.
Once again, assuming the compiler is clever works out so poorly. The
generated code initialized the structure on the stack, then did a
lookup into it. This was a performance regression from
70c6cd39bd.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's common in applications just before the advent of
EXT_separate_shader_objects to have multiple linked shaders with the
same VS or FS. While we aren't detecting those at the Mesa level, we
can detect when our compiled output happens to match an existing
compiled program.
This patch was created after noting the incredible amount of compiled
program data generated by Heroes of Newerth. It reduces the program
data in use at the start menu (replayed by apitrace) from 828kb to
632kb, and reduces CACHE_NEW_WM_PROG state flagging by 3/4. It
doesn't impact our rate of hardware state changes yet, because things
depending on CACHE_NEW_WM_PROG also depend on BRW_NEW_FRAGMENT_PROGRAM
which is still being flagged.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch add the support for 24bpp in the dri/swrast implementation.
Signed-off-by: Marc Pignat <marc@pignat.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Use a typed struct to describe the native buffer and let the backends
map the native buffer to winsys_handle for
resource_from_handle/resource_to_handle.
When shared glapi is not enabled, there are two glapi providers and we
cannot decide which one to link to at build time. It results in
unresolved symbols in st/mesa. This commit makes st/mesa a loadable
module when shared glapi is not enabled, and hopes that the apps will
link to one of the glapi providers (GL or GLES).
Build pipe drivers here instead of using those built by the
soon-to-be-removed targets/egl.
[with an update by Benjamin Franzke to use --{start|end}-group]
Should unify this too, but will delay that until the planned
libdrm_nouveau/winsys changes which are likely to cause major
changes to this bo validation code too.
In fixed function, stride == 0 (e.g. glColor4f() outside of the draw
call) would get turned into uniform inputs, which is why it was
ignored originally in this test. For shaders, drivers end up seeing a
need to upload stride == 0 data, and get confused by needing to upload
when vbo_all_varyings_in_vbos() returned true. In the 965 driver
case, it wouldn't bother to compute the min/max index, and uploaded
nothing if the min/max wasn't known.
We've talked about removing the ff stride=0-into-uniforms code, so
this check shouldn't be missed once that's gone.
Fixes ARB_vertex_buffer_object/mixed-immediate-and-vbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37934
Reviewed-by: Brian Paul <brianp@vmware.com>
We would still want to consider that data as being in a VBO even if we
managed to produce this case, which as far as I know we can't.
Reviewed-by: Brian Paul <brianp@vmware.com>
All the packets chosen before came from grepping the pdf for
nonpipelined, and these two came from grepping for non.pipelined. We
could stand a review by looking at all packets emitted and identifying
what kind they are.
Previously, the builtins in OES_texture_3D.{frag,vert} were only
compiling properly as a consequence of bug 38015, which allows
unsupported extensions to be enabled. This fix eliminates the builtin
compiler's reliance on bug 38015, so that bug 38015 can be fixed.
Fixes broken glTexImage2D with format=GL_RGBA since
1a339b6c71
The origin for this behaviour is that r600_is_format_supported
checks only against r600_state_inline.h tables not evergreens.
If possible, we want to match the hardware format to what the app uses. By
doing so, we avoid the need for pixel conversions and therefore greatly speed
up texture uploads.
evergreen+ stores depth and stencil separately so when we
allocate a depth/stencil fbo, make sure we allocate enough
memory for both depth and stencil buffers.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Now all infrastructure is in place to support s8_z24 non-texture
renderbuffers for gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Hiz buffer allocation can only occur if the 'else' branch has been taken,
so move the hiz buffer allocation into the 'else' branch.
Having the hiz buffer allocation dangling outside of the if-tree was just
damn confusing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following fields:
intel_renderbuffer.wrapped_depth;
intel_renderbuffer.wrapped_stencil
If the intel_context is using separate stencil and the renderbuffer has
a packed depth/stencil format, then wrapped_depth and wrapped_stencil are
the real renderbuffers.
Alter the following functions to accomodate the wrapped buffers:
intel_delete_renderbuffer
intel_draw_buffer
intel_get_renderbuffer
intel_renderbuffer_map
intel_renderbuffer_unmap
Subsequent commits allocate renderbuffer storage for wrapped_depth and
wrapped_stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit b5c847c7ca erroneously disabled
support for S8_Z24 texture format when the context required separate
stencil (intel_context.must_use_separate_stencil).
But the GL spec requires implementations to support GL_DEPTH24_STENCIL8.
So we better find a way to fake it...
From page 180 (196 of pdf) of the OpenGL 3.0 spec:
In addition, implementations are required to support the following
sized internal [texture] formats.
[...]
- Combined depth+stencil formats: DEPTH32F_STENCIL8 and and
DEPTH24_STENCIL8.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
this drop a bunch of unnecessary checks (i.e. should be trapped
at gallium level), and also removes the switch statement in favour
of some calculated values for the vgt values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
the attached patch should be an improvement over Vadim Girlin's patch
fixing LIT instruction for r600g (commit
2fe39b46e7).
Instructions used in tgsi_lit have been reordered to always write to a
dst channel after the same channel in src has been read (so if src ==
dst, input values are not overwritten before being used).
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to bind to our context before calling __glXSetCurrentContext or
messing with the gc rect in order to properly handle error conditions.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In applegl, GLX advertises the same extensions provided by OpenGL.framework
even if such extensions are not provided by glapi. This allows a client
to get access to such API.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
FramebufferTextureLayer is an alias of FramebufferTextureLayerEXT, so
FramebufferTextureLayerARB needs to be listed as an alias of
FramebufferTextureLayerEXT rather than FramebufferTextureLayer.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Previously it was up to the driver or later code generator to reject
these shaders. It turns out that nobody did this.
This will need changes to support geometry shaders.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37743
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since switching to hidden visibility on gcc, GLw apps were failing to
link. Use the GLAPI definition to use default visibility where necessary.
$ nm lib/libGLw.so | grep DrawingArea
0000000000004020 T GLwCreateMDrawingArea
0000000000003430 T GLwDrawingAreaMakeCurrent
0000000000003410 T GLwDrawingAreaSwapBuffers
0000000000204c60 D glwDrawingAreaClassRec
0000000000204d48 D glwDrawingAreaWidgetClass
00000000002053c0 D glwMDrawingAreaClassRec
00000000002054e0 D glwMDrawingAreaWidgetClass
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Tested-by: justin <jlec@gentoo.org>
This was spectacularly unsafe. On my system, address 0 happens to be
the hardware status page for the render ring, and the first quadword
of that happens to contain nothing we ever look at, but I sure didn't
look forward to having to debug some day when, for example, the kernel
happened to bind the ringbuffer before binding the hwsp.
That flag was leftover from gen4, where brw_curbe.c is choosing ranges
of the CURBE space for constants to live in, and the unit state tells
where to load them from. That's not the case on gen6 -- we don't set
this flag (since constants aren't in the URB), nor do we have any
state like that to upload.
If --disable-gallium is passed, llvm-config isn't checked for, so mark
it explicitly as absent, through LLVM_CONFIG=no.
Passing --disable-gallium would result in:
| ../configure: line 9739: --version: command not found
| ../configure: line 9740: --cppflags: command not found
| ../configure: line 9741: --libs: command not found
| ../configure: line 9743: --ldflags: command not found
With this commit, one gets that instead:
| configure: error: LLVM is required to build Gallium R300 on x86 and x86_64
Signed-off-by: Cyril Brulebois <kibi@debian.org>
This removes all the --enable-gallium-$driver options and --disable-gallium.
Gallium can be disabled by --with-gallium-drivers= (without parameters).
Default is:
--with-gallium-drivers=r300,swrast
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
There is an obvious redundancy:
--with-driver=dri VS --with-state-trackers=dri
--with-driver=xlib VS --with-state-trackers=glx
--enable-openvg VS --with-state-trackers=vega
--enable-egl VS --with-state-trackers=egl
This patch adds two new options for the remaining state trackers:
--enable-xorg
--enable-d3d1x
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Our hardware doesn't have a sample_d_c message, so we have to do a
regular sample_d and emit instructions to manually perform the
comparison.
This requires a state dependent recompile whenever the sampler's compare
mode or function change. This adds the per-sampler comparison functions
to brw_wm_prog_key, but only sets them when the sampler's compare mode
is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it available earlier, which will soon be necessary.
(Separating code motion from actual changes.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is somewhat ugly, but I couldn't think of a nicer way to handle the
interleaved coordinate/derivative parameter loading.
Ironlake and Sandybridge will still hit an assertion in visit().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prior to this patch, it would attempt to optimize and allocate registers
for the program even if it failed to compile. This seems wasteful.
More importantly, the "message length > 11" failure seems to choke the
instruction scheduler, making it somehow use an undefined value and
segmentation fault.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
The _ColorDrawBuffers[] wouldn't get updated despite us having updated
what it depends on (Attachments[]->Renderbuffer). Other callers of
_mesa_remove_attachment are already flagging _NEW_BUFFERS for other
reasons. The specific bug report that led to this fix (and
the fbo-finish-deleted testcase) was fixed by
23b6f9606d, though.
Reviewed-by: Brian Paul <brianp@vmware.com>
This loop is trying to see if all the buffers to be uploaded happen to
be the same increment from the start of the 3DSTATE_VERTEX_BUFFERS
currently loaded in the hardware. However, we might be at a smaller
offset than the previous set of VERTEX_BUFFERS, so we can't reuse
because that packet made the first entry be its starting offset (you
can't access outside the given bounds).
Fixes piglit ARB_vertex_buffer_object/elements-negative-offset.
Current LIT implementation uses dst components for storing temp
results, possibly overwriting still needed values (depends on the
swizzles).
This patch uses temp reg for one of such cases (found in etqw) and
fixes "LIT R.z, R.xyzz".
Tested on evergreen. Fixes some etqw-demo rendering glitches when
"Lighting" is set to "High" in the settings.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current dri context unbind logic will leak drawables until the process
dies (they will then get released by the GEM code). There are two ways to fix
this: either always call driReleaseDrawables every time we unbind a context
(but that costs us round trips to the X server at getbuffers() time) or
implement proper drawable refcounting. This patch implements the latter.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
When emitting either a hiz or stencil buffer, the 'separate stencil
enable' and 'hiz enable' bits are set in 3DSTATE_DEPTH_BUFFER. Therefore
we must emit both 3DSTATE_HIER_DEPTH_BUFFER and 3DSTATE_STENCIL_BUFFER.
Even if there is no stencil buffer, 3DSTATE_STENCIL_BUFFER must be
emitted; failure to do so causes a hang on gen5 and a stall on gen6.
This also fixes a silly, obvious segfault that occured when a hiz buffer
xor separate stencil buffer existed.
Fixes the piglit tests below on Gen5 when hiz and separate stencil are
manually enabled:
fbo-alphatest-nocolor
fbo-depth-sample-compare
fbo
hiz-depth-read-fbo-d24-s0
hiz-depth-stencil-test-fbo-d24-s0
hiz-depth-test-fbo-d24-s0
hiz-stencil-read-fbo-d0-s8
hiz-stencil-test-fbo-d0-s8
fbo-missing-attachment-clear
fbo-clear-formats
fbo-depth-*
Changes piglit test result from crash to fail:
hiz-depth-stencil-test-fbo-d0-s8
Signed-off-by: Chad Versace <chad@chad-versace.us>
[airlied: final chunk of Mike's patch from bug 37476
this uses a loop to emit the GRADIENTS and does a check to
see if we need to fetch to a temporary register. It also
increases the context src gpr to 4 which is needed here.]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches (and don't forget
to re-run "make builtins" after cherry-picking.)
Mike had actually done a lot of the TXD support in a patch in bug
37476 which I see now, I'll add the bits of his work that I didn't think
to add to my work.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This at least passes the piglit arb_shader_texture_lod-texgrad test,
the AMD shader analyzer seems to multiply the V component by an unspecified
constant value no idea why.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This sets the base level as the zero level, which fixes
piglit/texturing/tex-miplevel-selection*.
The r600 hardware ignores the BASE_LEVEL field in some cases, so we can't
use it.
Evergreen might need this too.
Commit 56ef62d988
"glsl: Generate readable unique names at print time."
changed ir_print_visitor to not generate @0x1234567 suffixes except
where necessary. So there's no need to manually remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change to _glapi_create_table_from_handle causes it to fill the dispatch
table with NoOps for unimplemented functionality. This matches what is done
in indirect_init.c and also allows us to enable logging (when built with
-DDEBUG and the MESA_DEBUG or LIBGL_DEBUG environment variables are set) to
catch cases where clients are trying to use these unimplemented extentions.
Additionally, this fixes some gcc -pedantic warnings.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Check that the difference in array pointers/offsets from the 0th
array are less than the stride, for both VBOs and user-space arrays.
Previously, we were only doing this for the later.
This tightens up the interleaved array test and fixes a problem with
the llvmpipe driver where we were creating way too many vertex fetch
variants only because the pipe_vertex_element::src_offset values were
changing frequently. This change results in a 5x speed-up for one of
the viewperf tests.
Also, clean up the function to make it easier to understand.
The code was playing fast and loose with rowstrides, which meant that
if a driver chose anything different for its alignment requirements,
the generated mipmaps came out garbage. Unlike the uncompressed case,
we can't generate mipmaps directly into image->Data, so by using
TexImage2D we cut out most of the weird logic that existed to generate
in-place into ->Data. The up/downside is that the driver recovery
code for the fact that _mesa_generate_mipmaps whacked ->Data has to be
turned off for compressed now.
Fixes 6 piglit tests about compressed mipmap gen.
The path taken is wildly different based on this (do we generate from
a temporary image, or from level-1's data), and we appear to have
stride bugs in the compressed case that are tough to disentangle.
This just duplicates the code for the moment, the followon commit will
do the actual changes. Only real code change here is handling
maxLevel in one common place.
This is effectively just "round up when dividing by 4" compared to the
previous code. Fixes the broken stripe at the top of
fbo-generatemipmap-formats GL_EXT_texture_compression_rgtc.
We don't care just about the internalFormat/cpp/compressed, but about
the specific format chosen. We have no support for format
translations as part of texture validation, and furthermore it has
restrictions in the GL specification. However, we should be making
consistent decisions for this check anyway.
Generally image uploads to a the region occur at TexImage time, but
that's not the case for fallback _mesa_generate_mipmap(), and in this
path we were forgetting to align the width when dividing height. We
were just leaving out parts of the compressed block at 2x2 and 1x1
levels.
Fixes gen-compressed-teximage.
Copy-and-paste from the bgra cases. The C paths attempt to avoid
copying the 'x' channel, but it's harmless, you might as well. Good for
about 5% in glxgears (740 to 780 fps).
Signed-off-by: Adam Jackson <ajax@redhat.com>
glReadPixels() was performing RGB -> L conversion differently from the
glTexImage() style conversion appropriate for glCopyTexImage().
Fixes gles2conform copy_texture.
We were mapping the renderbuffer once, then walking over all the
buffers to map just the texture ones using the other texture mapping
function that handled the x/y offset to the image in the region. But
then we would go and overwrite *those* mappings with the original
mappings for depth/stencil, which was wrong.
Instead, just walk over the attachments once and map the attachments.
Wasn't that easy?
This is already pointing at 0 or Height - 1 and with an appropriate
pitch, so no need to recompute those values per customization of the
spans code. Cuts 3 out of 21kb of the compiled size.
Reviewed-by: Chad Versace <chad@chad-versace.us>
The "newImage" isn't particularly new -- it might be the same texture
that was attached to the same attachment point before. This function
also gets called when just rebinding back to an FBO with a texture
attachment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
It was originally located in the region because the tracking of
depth/color buffers was on the regions, and getting back to the irb
would have been tricky. Now, we're keying off of the renderbuffer in
more places, which means we can move these fields where they belong.
This could fix potential rendering failure with a single texture
having multiple images attached to different renderbuffers across
shareCtx (as far as I can tell, this was the only failure we could
cause, since anything else should trigger intel_render_texture in
between, for example a BindFramebuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
gc->vtable->destroy is always set and is used unconditionally
in other places, so don't bother checking for it first.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
This stuff is really for software rendering, it's not core Mesa.
A small step toward pushing the FetchTexel() stuff down into swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
x86_64_entry_start needs to be declared static in the C code,
in order to have the correct address in entry_get_public
(seems not to be needed on x86).
The compiler needs to lookup a local not a global object.
Otherwise addresses needed for _glapi_proc_address will be computed
from some random offset (0x6400229a61058b48 in my case).
This sadly requires work in the VS to rescale them, because the
hardware doesn't support this format natively.
Fixes arb_es2_compatibility-fixed-type and gtf/fixed_data_type.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The function was named "find_unconditional_discard", but didn't
actually check that the discard statement found was unconditional.
Fixes piglit glsl-fs-discard-04.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A trivial fix for error: format not a string literal and no format
arguments with compiling with -Werror=format-security flags.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If either depth or stencil buffer has packed depth/stencil format, then do
not use separate stencil.
Before this commit, emit_depthbuffer() incorrectly assumed that the
texture's stencil renderbuffer wrapper was a *separate* stencil buffer,
because the depth and stencil renderbuffer wrappers are distinct for
depth/stencil textures (that is, depth_irb != stencil_irb).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38134
Signed-off-by: Chad Versace <chad@chad-versace.us>
CONFIG regs (byte offsets 0x8000-0xac00) are single state and the pipeline
must be flushed and hw idle when they are changed. Border color regs
are in the CONFIG range and this is why a flush is required when changing
them. CONTEXT regs (byte offset 0x28000+) are multi-state and those do
not require flushes when changing them.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This is just like PointSprite overrides, but it's always on for that
attribute.
Fixes glsl-fs-pointcoord, gtf/point_sprites.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
If the wrap R (3rd) mode is set to CLAMP or CLAMP_TO_BORDER and the texture
isn't 3D, r300 always samples the border color regardless of texture
coordinates.
I HATE THIS HARDWARE.
NOTE: This is a candidate for the 7.10 branch.
Ideally we'd have a compiler and register spilling and all that
but this is good enough for now to avoid the gpu hang in piglit,
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined
on r600/r700 cards.
based on r600c patch
Andre Maasikas <amaasikas@gmail.com>
r600c: bump sq gpr resources if a shader needs more than default
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to vol2a.07, it only applies from Cantiga to Sandybridge.
I found this in my ringbuffers while investigating various GPU hangs.
While it may not have been the cause, it seemed wise to remove it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Thanks to Chad's hard work implementing separate stencil and HiZ
support, this is entirely straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to call add_validated_bo to do proper aperture space accounting.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen7 doesn't support packed depth/stencil, the stencil buffer
can't possibly be relevant for determining the depth format.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes mesa more consistent with glibtool and XCode where the
generated file matches the dylib id rather using an extra symlink
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
When it is sensible to do so,
1) intelCreateBuffer() now attaches separate depth and stencil
buffers
to the framebuffer it creates.
2) intel_update_renderbuffers() requests for the framebuffer
a separate stencil buffer (DRI2BufferStencil).
The criteria for "sensible" is:
- The GLX config has nonzero depth and stencil bits.
- The hardware supports separate stencil.
- The X driver supports separate stencil, or its support has not yet
been determined.
If the hardware supports hiz too, then intel_update_renderbuffers()
also requests DRI2BufferHiz.
If after requesting DRI2BufferStencil we determine that X driver did not
actually support separate stencil, we clean up the mistake and never ask
for DRI2BufferStencil again.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Assert that the GLX config has an expected depth/stencil bit combination:
one of d24/s8, d16/s0, d0/s0. These are the only depth/stencil
configurations that we advertise.
Remove the check for software stencil, because given the assertions'
constraints the check always fails.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Extract the code that queries DRI2 to obtain the DRIdrawable's buffers
into intel_query_dri2_buffers_no_separate_stencil().
Extract the code that assigns the DRI buffer's DRM region to the
corresponding renderbuffer into
intel_process_dri2_buffer_no_separate_stencil().
Rationale
---------
The next commit enables intel_update_renderbuffers() to query for separate
stencil and hiz buffers. Without separating the separate-stencil and
no-separate-stencil paths, intel_update_renderbuffers() degenerates into
an impenetrable labyrinth of if-trees.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields below to intel_screen. The expression in parens is the
value to which intelInitScreen2() currently sets the field.
GLboolean hw_has_separate_stencil (true iff gen >= 7)
GLboolean hw_must_use_separate_stencil (true iff gen >= 7)
GLboolean hw_has_hiz (always false)
enum intel_dri2_has_hiz dri2_has_hiz (INTEL_DRI2_HAS_HIZ_UNKNOWN)
The analogous fields in intel_context now inherit their values from
intel_screen.
When hiz and separate stencil become completely implemented for a given
chipset, then the respective fields need to be enabled.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... which indicates if the X driver supports DRI2BufferHiz and
DRI2BufferStencil.
I'm placing this in its own commit due to the large comment block.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since the stencil buffer is interleaved, the generic Mesa renderbuffer
accessors do not suffice. Custom span functions are necessary.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When emitting 3DSTATE_DEPTH_BUFFER, also emit 3DSTATE_HIER_DEPTH_BUFFER if
there is a hiz buffer. Ditto for 3DSTATE_STENCIL_BUFFER and a separate
stencil buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
glapidispatch.h was located in glapi and shared with mesa core. Because
the way it was shared, mesa core must include it indirectly via
main/dispatch.h.
Now that it is no longer needed by glapi and is located in core mesa,
merging it with main/dispatch.h to avoid wrong uses.
Generate different glapidispatch.h's for GL and GLES. For GLES, we want
a local remap table.
This reverts commit 5af46e8360. The
commit will break GL remap table setup when main/glapidispatch.h is
regenerated.
See piglit dlist-fdo31590.c test and
http://bugs.freedesktop.org/show_bug.cgi?id=31590
In this case we had node->prim_count=1 but node->count==0 because the
display list started with glBegin() but had no vertices. The call to
glEvalCoord1f() triggered the DO_FALLBACK() path. When replaying the
display list, the old condition basically no-op'd the call to
vbo_save_playback_vertex_list call(). That led to the invalid operation
error being raised in glEnd().
NOTE: This is a candidate for the 7.10 branch.
Previously, we were errantly drawing some interior edges of clipped
polygons and quads. Also, we were introducing extra edges where
polygons intersected the view frustum clip planes.
The main problem was that we were ignoring the edgeflags encoded in
the primitive header's 'flags' field which are set during polygon/quad
->tri decomposition. We need to observe those during clipping. Since
we can't modify the existing vert's edgeflag fields, we need to store
them in a parallel array.
Edge flags also need to be handled differently for view frustum planes
vs. user-defined clip planes. In the former case we don't want to draw
new clip edges but in the later case we do. This matches NVIDIA's
behaviour and it just looks right.
Finally, note that the LLVM draw code does not properly set vertex
edge flags. It's OK on the regular software path though.
When GLX_INDIRECT_RENDERING is defined, some symbols are used in
libglapi.a but are not defined. Define them through the help of
glapitemp.h.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This updates the apple dispatch table to match the current glapi.
Aliases are still not handled very well.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
With this change, Apple's libGL is now using glapi rather than implementing
its own dispatch. In this implementation, two dispatch tables are created:
__ogl_framework_api always points into OpenGL.framework.
__applegl_api is the vtable that is used. It points into OpenGL.framework
or to local implementations that override / interpose this in OpenGL.framework
The initialization for __ogl_framework_api was copied from XQuartz with some
modifications and probably still needs further edits to better deal with
aliases.
This is a good step towards supporting both indirect and direct rendering
on darwin.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In starting the migration to using mapi, rename __gl_api to
__ogl_framework_api since it is a vtable for OpenGL.framework
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
So only with kernel version 2.7 can this work, thanks to Alex
for pointing that out. Also add a workaround for a hw bug.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Evergreen can do this as well as cayman, so we should enable it.
This fixes a gpu lockup with
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test
I need to add a better workaround for r600/r700.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We weren't emitting the SQ setup regs at all which really is
fail.
When a state is always enabled we need to add it to the dirty list
as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since resources don't generally vary in size, this splits
the emit path, it also takes into a/c that texture and vertex resources
have different number of relocs, and avoids emitting the extra
reloc for vertex resources.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Exit this loop early to avoid pointless iterations later.
Move the resource bos to the first two regs, it actually
doesn't matter which regs we use for this in resource land.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_framebuffer_object spec (and later specs) say:
"If a buffer is specified in <mask> and does not exist in both
the read and draw framebuffers, the corresponding bit is silently
ignored."
Check for color, depth, and stencil that the source and destination
FBOs have the specified buffers. If the buffer is missing, remove the
bit from the blit request mask and continue.
Fixes the crash in piglit test 'fbo-missing-attachment-blit from', and
fixes 'fbo-missing-attachment-blit es2 from'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
In an ES2 context (or if GL_ARB_ES2_compatibility) is supported, the
framebuffer can be complete with some attachments be missing. In this
case the _ColorDrawBuffers pointer will be NULL.
Fixes the crash in piglit test fbo-missing-attachment-clear.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
query->num_results already has the size in dwords of the query
buffer. There no need to multiply again. We were reading past
the end of the buffer, resulting in reading garbage.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=37028
agd5f: clarify the comment.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
According to the hw documentation, the driver needs to:
- allocate 128 bits for each possible DB
- clear the 128 bits for each possible DB
- write 1 to bits 127 and 63 for upper DBs that don't
exist on a particular asic
Previously we were only doing these steps if the
asic had less than the max possible DBs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
With complex shaders there are often "holes" in the fs inputs, and we only
have 8 tex coorsd to map those to. To fix this, we remap fs inputs to [0..8].
This lets us to run many more GLSL programs.
At the end of flushing we were scanning over 450 blocks
with generally about 50 enabled. This reduces the scanning
to just the list of enabled blocks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There isn't much point taking the overhead of range/block lookups on resources
we aren't going to be getting resource registers at wierd offsets.
Signed-off-by: Dave Airlie <airlied@redhat.com>
resource setting could be a fair bit more lightweight,
this patch just separates the resource structs from the standard
reg tracking structs in the driver, later patches will improve
the winsys.
Signed-off-by: Dave Airlie <airlied@redhat.com>
we don't need to loop over all the registers unless we have
some bos in the block, also avoid setting the ctx flags,
and move the optional stuff down below this chunk.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reference implementation which produces high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
Signed-off-by: Brian Paul <brianp@vmware.com>
We implement line stipples, just not *quite* correctly. We have a
piglit testcase to use when we want to fix it, if we do. Until then,
don't lie to our test suites.
We do have hardware antialised lines. If we care, we should actually
fix them to be conformant (or as close as possible) instead of using
this knob to fool testcases using swrast.
For some interesting reading on the state of GL_*_SMOOTH across
several drivers, see:
http://homepage.mac.com/arekkusu/bugs/invariance/HWAA.html
From my reading of the GL 2.1 spec, no antialiasing is strictly
conformant for polygon smoothing. Yes, it's absurd, but then,
hardware doesn't support this so maybe it's not so absurd.
ir_print_visitor::visit(ir_constant *) was failing to index properly
into ir->type->fields.structure, so the first field name was being
reprinted for every field in the structure.
Signed-off-by: Brian Paul <brianp@vmware.com>
ast_expression::print() had an incorrect index into the subexpressions
array, so (a ? b : c) was being incorrectly rendered as (a ? b : b).
Signed-off-by: Brian Paul <brianp@vmware.com>
This makes this function not be an always miss for the branch predictor.
Noticed using cachegrind, makes a minor difference to gears numbers on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are handled separately in the winsys, so don't need the calculations
done at this point. this manifested as a crash in point-sprite,
Thanks to XoD on #radeon for pointing it out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The flush function, when asked for, should not return a NULL fence.
NULL can only be returned if fences are not implemented, and st/mesa
doesn't call any of the fence functions if it receives a NULL fence
(because some drivers don't even set the fence hooks).
ARB_sync is exposed if fence_finish is set.
Mesa now limits, by default, the max number of texture levels to 15 so we
can now support the architectural maximum for gen4-6 of 14.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This modifies the VGT state and move the SPI setup to its own discrete state.
It then just sets the SPI state up and the VGT state up once and modifies
them thereafter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This splits the initialisation and the setting of values in the resource
buffers. We only should end up initialising once and updateing with new values
when needed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the overhead of working out the range/block to state build time,
it also allows the compiler to use constants for a lot of things instead
of working them out each time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the functions down the file, and also adds a ctx parameter.
This is precursor patch just moving stuff around and getting it ready.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drop the r600_draw_vbo CPU usage on a run of nexuiz from 1.40% to 0.72%
in sysprof for me on my Fusion APU.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This range was 76 dwords long, the 75th dword changes, the first 60 or so
don't. split the block so it emits less often.
Signed-off-by: Dave Airlie <airlied@redhat.com>
glx code hasn't lived under xserver/GL for a long time now.
Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
OpenGL 4.0 Compatibility, page 449:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, no
framebuffer is bound to target. In this case querying pname FRAMEBUFFER_-
ATTACHMENT_OBJECT_NAME will return zero, and all other queries will generate
an INVALID_OPERATION error.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This avoids the extra CMP and the predication on SEL, so in addition
to one less instruction, it makes scheduling less constrained.
Improves glbenchmark Egypt performance 0.6% +/- 0.2% (n=3). Reduces
FS instruction count across affected shaders in shader-db by 1.3%
without regressing any.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces compiled size of brw_wm_surface_state.o another 1.9%.
Overall, this brw_wm_surface_state reduction series cuts
firefox-talos-gfx runtime by 0.68% +/- 0.42% (n=6).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It turns out that gcc is just awful at generating code for
brw_structs.h style state setup, and using bitshifting on u32s
generates better code while being similarly readable (and more
verifiable compared to the specs, using the INTEL_MASK macro).
It's only used in the old fragment program path, to avoid projection
when w is always 1. We do want to do this in the new path pre-gen6
too, but we'll probably do it through the ir.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Oddly, this increases compiled code size. (marking the 'if' as likely
also increases code size, but not as much).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Interestingly, the compiler wasn't doing this for us at -O2, so we
were doing the computation for every non-_ReallyEnabled unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- all asics need to emit CONTEXT_CONTROL
- all r6xx asics need to emit 3D_START_CMDBUF
The ddx and r600c already do this. r600g should as well.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
We are getting inconsistent methods for endian detection (same answer when
it works, just doesn't work on some platforms) depending on whether __GLIBC__
is defined, which of course depends on include ordering before p_config.h
Just make p_config.h include limits.h to solve this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
On my original R600 card this at least lets gnome shell run for a while longer
and the piglit r300-readcache test case works a lot more reliably.
Still a few more stability issues running a piglit test run though.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec doesn't state it should be an error, but. We have this piglit test
useprogram-inside-begin that passes with this commit. No idea what's correct.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
The conditional rendering should be able to kill CopyPixels.
I assume the render condition has no effect on resource_copy_region.
This fixes piglit:
- NV_conditional_render/copypixels
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Always default to DEFAULT_*_FORMATS for mandatory GL formats.
(st_choose_format must not fail for those)
Use DEFAULT_RGBA when alpha is required instead of RGB.
Use DEFAULT_RGB otherwise.
These are more or less the remaining differences between the old code and
the new one.
Reviewed-by: Brian Paul <brianp@vmware.com>
The problem is: The second time the function is called with a new
internal format, strb->format is usually not PIPE_FORMAT_NONE.
RenderbufferStorage(... GL_RGBA8 ...);
RenderbufferStorage(... GL_RGBA16 ...); // had no effect on the format
Broken with: fd6f2d6e57
Test: piglit/fbo-storage-completeness
NOTE: This is a candidate for the 7.10 branch.
(if fd6f2d6e57 is cherry-picked as well)
Reviewed-by: Brian Paul <brianp@vmware.com>
Lowered indirect addressing can create lots of immediates.
Fixes piglit/glsl-fs-uniform-array-7 on r300g.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
From now on, depth test is always enabled in hardware.
If depth test is disabled in Gallium, the hardware Z function is set to ALWAYS.
If there is no zbuffer set, the colorbuffer0 memory is set as a zbuffer
to silence the CS checker.
This fixes piglit:
- occlusion-query-discard
- NV_conditional_render/bitmap
- NV_conditional_render/drawpixels
- NV_conditional_render/vertex_array
We want to check for Success, otherwise it will fail even with the right visual.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
When using _mesa_layout_parameters, all params copied in the 'layout'
output in the PASS 1 don't modify StateFlags (because they are simply
memcpy'ed).
This patch fixes the problem, assuring output gl_prog_param_list
StateFlags field is the same as the input one.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
At glLinkShaders time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
We now have:
brw_fs.cpp handles calling out to everything and optimization.
brw_fs_visitor.cpp handles translating to our LIR.
brw_fs_emit.cpp handles emitting from our LIR to native code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's an assumption here that fixed GRFs will never intersect with
the allocated GRFs. That's true today, though it might change some
day if we decide to register-allocate the regs containing push
constants once they're dead.
This fixes a regression in 0f7325b890 in
Lightsmark from the texture instructions now containing g0 references
instead of having that be implied. Performance is improved 15.2% +/-
3.6% (n=3).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968
emit_add_a16 was using the incorrect source.
This caused adds in the form of:
add u16 $a0 s32 $a1 u32 0x00000200
to have a source AREG of $a0 instead of $a1.
Fixes World of Warcraft in OpenGL and D3D without GLSL.
They were occupying whole 32-bit words, despite being only 10 or so
bits. Reduces code size slightly (80/3300 bytes).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the GL 2.1 spec:
"Required perspective-correct interpolation for all fragment
attributes except depth in sections 3.4.1 and 3.5.1, effectively
making GL PERSPECTIVE CORRECT HINT a no-op."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
First, FBO read/draw == NULL validation happens in mesa core not
intelReadBuffers -> intel_draw_buffers. Second, that condition is no
longer tested for in our driver since ARB_ES2_compatibility was added.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, the driver is likely to draw the flushed vertices to the
new drawbuffer instead of the old one, missing the point of the flush.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the ARB_ES2_compatibility spec:
"(8) How should we handle draw buffer completeness?
RESOLVED: Remove draw/readbuffer completeness checks, and treat
drawbuffers referring to missing attachments as if they were NONE."
Fixes arb_es2_compatibility-drawbuffers when the short-circuit for
ARB_ES2_compatibility in the previous commit is dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
glDrawBuffers pointing at an unattached buffer is supposed to be
incomplete without ARB_ES2_compatibility. The testcase to catch the
bug of not implementing that bit of the spec was tricked by this
missing piece of state update.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we use FBOs to access mipmap levels with glRead/Draw/CopyPixels()
we need to be sure to access the correct mipmap level/face/slice.
Before, we were just passing zero in quite a few places.
This fixes the new piglit fbo-mipmap-copypix test.
NOTE: This is a candidate for the 7.10 branch.
V_SQ_CF_WORD1_SQ_CF_INST_HALT is 0x1f on both
evergreen and cayman.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The logic of intel_draw_buffers() expected that stencil buffers were
always combined depth/stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When a texture is attached to multiple FBO's, a separate renderbuffer
wrapper is created for each attachment. This necessitates storing the hiz
region for these renderbuffers in the texture itself instead of the
renderbuffer wrapper.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Before this commit, the renderbuffer's region was updated in
intel_renderbuffer_texture(). This commit moves the update into
intel_update_wrapper(), which is a more logical location for updates.
This is in preparation for the next commit, which allocates and
updates the texture's hiz region in intel_update_wrapper(). Having the two
region updates located in the same function makes good form.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A hiz surface must be supplied to the hardware when rendering to a depth
buffer with hiz. There are three potential places to store that surface:
1. Allocate a larger intel_region for the depthbuffer, and let the
region's tail be the hiz surface.
2. Allocate a separate intel_region for hiz, and store it as
brw_context state.
3. Allocate a separate intel_region for hiz, and store it in
intel_renderbuffer.
We choose method 3.
Method 1 has not been chosen due to future complications it might cause
when requesting a DRI drawable's depth buffer attachment from X.
Method 2 has not been chosen because storing the hiz region apart from
the depth region makes lazy hiz/depth resolves difficult to implement.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Given a format, is_hiz_depth_format() indicates if HiZ can be enabled on
a depthbuffer of that format.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... in intel_alloc_renderbuffer_storage(). The stencil buffer has quirky
pitch requirements, so its region allocation is a special case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When hardware supports separate stencil, enable support for separate
depth/stencil texture formats in the table
intel_context.ctx.TextureFormatsSupported. If the hardware must use
separate stencil, then disable support for combined depth/stencil formats.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Prefer MESA_FORMAT_X8_Z24 over MESA_FORMAT_S8_Z24 for textures with
internal format GL_DEPTH_COMPONENT*.
i965 needs MESA_FORMAT_X8_Z24 for HiZ and separate stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following flags:
intel_context.has_separate_stencil
intel_context.must_use_separate_stencil
intel_context.has_hiz
The flags are currently set to false, and will be enabled for a given
chipset once the feature is completely implemented.
Since it may be some time before these features are completed, their
values can be overridden with environment variables INTEL_HIZ and
INTEL_SEPARATE_STENCIL. Valid values for these environment variables are
"0" and "1".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because that's not a safe thing to do. The request buffer is shared
storage among all threads, and after UnlockDisplay the 'req' pointer may
point into someone else's request.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Cayman is the RadeonHD 69xx series of GPUs. This adds support for
3D acceleration to the r600g driver.
Major changes:
Some context registers moved around - mainly MSAA and clipping/guardband related.
GPR allocation is all dynamic
no vertex cache - all unified in texture cache.
5-wide to 4-wide shader engines (no scalar or trans slot)
- some changes to how instructions are placed into slots
- removal of END_OF_PROGRAM bit in favour of END flow control clause
- no vertex fetch clause - TC accepts vertex or texture
Signed-off-by: Dave Airlie <airlied@redhat.com>
These don't need one, and I was seeing 0xff being returned and set in
the GPU registers with some tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of using a giant switch statement with lots of code, use a
table to convert GL format enums to pipe formats.
Tested by running the old code next to the new and asserting that
the return value was the same for piglit tests.
We're doing a linear search, but if that ever appears to be too slow
the table could easily be sorted or hashed.
Certain applications (e.g., Bernina My Label, and the Windows
implementation of Processing language) destroy the device context used when
creating the frame-buffer, causing presents to fail because we were still
referring to the old device context internally.
This change ensures we always use the same HDC passed to the ICD
entry-points when available, or our own HDC when not available (necessary
only when flushing on single buffered visuals).
Since the SET_xxx and GET_xxx macros used to initialize the remap_table
have been replaced by inline functions, the missing late macro expansion
leads to driDispatchRemapTable not being redefined to remap_table, which
in turn causes the remap_table not to be setup properly.
This commit fixes the issue by moving the table redefinition after the
definition of driDispatchRemapTable but in front of the inline function
definitions.
Despite that negative values aren't sensible here, making this unsigned
is dangerous. Consider get_pointer_generic, which computes a value of
the form:
void *base + (int x * int stride + int y) * unsigned bpp
The usual arithmetic conversions will coerce the (x*stride + y)
subexpression to unsigned. Since stride can be negative, this is
disastrous.
Fixes at least the following piglit tests on Ironlake:
fbo/fbo-blit-d24s8
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
According to OpenGL 3.1 chapter 2.1.5 the representation without zero
should only be used for vertex attribute values, but not for textures
or frame-buffers.
The coordinate offsets set in the m1 header are for textureOffset;
they have nothing to do with textureGrad (TXD).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The same as 3e43adef95 but for Gen7.
This doesn't quite fix GL_ARB_depth_texture/fbo-clear-formats; there's
still a 1 pixel wide black line on the right edge of the smaller squares.
The results were entirely wrong before, and are at least close now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Since wayland 4bde293ff8109d55eeaee8732f5a6ee0c8cd4bd9 we cant
lookup visuals, as we dont receive the visual token events.
The format for pixmap-images thus has to default to argb for now.
GLES uses GL_APIENTRYP instead of GLAPIENTRYP, which breaks with the
latest API table generation code. This fixes the issue by emitting a
definition for GL_APIENTRYP when generating the GLES files.
This prevents the error
prog: for the -disable-mmx option: may only occur zero or one times!
when creating a new context after XCloseDisplay with DRI drivers linked
with a shared LLVM 2.8 library.
In particular, this fixes the case where a vertex shader only uses
generic vertex attributes (non-0th). Before, we were no-op'ing the
glDrawArrays/Elements().
This fixes the new piglit pos-array test.
NOTE: This is a candidate for the 7.10 branch.
Previously, always did unorm8->float/nonlinear-to-linear conversion (using
lookup table), then convert back to nonlinear (using the expensive math
func pow among others), and finally convert back to int (assuming caller
wants unorm8), because the float texture fetch function is used for getting
the actual texel values. This should probably all be changed at some point,
but for now simply enable the memcpy path also for srgb formats (but if for
instance swizzling is required, still the whole conversion will be done).
Clip distance is calculated each time vertex position is written
which is suboptiomal is some cases but very safe.
User clip planes are an obsolete feature anyway.
Every time number of clip planes increases, the vertex program
is recompiled.
That ensures no overhead in normal case (no user clip planes)
and reasonable overhead otherwise.
Fixes 3D windows in compiz, and reflection effect in neverball.
Also fixes compiz expo plugin when windows were dragged and each
window shown 3 times.
This was going to get in the way of separate depth/stencil (which
wants to know about both, and whether they are the same rb), and also
wasn't a sufficient flag for the fix in the following commit.
In the 16-wide rework, I missed that we were setting some things to be
SIMD16 mode (corresponding to their setup in emit_texture_gen4()).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These fields are documented to be in the payload, and though the FB
write docs say they *aren't* in the payload, for all other fields the
payload and header is structured so that no overwriting is required
except for non-default options.
It turns out there's nothing in the hardware preventing this. It
appears that it ought to work on pre-gen6 as well, but just produces
GPU hangs.
Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by
2.6% +/- 0.6% (n=3).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of gen6, alt-mode (which we use) MOVs of floats are not raw --
they'll modify infs/nans. This broke discard and alpha test in
16-wide, where apparently the upper 8 bits of the pixel enables being
set were causing the whole value to get trashed upon being moved.
Treating the values as UD instead of float makes sure they get
preserved. While I'm here, replace the two 8-wide moves of the halves
of the header with a single compressed move.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36648
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is part of fixing fbo-alphatest-nocolor -- a regression in
35e8fe5c99 after the initial regression,
that had us using a garbage BLEND_STATE[0] (in particular, the alpha
test enable) if no color buffer was bound.
I thought I was thwarted initially when I couldn't do conditional mod
on a MOV, and couldn't use two immediate constants in one instruction.
But g0 != g0 is also a way to produce a failing comparison.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anisotropic filtering extension for swrast intended to be used by osmesa
to create high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
A 2nd implementation using footprint assembly is also provided.
Signed-off-by: Brian Paul <brianp@vmware.com>
Correctly links against selinux library when MESA is built with --enable-selinux option.
Fixes bug #36333 in Freedesktop bugzilla
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function was taking a lot more CPU than required due to it memsetting
a bunch of memory that didn't require it from what I can see.
We should only memset here when we are about to fill out the sampler,
otherwise we end up doing a bunch of memsets for everytime this function
is called, basically setting 0 memory to 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The change for GPU hanging in 13bab58f04
fell back even when rb == NULL, which is wrong for GLES2 and caused
segfaulting in GLES2 conformance. For the GPU hang case (where the
broken 2D driver failed to allocate a BO for the window system
renderbuffer), it also would assertion fail/segfault immediately after
the fallback setup when the renderbuffer map failed.
Fixes GLES2 conformance packed_depth_stencil.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Texture LOD Bias is now S4.8 instead of S4.6;
Min LOD, and Max LOD are now U4.8 instead of U4.6.
Fixes piglit test tex-miplevel-selection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The data port messages for this are rather different. For now, fail to
compile rather than hanging the GPU.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On gen4/5, the RNDZ and RNDE instructions return floor(x), but set special
"round increment bits" in the flag register; a predicated ADD (+1) fixes
the result.
The documentation still lists '.r' as existing, and says that the
predicated add is necessary, but it apparently lies. According to the
simulator, BRW_CONDITIONAL_R (7) is not a valid conditional modifier
and the RNDZ and RNDE instructions simply produce the correct value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The MATH instruction cannot handle source modifiers, even on Gen7.
So, apply this workaround for Sandybridge on Ivybridge as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglit test glsl-fs-loop-continue.shader_test on Ivybridge.
According to the documentation, the CONT instruction's UIP field should
point to the WHILE instruction on both Sandybridge and Ivybridge.
The previous code made UIP point to the implicit DO instruction, which
seems incorrect. I'm not sure how it could have worked on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's IF instruction doesn't support conditional modifiers.
It also introduces UIP, which must point to the ENDIF instruction.
ELSE and ENDIF remain the same except that JIP moves from dst to src1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge puts the shadow comparator first, then lod/bias, and finally
the coordinate---unlike previous generations which always reserved four
slots for the coordinate at the beginning.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Most of this code copied from brw_wm_sampler_state.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm still not happy with the amount of code duplication here, but it
will have to do for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, Ivybridge seems to ignore the newly supplied data, giving us
rubbish for vertices.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This shouldn't be done using MRFs, but until I have a proper solution
for dealing with MRFs, this allows my hack to keep working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The message header is still incorrect, but this is a start.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's SEND instruction uses GRFs instead of MRFs. Unfortunately,
a lot of our code explicitly uses MRFs, and rewriting it would take a
fair bit of effort. In the meantime, use a hack:
- Change brw_set_dest, brw_set_src0, and brw_set_src1 to implicitly
convert any MRFs into the top 16 GRFs.
- Enable gen6_resolve_implied_move on Ivybridge: Moving g0 to m0
actually moves it to g111 thanks to the previous hack.
It remains to officially reserve these registers so the allocator
doesn't try to reuse them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This also disables the HiZ and separate stencil buffers. We still need
to implement stencil.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we currently only support sampling in the fragment shader, we only
bother to emit the PS variant. In the future we'll need to emit others.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This may not be necessary, but it seems like a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge can update each stage's binding table pointer independently,
so we want separate dirty bits. Previous generations can simply
subscribe to all three dirty bits and emit as usual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_vs_state.c; reuses create_vs_constant_bo from there.
The 3DSTATE_VS command is identical but 3DSTATE_CONSTANT_VS is not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
SF and CLIP viewport state has been combined into SF_CLIP_VIEWPORT;
SF_CLIP and CC state pointers can now be uploaded independently.
Some portions of the hardware documentation refer to separate upload
commands for SF and CLIP; these are outdated and incorrect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_clip_state.c.
This enables early culling and sets the necessary fields. Otherwise, it
is entirely the same, so I doubt this patch is strictly necessary for a
functional driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The state itself still seems to be the same; the only change is that
each part (CC, BLEND, DEPTH_STENCIL) can now be uploaded independently.
Thus, we still rely on the code in gen6_cc.c to set up the state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_wm_state.c.
The main change from Sandybridge seems to be that 3DSTATE_WM was split
into two separate state packet commands: 3DSTATE_WM and 3DSTATE_PS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_sf_state.c.
The main change from Sandybridge seems to be that 3DSTATE_SF was split
into two separate state packet commands: 3DSTATE_SF and 3DSTATE_SBE
("setup backend"). The bit-offsets are even the same - only the DWords
numbers have shuffled around a bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently this always reserves 16kB for push constants, regardless of
how much space is needed, and partitions it evenly betwen the VS and FS.
This is probably not ideal, but is straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, gen7_atoms is a verbatim copy of gen6_atoms; future commits
will update it to contain gen7-specific state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, IS_GEN7, IS_IVYBRIDGE, IS_IVB_GT1, and IS_IVB_GT2 all return
false. This allows me to write the code for them before actually adding
the PCI IDs and thus enabling the hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation uses the term "vertex URB entries", the code talks
about "entry size", and so on. Also, handles are just "pointers" to
entries (actually small integers).
Also rename max_gs_handles to max_gs_entries.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary motivation for this is to better support Ivybridge control
flow. Ivybridge IF instructions need to point to the first instruction
of the ELSE block -and- the ENDIF instruction; the existing code only
supported back-patching one instruction ago.
A second goal is to simplify and centralize the back-patching, hopefully
clarifying the code somewhat.
Previously, brw_ELSE back-patched the IF instruction, and brw_ENDIF
back-patched the previous instruction (IF or ELSE). With this patch,
brw_ENDIF is responsible for patching both the IF and (optional) ELSE.
To support this, the control flow stack (if_stack) maintains pointers to
both the IF and ELSE instructions. Unfortunately, in single program
flow (SPF) mode, both were emitted as ADD instructions, and thus
indistinguishable.
To remedy this, this patch simply emits IF and ELSE, rather than ADDs;
brw_ENDIF will convert them to ADDs (the SPF version of back-patching).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This hides the IF stack and back-patching of IF/ELSE instructions from
each of the code generators, greatly simplifying the interface.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This would be so much easier if we were using C++; we could simply use
constructors and destructors. Instead, we have to update all the
callers.
While we're at it, ralloc various brw_wm_compile fields rather than
explicitly calloc/free'ing them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Sandybridge, we don't need to break down primitives. There's no need
to bother setting up brw_compile and such if it's not going to be used;
bail as early as possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
ctx->Light.ProvokingVertex depends on _NEW_LIGHT.
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it symmetric with brw_set_dest, which is convenient, and will
also allow for assertions to be made based off of intel->gen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglits fragment-and-vertex-texturing test on llvmpipe for me.
I've no idea if someone had another plan for this that is smarter than what
I've done here, but what I've basically done is
split fragment and vertex sampler and sampler_view setup function, factor
out the common chunks of both.
side-cleanups:
drop st->state.sampler_list - unused
don't update border color if we have no border color.
should fix https://bugs.freedesktop.org/show_bug.cgi?id=35849
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm hard pressed to think of any reason a gallium thread would want to
receive a signal, especially considering its probably loaded as a library
and you don't want the threads interfering with the main threads signal
handling.
This solves a problem loading llvmpipe into the X server for AIGLX,
where the X server relies on the SIGIO signal going to the main thread,
but once llvmpipe loads the SIGIO can end up in any of its threads.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nothing special, just changing conditions for when HiZ can be enabled and
when HiZ memory becomes invalid.
I was thinking about it again and realized it had not been quite right.
This is actually just the message descriptor for Gen6+ dataport access;
it has nothing to do with the render cache. Access to the sampler cache
and constant cache also would use this struct; rename for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are documented on page 245 of IHD_OS_Vol4_Part2.pdf (the public
Sandybridge documentation/SEND instruction description).
Somebody had the bright idea to reuse gen4/5 defines labelled READ/WRITE
which just happened to be the same values as Render Cache/Sampler Cache.
It turns out that this field has nothing to do with READ/WRITE on
Sandybridge, but rather represents which data port to direct it to.
This was especially confusing in brw_set_dp_read_message, which
used "BRW_MESSAGE_TARGET_DATAPORT_WRITE." In a read function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For example, an indirect load like "ld b128 $r0q c0[$r0]" seems to
overwrite the address register before finishing the load, but only
if there are a lot of threads running.
Visible as displaced geoemtry in Unigine Heaven.
According to my documentation this is actually "Media Block Write" on
Gen4-5; there has never been a "DWord Block Write."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
only allocate the blocks ptr in the range if we ever have one,
otherwise don't bother wasting the memory.
valgrind glxinfo
before:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
after:
==5227== in use at exit: 419,754 bytes in 706 blocks
==5227== total heap usage: 3,452 allocs, 2,746 frees, 3,140,531 bytes allocate
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops 6k of the text segment, a minor drop in the ocean, however
it also makes the code a lot cleaner and removes a lot of duplicated
information, hopefully making it more maintainable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This table covered a large range unnecessarily, reduce the address
range covered, use the fact that the bottom two bits aren't significant,
and remove unused fields from the range struct. It also drops the hash_size/shift in context in favour of a define, which should make doing the math
a bit less CPU intensive.
valgrind glxinfo
Before:
==320== in use at exit: 419,754 bytes in 706 blocks
==320== total heap usage: 3,691 allocs, 2,985 frees, 7,272,467 bytes allocated
After:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently r600g always maps every bo, this is quite pointless as it wastes
VM and on 32-bit with wine running VM space is quite useful.
So with this patch we don't create the mappings until first use, without
tiling enabled this probably won't make a major difference on its own,
but with tiled staged uploads it should avoid keeping maps for most of the
textures unnecessarily.
v2: add bo data ptr check
Signed-off-by: Dave Airlie <airlied@redhat.com>
typedef void (GLAPIENTRYP _GLUfuncptr)(); causes the following warning:
function declaration isn't a prototype.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
GetVertexAttrib*{,ARB} is no longer aliased to the NV calls.
This fixes tracing yofrankie with apitrace, given it requires accurate
results from GetVertexAttribiv*.
NOTE: This is a candidate for the stable branches.
Mesa already supports this because of NV_fragment_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
If state tracker asked us to map resource directly and we can't
do it (because of tiling), return NULL instead of doing full transfer
- state tracker should handle it and fallback to some other method
or repeat transfer without PIPE_TRANSFER_MAP_DIRECTLY.
It greatly improves performance of xorg state tracker on nv50+,
because its fallback (DFS/UTS) is much faster than full transfer.
Eliminates unaligned accesses on strict architectures. Spotted by Jay
Estabrook.
Signed-off-by: Matt Turner <mattst88@gmail.com>
NOTE: This is a candidate for the 7.10 branch.
GLSL stopped using:
BRA, EXP, LOG, LRP, NRM3, NRM4, XPD.
GLSL started using:
KIL, SCS, SSG, SWZ.
(omg why SWZ? isn't proc_src_register flexible enough?)
GLSL doesn't use these opcodes some Radeons do support:
ARR, DP2A, DST, LRP, XPD.
These opcodes are now unused:
AND, NOT, NRM3, NRM4, OR, XOR.
(plus maybe the NV extensions which are unused by Gallium)
In addition to that, we don't use two-dimensional indirect addressing,
which the Mesa IR can do.
PIPE_ARCH_UNKNOWN_ENDIAN is used no where else. All #else branches of
ifdef PIPE_ARCH_LITTLE assume big-endian. Not #error'ing out here
only serves to allow bad things to happen.
Signed-off-by: Matt Turner <mattst88@gmail.com>
1/ln(2) is equivalent to log2(e), so define it as such.
log2(e) = ln(e)/ln(2) = 1/ln(2)
Worst of all, the definitions for M_LOG2E and ONE_DIV_LN2
(right beside each other!) weren't the same.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Setting SOURCE_FORMAT to EXPORT_NORM is an optimization.
Leaving SOURCE_FORMAT at 0 will work in all cases, but is less
efficient. The conditions for the setting the EXPORT_NORM
optimization are as follows:
R600/RV6xx:
BLEND_CLAMP is enabled
BLEND_FLOAT32 is disabled
11-bit or smaller UNORM/SNORM/SRGB
R7xx/evergreen:
11-bit or smaller UNORM/SNORM/SRGB
16-bit or smaller FLOAT
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Hopefully we can find out the proper fix for this, but for now
this makes the fbo mipmap tests pass on my rv670 (x2 card).
Signed-off-by: Dave Airlie <airlied@redhat.com>
r6xx asics have some problems with the surface
sync logic for the CB and DB. It's recommended
to use the event write interface for flushing
the DB/CB caches rather than the sync packets.
A single event write flush flushes all dst
caches, so we only need one for all CBs and DB.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=35312
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems more in line with what the documentation suggests we should be
doing. It doesn't fix the rv635 regression, though I thought it might,
so it means I've no idea whats actually going wrong there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
We only handle a 32 bit swap count, so use the new structure definitions.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After sending the GLXChangeDrawableAttributes request, we also set a
local set of attributes on the DRI drawable. But in the indirect case
this array won't be present, so skip the setting in that case to avoid a
crash.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We use a hidden window for pbuffer contexts, but Windows limits window
sizes to the desktop size by default. This means that creating a big
pbuffer on a small resolution single monitor would truncate the pbuffer
size to the desktop.
This change overrides the windows maximum size, allow to create windows
arbitrarily large.
Pointed out by clang:
src/gallium/auxiliary/draw/draw_context.h:251:41: warning: implicit conversion
from enumeration type 'enum pipe_cap' to different enumeration type
'enum pipe_shader_cap' [-Wconversion]
return tgsi_exec_get_shader_param(param);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~
Otherwise there would be no way to know whether the state has been changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts commit 1dc204d145.
MC_COORD_TRUNCATE is for MPEG and produces quite an interesting behavior
on regular textures. Anyway that commit broke filtering in demos/cubemap.
The new allocator uses ra and does swizzle packing.
Also, a data structure (struct rc_variable) and associated functions have
been added for generating UD and DU chains.
This function can be used to avoid creating single register classes for
input/payload registers. This makes optimistic coloring less likely
to fail.
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction scheduler will sometimes leave orphaned sources when
converting instructions from RGB to Alpha. If one of these orphaned
sources has an index greater than the maximum temporary register index,
then the compiler will incorrectly report "Too many hardware temporaries
used". The dead sources pass cleans up these orphaned sources.
GL_FIXED should not be accepted in the other gl*Pointer calls in OpenGL.
There is a new piglit for this: arb_es2_compatibility-fixed-type.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were accidentally leaving blending enabled for LogicOp GL_COPY,
which ARB_color_buffer_float/GL_RGBA32F-render (and friends) caught.
Additionally, the GL spec says that no LogicOp should be done to
floating-point targets, and the GPU gets really angry even if you say
to LogicOp GL_COPY to float.
As we expanded the usage of the state cache, it grew extra
functionality. However, with the recent state streaming rework, we're
back to the state cache being used only for shader kernels, which is
the piece of GPU state that's actually expensive to compute again from
scratch, since it involves compiling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the dynamic state is streamed through the top of the
batchbuffer, we can cut out many of our relocations to that state by
using the base address.
Improves 3DMMES taiji performance 3.3% +/- 0.4% (n=15).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Overall, across this series since the last set of numbers, gen6 3DMMES
taiji performance has dropped 0.8% +/- 0.3% (n=15), probably due to
the increased reissuing of state from some of the state objects that
otherwise never changed, and increased occurrence of the per-batch
overhead as we've increased how much we put in the batch BO without
increasing the batch BO's size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplers are about to become streamed for gen6 performance, which
would cause this unit to blow out the state cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is in a way a revert of f5bb775fd1.
The tiny win that had will be overwhelmed by the win of using the gen6
dynamic state base address.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves 3DMMES taiji demo performance by 5.1% +/- 1.9% (n=15), by
reducing CPU time spent thrashing around those tiny little constant BOs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The payload regs can go all the way up to register 60+, so just give
them 8 bits to be addressed by instead of 3-4 (which made source_w_reg
of 8 end up 0). There's no reason to aggressively pack these fields,
as they are just used as compiler information, where being easier to
access is probably more important than shaving a byte or two off of
the structure.
Fixes piglit fragcoord_w.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36649
I was promoting to float for ARB_color_buffer_float unclamped, which
failed when ARB_texture_float wasn't present. Since the metaops don't
need results outside of [0,1] when not drawing to a floating point
destination, they can just use a fixed point texture when floating
point destinations are impossible.
Fixes regression in fdo23670-depth_test when --enable-texture-float is
not present.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36473
Since commit de579a1 "Include GIT SHA1 in GL version string"
$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/mesa/main/git_sha1.h
nothing added to commit but untracked files present (use "git add" to track)
Add git_sha1.h to .gitignore so git knows not to warn it is present but untracked
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Also use MAX3 and incorporate Ian's suggestion in texformat.c.
I don't think wrapping u_format_rgb9e5.h in another header and thus making it
more complicated is worth it.
swrast support done.
There is no renderbuffer support in swrast, because it's not required
by the extension.
Reviewed-by: Brian Paul <brianp@vmware.com>
I was wondering why I had been getting GL_RGBA for GL_RGB9_E5.
Instead of setting GL_RGBA and CHAN_TYPE for most types,
use the helper functions to obtain the info.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we run out of bin memory and do an early return from
lp_setup_begin_query() we'd omit setting the setup->active_query
pointer. Then, when lp_setup_end_query() was later called, the
assertion for setup->active_query == pq would fail. Moving the
assigment in lp_setup_begin_query() avoids that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Including windows.h was ineffective on MSVC because we define the NOGDI macro,
which skips the wingdi.h include.
Unsetting NOGDI is also a bad idea because it causes all sort of symbol
clashes with SGI code.
The real problem is that WINGDAPI was not being defined, also due to NOGDI,
so simply define it to blank if not done already. This seems to make
everybody happy.
The default value is 64 but drivers usually advertise more, like 4096.
Allows ARB vp/fp programs to use more parameters.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This reverts commit 50ade6ea69.
Fixes jerky rendering again on apps that don't block on the GPU per
frame and are GPU bound (e.g. 3DMMES on Ironlake). The whole point of
this complicated throttle scheme is to wait on frame n-1 to have
started rendering before starting frame n's rendering. Otherwise, the
GPU-bound app will race ahead and call the GL to draw many
nearly-identical frames, then >0ms later get stuck waiting for them
(all dispatched at about the same time) to retire, then render a new
batch of nearly-identical frames.
If GL_RGB16F or GL_RGB32F is specified let's try the 3-component float
texture formats before trying the 4-component ones. Before this,
GL_RGB16/32F were treated the same as GL_RGBA16/32F.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The actual code that needs this include is just using
"if defined (PIPE_OS_UNIX)", and the two conditions should match.
This should also make the file compile under Hurd.
This is more painful than instruction scheduling, as we have to
compare two MRF writes to see if they coincide, and have to handle
partial GRF writes before that (for example, the result of a math
instruction written to color).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that needed fixing was skipping the newly-possible
uncompressed/sechalf partial GRF constant writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the work of the scheduler is agnostic to wide dispatch. It
operates on our virtual GRF file, which means instructions are
generally referring to 8 or 16 wide naturally. For the MRF file
management we're trying to track the actual hardware MRF file, so we
need to watch if an instruction writes multiple MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is glued in in a bit of an ugly way -- we rely on the uniforms
having been set up by 8-wide dispatch, and we just reuse them without
the ability to add new uniforms for any reason, since the 8-wide
compile is already completed. Today, this all works out because our
optimization passes are effectively the same for both and even if they
weren't, we don't reduce the set of uniforms pushed after
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this, consumers often have to keep linked lists of the
entries, at additional malloc cost.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These reduce an emitted (not decoded) instruction per shader on
g4x/gen5, but may allow for additional register coalescing as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At this point it doesn't do uniforms, which have to be laid out the
same between 8 and 16. Other than that, it supports everything but
flow control, which was the thing that forced us to choose 8-wide for
general GLSL support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that the virtual grfs are in increments of the dispatch_width,
not hardware registers -- this makes the 16-wide emit and 8-wide emit
mostly the same.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I hit this when testing RV350, which lacks RGB10_A2 render target
support. It had been missed when implementing the format and probably
unused by anything else too.
Not applicable to 7.10.
Reviewed-by: Eric Anholt <eric@anholt.net>
"st/mesa: check image size before copy_image_data_to_texture()" caused
a regression in piglit fbo-generatemipmap-formats test on all gallium drivers.
Level 0 for NPOT textures will not match minified values, so don't do this
check for level 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the initial code if we had nothing in the vector slots r would
never get reset to 0, so we'd fail to compile shaders, after the previous
commit this would happen for the LIT tests. When I fixed that we did a lot
of unnecessary loops through all the vector states when we had no vector
slots filled. So this patch optimises thing for the scalar only state.
This fixes the 3 LIT piglit tests on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the R600 ISA document:
Section 4.7.5 Cycle restrictions for the ALU.trans states that
PV/PS have cycle restrictions wrt constants.
This is part of a fix for the LIT tests
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a bug in Trine where fragment.color would write
FRAG_RESULT_COLOR (which is interpreted by drivers as being the "write
this to all color buffers" option) instead of FRAG_RESULT_DATA0 (just
the first target).
Fixes piglit ATI_draw_buffers/arbfp-no-index.
This extension support consists of replacing
"gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->".
One instance of referencing the texture's base sampler remains in the
initial miptree allocation, where I'm not sure we have a clear
association with any texture unit.
Tested with piglit ARB_sampler_objects/sampler-objects.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we lack hardware support for it, this is a simple matter of
checking _mesa_check_conditional_render at the entrypoints, and
suppressing it for the metaops where it doesn't apply.
Reviewed-by: Brian Paul <brianp@vmware.com>
The NV_conditional_render spec calls out specific operations that
conditional rendering applies to, which doesn't include these.
Fixes NV_conditional_render/generatemipmap on swrast.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested with rgtc-teximage-0[12].
EXT_texture_compression_rgtc/fbo-generatemipmap-formats fails in NPOT
just like S3TC does.
Reviewed-by: Brian Paul <brianp@vmware.com>
This assertion doesn't make any sense to me -- the convertFormat is
already something valid (tested above), and the BaseFormat dictated by
convertFormat doesn't matter to the function about to be called (it's
the datatype/comps that were pulled out of convertFormat).
Fixes assertion failure in
GL_EXT_texture_compression_rgtc/fbo-generatemipmap-formats
(still has a rendering failure in NPOT like S3TC does).
Reviewed-by: Brian Paul <brianp@vmware.com>
We were falling through to the default R8 and RG88 formats instead of
compressing when possible. Noticed by swrast fbo-blending-formats
actually doing rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were totally broken for several releases.
scons now builds everything the project files built and more, and can be
kept up-to-date with little effort.
Need to reset the point/line/tri functions to point to the "first"
versions whenever we flush vertices. Fixes unfilled polygon rendering
errors seen in demos/samples/logo.c. See comments for more info.
NOTE: This is a candidate for the 7.10 branch.
Broken with e5c6a92a12. (ARB_color_buffer_float)
Clamping should occur if type != float, otherwise the MSBs of the resulting
pixels are killed off. For example, reading back LUMINANCE = R+G+B can be
greater than 0xff, but the result is naturally masked by 0xff
for UNSIGNED_BYTE, leading to bogus results.
The following bug report seems to want clamping to occur if type == half_float
too. Not sure what's correct.
Bug: [bisected pineview] oglc case pxconv-read failed
https://bugs.freedesktop.org/show_bug.cgi?id=35852
Tested by: Fang Xun <xunx.fang@intel.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
None of this ever gets used. Fog is always calculated by a fragment
program. Even though the fixed-function fog unit is never used, state
updates are still sent to the hardware. Removing those spurious state
updates can't hurt performance.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Fragment programs are generated by core Mesa for fixed-function.
Because of this, there's no reason to handle cases where there is no
fragment program for fog.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
All drivers expect this to always be GL_NONE. Don't let there be any
opportunity for a bad value to leak out and infect some unsuspecting
driver. If any driver for hardware that had fixed-function
per-fragment fog (i915 and perhaps some r300-ish) was ever going to
add support, it would have done it by now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
This patch fixes two bugs related to fog in the fixed-function
fragment shader generation code.
Fog was only lowered to instructions if MRTs were used. The fragment
shader assembler always lowers "fog option" code to instructions, and
many drivers (e.g., r300) expect this.
When fog lowering did happen, it was after the instruction count was
checked against implementation limits. Since fog lowering may add up
to 5 instructions, a program that was below the limits before lowering
may exceed the limits after lowering.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
We should only copy images into the dest texture if the size is correct.
This fixes a failed assertion when finalizing a texture with mis-defined
mipmap levels such as:
level 0: 32x32
level 1: 8x8
Also, fix incorrect mipmap level used in assertion at the top of
copy_image_data_to_texture().
NOTE: This is a candidate for the 7.10 branch.
Lots of code (deleted by this patch) tried to make type == result->type,
but not all cases did. Don't pretend; just use result->type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Things definitely remaining todo: switch statements, clip distances.
On 965, we also need real integers in the VS, and implementations of
some things like isinf/isnan.
Reviewed-by: Brian Paul <brianp@vmware.com>
For 1 and 2-channel formats the hardware only supports rendering to R
and RG. To do I and L render targets we just call them R and
everything works out. For A, we would need to rewrite the CC to do
the alpha channel's blending on color instead, and send the fragment
alpha down the red channel. For LA, there doesn't seem to be any
hope, because we can't do independent color/alpha blending while
treating the LA surface as RG.
Reviewed-by: Brian Paul <brianp@vmware.com>
The blitter only does up 32bpp at a time, so we handle it by mangling
coordinates and calling the surface 32bpp.
Fixes ARB_texture_rg/fbo-generatemipmap-formats-float with ARB_texture_float.
Reviewed-by: Brian Paul <brianp@vmware.com>
Of these, intel will be using I and L initially, and A once we rewrite
fragment shaders and the CC for rendering to it as R.
Reviewed-by: Brian Paul <brianp@vmware.com>
Keep track of when the caches are dirty, and only flush them when
the framebuffer state is set and when the context is flushed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes piglit's draw-instanced-divisor test for softpipe on both
the generic and SSE paths. This is temporary until we have the
correct per-array max_index information.
this needs revisiting, we really don't want to be flushing all 32 of these,
but currently we don't flush any of them, and it seems to have caused a regression
as reported on irc with doom3 on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Writes within ELSE blocks were being ignored which prevented us from
discovering all possible writers for some register values.
Fixes piglit glsl-fs-raytrace-bug27060
This gets me from 2200 to 1978 dwords for a gears frame.
This is due to us having some 32-dwords blocks in the SPI, that we only
modify the first dwords off.
v2: fix dirty reg count from Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a first step to decreasing the CPU usage, by decreasing how much
stuff we pass to the GPU and hence to the kernel CS checker.
This adds a check to see if the values we need to write are actually dirty,
and avoids writing if they are. However certain register need to always
be written so we add a new flag to say which ones should be always written
if used. (Note this could probably be done cleaner with a larger refactoring,
since I think the CONST_BUFFER_SIZE_PS/VS and CONST_CACHE_PS/VS might
be better off as a special state).
It also moves the need_bo to be a flags on the register now.
With this, a frame of gears goes from emitting 3k dwords to emitting 2k dwords,
and I'm sure it could get a lot smaller.
v2: fix some evergreen dirty bits.
Original patch from: Bas Nieuwenhuizen, I NIHed nearly the same thing
before seeing his patch on the list, oops.
Reviewed-by: Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of the newer portions of the code use OUT_BATCH style. I prefer
this style because it offers a clear distinction between a) hardware
messages/structures with a mandatory format, and b) data structures for
our own internal use that we can format however we want.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we never enable the GS on Sandybridge, there's no need to allocate
it any URB space.
Furthermore, the previous calculation was incorrect: it neglected to
multiply by nr_vs_entries, instead comparing whether twice the size of
a single VS URB entry was bigger than the entire URB space. It also
neglected to take into account that vs_size is in units of 128 byte
blocks, while urb_size is in bytes.
Despite the above problems, the calculations resulted in an acceptable
programming of the URB in most cases, at least on GT2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The expression
x = y, 5, 3;
will generate
0:7(9): warning: left-hand operand of comma expression has no effect
The warning is only emitted for the left-hand operands, becuase the
right-most operand is the result of the expression. This could be
used in an assignment, etc.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts what remains of commit
28bab24e16. It was garbage, trying to
use a MESA_FORMAT enum as a preprocessor token, and I don't know how I
thought it was even tested.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RED and GL_RG were tricking this code into executing, but it's
totally unprepared for a 16-bit channel and just rescaled the values
down to 0. We don't have anything with <8bit channels alongside >8bit
channels, so disabling it should be safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the current (broken by trying to use an enum in the
preprocessor) spantmp2.h support I wrote for the intel driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we're using GTT mappings now (no manual detiling), there's
really nothing special to accessing these buffers, other than needing
the new RowStride field of gl_renderbuffer to accomodate padding.
Reduces the driver size by 2.7kb, and improves glean depthStencil
performance 3-10x (!)
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow some drivers to reuse the core renderbuffer.c get/put
row functions in place of using the spantmp.h macros. Note that
unlike textures, we use a signed integer here to allow for handling
FBO orientation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Everything appears to already be in place for this. Fixes aborts in:
ARB_texture_rg/fbo-alphatest-formats-float
ARB_texture_rg/fbo-blending-formats-float.
Reviewed-by: Brian Paul <brianp@vmware.com>
The _mesa_base_fbo_format variant doesn't handle some texture
internalformats, such as "3".
Fixes:
fbo-blending-formats.
fbo-alphatest-formats
EXT_texture_sRGB/fbo-alphatest-formats
Reviewed-by: Brian Paul <brianp@vmware.com>
The very presence of this extension breaks things.
This should bring us closer to being able to run Unigine Heaven.
The extension will be re-enabled once gl_InstanceID is implemented.
This was copy-and-paste from originally trying to get DP read/write
working reliably, and notably for other common messages (URB, sampler)
we weren't doing this.
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
It was accepting only GL_DUDV_ATI and not the specific sized format
GL_DU8DV8_ATI. Fixes assertion failure at startup in Shadowgrounds.
Reviewed-by: Brian Paul <brianp@vmware.com>
The 095-recursive-define test case was triggering infinite recursion
with the following test case:
#define A(a, b) B(a, b)
#define C A(0, C)
C
Here's what was happening:
1. "C" was pushed onto the active list to expand the C node
2. While expanding the "0" argument, the active list would be
emptied by the code at the end of _glcpp_parser_expand_token_list
3. When expanding the "C" argument, the active list was now empty,
so lather, rinse, repeat.
We fix this by adjusting the final popping at the end of
_glcpp_parser_expand_token_list to never pop more nodes then this
particular invocation had pushed itself. This is as simple as saving
the original state of the active list, and then interrupting the
popping when we reach this same state.
With this fix, all of the glcpp-test tests now pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32835
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Without it gcc complains:
nv50_screen.c: In function ‘nv50_screen_is_format_supported’:
nv50_screen.c:48: warning: implicit declaration of function ‘util_format_is_supported’
and handles it wrongly - util_format_is_supported returns boolean, which is typedef'ed
to uchar, but function without prototype is assumed to return int.
For me nv50_screen_is_format_supported was returning true for float formats without
--enable-texture-float...
Sounds very unlikely, but I don't have a better explanation at the
moment.
The GPU throws page faults at the first page after the code buffer
quite frequently on startup, and traces don't show us overflowing.
This pass coverts CMP T0, T1 T2 T0 -> MOV T0, T2 when the CMP
instruction is the first instruction to write to register T0.
This pass is useful for hardware that requires a lot of lowering passes
that generate many CMP instructions.
So --enable-texture-float it is.
Hardware drivers (including the Gallium ones) should
use #ifdef TEXTURE_FLOAT_ENABLED to hide any code that may
expose floating-point renderbuffers via any interface,
public or private.
v2: Print a warning when using --enable-texture-float.
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: handle floating-point formats in _mesa_base_fbo_format
mesa: add ARB/ATI_texture_float, remove MESAX_texture_float
commit 123bb110852739dffadcc81ad80b005b1c4f586d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 01:35:42 2010 +0200
mesa: compute floatMode for FBOs and return it on RGBA_FLOAT_MODE
It's clear enough that the current segmentation fault isn't what we
want. And it's also very easy to know what we do want here, (just
check with any functional C preprocessor such as "gcc -E").
Add the desired output as an expected file so that the test suite
gives useful output, (showing the omitted output and the segfault),
rather than just reporting "No such file" for the expected file.
These were all written as generic list functions, (accepting and returning
a list to act upon). But they were only ever used with parser->active as
the list. By simply accepting the parser itself, these functions can update
parser->active and now return nothing at all. This makes the code a bit
more compact.
And hopefully the code is no less readable since the functions are also
now renamed to have "_parser_active" in the name for better correlation
with nearby tests of the parser->active field.
The common case for this test suite is to quickly test that everything
returns the correct results. In this case, the second run of the test
suite under valgrind was just annoying, (and the user would often
interrupt it).
Now, do what is wanted in the common case by default (just run the
test suite), and require a run with "glcpp-test --valgrind" in order
to test with valgrind.
The expected file here captures the current behavior of glcpp (which
is to generate an obscure "syntax error, unexpected $end" diagnostic
for this case).
It would certainly be better for glcpp to generate a nicer diagnostic,
(such as "missing closing parenthesis in function-like macro
definition" or so), but the current behavior is at least correct, and
expected. So we can make the test suite more useful by marking the
current behavior as expected.
The expected file here captures the current behavior of glcpp (which
is to generate a division-by-zero error) for this case.
It's easy to argue that it should be short-circuiting the evaluation
and not generating the diagnostic (which happens to be what gcc does).
But it doesn't seem like we should force this behavior on our
pre-processor, (and, as always, the GLSL specification of the
pre-processor is too vague on this point).
This test is behaving just fine already---it's generating an informative
diagnostic, ("error: division by 0 in preprocessor directive"), so adding
this in the expected file makes things pass.
We could actually try to do an early return both for gallium textures and
malloc memory textures, but I'm not sure exactly which situations
stImage->pt is NULL, and whether texImage->Data == NULL would be acceptible
or not.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the same as ARB_draw_buffers (which derived from it), except
for s/ARB/ATI/. The glapi bits were already in place, and what was
missing was just the ARB_fp part. The new Humble Bundle game "trine"
tries to use this extension without checking that it's exposed, which
this works around.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36182
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like what we do for add/mul, but we have to invert the
predicate to choose the other source instead.
This removes 5 extra moves of constants in nexuiz shaders. No
statistically significant performance difference on my Sandybridge
laptop (n=5).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were letting any old operand through, which generally resulted in
assertion failures later.
Fixes array-logical-xor.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant false case.
Fixes glslparsertest/glsl2/logic-02.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant true case.
Fixes glslparsertest/glsl2/logic-01.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By always using a boolean, we should generally avoid further
complaints. The failure case I see is logic_not, where the user might
understandably make the mistake of using `!' on a boolean vector (like
a piglit case did recently!), and then get a further complaint that
the new boolean type doesn't match the bvec it gets assigned to.
Fixes invalid-logic-not-06.vert (assertion failure when the bad type
ends up in an expression and ir_constant_expression gets angry).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33314
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A few GLES2 tests tripped over this when using array dereferences to
hit channels on the LHS (see piglit test
glsl-copy-propagation-vector-indexing). We wouldn't find the
ir_dereference_variable, and assume that that meant that it wasn't an
assignment to a scalar/vector, and thus not notice that the variable
had been changed.
Release the old depth region and reference the new one *only* if it has
changed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@intel.com>
Prior to Gen6, we use the GS for breaking down quads, quad-strips,
and line loops. On Gen6, earlier stages already take care of this,
so we never need the GS.
Since this code is likely completely untested, remove it for now.
We can write new code when enabling real geometry shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit b4cbd2b312.
It looked like a safe sanity check. It missed the issue of the start of
the buffer not being at 0, but even that was not enough to explain why
setting the max vertex index caused glyphs to be dropped from the game
'Achron'.
Instead, the issue appears to be related to the use of the vertex bias
and so we would need to re-emit the max-index every time we adjusted the
bias, so re-emitting the relocations and defeating the original
optimisation.
Reported-and-tested-by: Thomas Jones <thomas.jones@utoronto.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35163
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When the window is minimized GetClientRect will return zeros.
Instead of creating a 1x1 framebuffer, simply preserve the current window
size, until the window is restored or maximized again.
The earlier change to ensure rendertargets and textures are always
rebound at every command buffer start had the downside of making
successive flushes no longer no-ops, as a command buffer with merely
the rebinding commands were being unnecessarily sent to the vGPU.
This change only re-emits the bindings when necessary, by keeping track
of the need to rebind outside of the dirty state update mechanism.
According to https://bugs.freedesktop.org/show_bug.cgi?id=34280
commit 5d1387b2da causes the font corruption
problems people have been seeing under various apps and gnome-shell on r200
cards.
This commit changed (loosened) the check for using the memcpy path in the
former al88 / al1616 texstore functions, which are now also used to
store rg texures. This patch restores the old strict check in case of
al textures. I've no idea why this fixes things, since I don't know the
code in question at all. But after seeing the bisect in bfdo34280 point
to this commit, I gave this fix a try and it fixes the font issues seen on
r200 cards.
[airlied:
r200 has no native working A8, so it does an internal storage format of AL88
however srcFormat == internalFormat == ALPHA when we get to this point,
so it copies, but it wants to store into an AL88 not ALPHA so fails,
I'll also push a piglit test for this on r200].
Many thanks to Nicolas Kaiser who did all the hard work of tracking this down!
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously the macro would (ALIGN(value - alignment - 1, alignment)).
At the very least, this was missing parenthesis around "alignment -
1". As a result, if value was already aligned, it would be reduced by
alignment. Condisder:
x = ROUND_DOWN_TO(256, 128);
This becomes:
x = ALIGN(256 - 128 - 1, 128);
Or:
x = ALIGN(127, 128);
Which becomes:
x = 128;
This macro is currently only used in brw_state_batch
(brw_state_batch.c). It looks like the original version of this macro
would just use too much space in the batch buffer. It's possible, but
not at all clear to me from the code, that the original behavior is
actually desired.
In any case, this patch does not cause any piglit regressions on my
Ironlake system.
I also think that ALIGN_FLOOR would be a better name for this macro,
but ROUND_DOWN_TO matches rounddown in the Linux kernel.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Whitwell <keithw@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also make the GL_ARB_draw_instanced block follow the same pattern as
the other blocks.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a 49.6% +/- 2.0% (n=9, IPS outlier removed) performance
improvement for the hacked-up-for-cache-misses scissor-many, and no
statistically significant performance difference for the
hacked-up-for-cache-hits version (n=9, IPS outlier removed). No
statistically significant performance difference from ETQW (n=5) from
these last two commits.
This is a 28.1% +/- 1.4% (n=10) performance improvement for the
hacked-up-for-cache-misses scissor-many (n=10), and no statistically
significant wall-time performance difference for the
hacked-up-for-cache-hits version (n=9, first outlier in each removed
since IPS was warming up. User time increased by about 4.7%, but
kernel time decreased equivalently).
Build the new sources, plug the new functions into the dispatch table,
implement display list support. And enable extension in the gallium
state tracker.
gl_texture_object contains an instance of this type for the regular
texture object sampling state. glGenSamplers() generates new instances
of gl_sampler_object which can override that state with glBindSampler().
Fixes issues in e.g. nexuiz (desertfactory) or supertuxkart that
look like lighting bugs.
They're not visible with the software rasterizers because their
notion of linear interpolation seems to be different from that
of nv50/nvc0.
This reverts commit 437c748bf5.
The commit is wrong for several reasons. One of them is when we grab
a new buffer, we should update all the states it is bound in,
including all parallel contexts. I don't think this is even doable.
The correct solution would be upload data via a temporary buffer and
do resource_copy_region to the original one.
https://bugs.freedesktop.org/show_bug.cgi?id=36088
Add Cygwin platform-specific settings and drivers to build for dri driver:
- by default, disable direct rendering.
- if direct rendering is enabled, the swrast dridriver is the only one it's
sensible to try to build (this doesn't work at the moment as additional patches
are required to build a libGL which can load just swrast without the DRM headers,
even though there's no actual functional dependency)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Fix build when configured --with-driver=dri --disable-driglx-direct on targets
without drm e.g. GNU/Hurd and Cygwin
Based on the Debian patch file '05_hurd-ftbfs.diff' by Samuel Thibault.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Reviewed-By: Jakob Bornecrantz <wallbraker@gmail.com>
The theory here was to detect a temporary variable used within a loop,
and avoid considering it live across the entire loop. However, it was
overeager and failed when the first definition of the variable
appeared within the loop but was only conditionally defined.
Fixes glsl-fs-loop-redundant-condition.
Otherwise min_lod can potentially be larger than the clamped max_lod. The
code that follows will swap min_lod and max_lod in that case, resulting in a
max_lod larger than MAX_LEVEL.
Signed-off-by: Brian Paul <brianp@vmware.com>
Base level and min LOD aren't equivalent. In particular, min LOD has no
effect on image array selection for magnification and non-mipmapped
minification.
Signed-off-by: Brian Paul <brianp@vmware.com>
Although for GL a zero stride means tightly packed elements, Mesa
internally uses zero strides for constant arrays.
Therefore user buffers need to be defined from
buffer_offset + src_offset + min_index*stride
to
buffer_offset + src_offset + max_index*stride + elem_size
Simplifying the later with (max_index + 1)*stride will give zero
sized buffers.
This change also aggregates the st_context's info about user buffers
into a single array.
We adjust 'end' to fit into _MaxElement, but that may result into a 'start'
value bigger than 'end' being passed downstream, causing havoc.
This could be seen with arb_robustness_draw-vbo-bounds, due to an
application bug.
Because:
- bindings are not fully automatic, and they are broken most of the time
- unit tests/samples can be written in C on top of graw
- tracing/retracing is more useful at API levels with stable ABIs such as
GL, producing traces that cover more layers of the driver stack and and
can be used for regression testing
There's really no need for a prefix on member functions, and overloading
takes care of the _op1/_op2 distinction quite nicely. Eric already made
a similar change in the i965 FS backend.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rather than ir_to_mesa_dst_reg_from_src and ir_to_mesa_src_reg_from_dst.
The new constructors are marked 'explicit' so that the compiler can
catch cases where source and destination registers were accidentally
interchanged.
This also necessitated using constructors to initialize the undef and
address registers, as well as adding a default constructor.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both classes are completely private to ir_to_mesa.cpp, so there won't be
any name conflicts with other parts of Mesa. The prefix simply makes it
harder to read.
Also, use a class rather than typedef structs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is in preparation from removing the "ir_to_mesa_" prefix on the
src_reg and dst_reg types, which would cause a naming conflict.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code would previously handle the projection, then swizzle the
shadow comparitor into place. However, when the projection is done
"by hand," as in the TXB case, the unprojected shadow comparitor would
over-write the projected shadow comparitor.
Shadow comparison with projection and LOD is an extremely rare case in
real application code, so it shouldn't matter that we don't handle
that case with the greatest efficiency.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=32395
This matches the behaviour below when numSamples is compared.
At least with the gallium state tracker this can actually occur if st_render_texture fails.
Signed-off-by: Brian Paul <brianp@vmware.com>
This is always the way with real hardware and desktop OpenGL. Some
hardware can't do some formats natively. The alpha-only, luminance,
and intensity formats are usually the most problematic. Some sized
formats can also be problematic. This patch provides fall-back
formats for those that are not natively supported.
At some point it would be interesting to try providing
device-independent conversions using EXT_texture_swizzle. The drivers
that support EXT_texture_swizzle could, for example, see
GL_LUMINANCE16_SNORM as MESA_FORMAT_SIGNED_R16 with a { r, r, r, 1 }
swizzle. Care would need to be taken to prevent issues with using
those textures for FBO rendering.
This is the rest of the fix for glean's pixelFormats test on i965.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This class of hardware can natively sample all of the snorm surface
formats that DX10 requires, but it can't do some of the legacy GL
formats. In particular, all of the alpha, luminance, and intensity
formats are unsupported.
This partially fixes the breakage in glean's pixelFormats test since
GL_EXT_texture_snorm support was added to Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We use this format to represent the accum buffer. No snorm texture
sampling or rendering takes place.
Fixes failed assertion with swrast and any app using the accum buffer
(and glxinfo).
Piglit tests:
- glsl-fs-shadow2d-01
- glsl-fs-shadow2d-02
- glsl-fs-shadow2d-03
- fs-shadow2d-red-01
- fs-shadow2d-red-02
- fs-shadow2d-red-03
NOTE: This is a candidate for the stable branches.
Various documentation mentions that "W" is handed to the WM stage,
but further digging seems to indicate that they really mean 1/W.
The code here is still unclear, but changing this fixes piglit
test "fragcoord_w" on Sandybridge as well as a Khronos ES2 conformance
test. I also tested 3DMarkMobile ES2.0's taiji and hoverjet demos, as
well as Nexuiz, just to be safe.
NOTE: This is a candidate for the 7.10 branch.
Fixes regressions caused by commit 9a21bc6401, namely GPU hangs when
running gnome-shell or compiz (Mesa bugs #35820 and #35853).
I incorrectly refactored the case that dealt with ARF_NULL; even in that
case, the source register needs to be changed to the MRF.
NOTE: This is a candidate for the 7.10 branch (if 9a21bc6401 is
cherry-picked, take this one too).
Branch emulation and loop unrolling are done in the GLSL frontend.
Transforming loops is no longer needed for fragment shaders, but it is still
necessary for vertex shaders.
Registers that are used inside of loops need to be considered live
starting with the first instruction of the outermost loop.
https://bugs.freedesktop.org/show_bug.cgi?id=34370
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Oops, the mask was being used in the loop to determine whether to use
include the stencil || depth values. This began to fail when mask was
cleared at the beginning of the loop. So reorder the tests and do the
work up-front along with determining the depth_stencil value to use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35822
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(some little changes by Marek Olšák)
Squashed commit of the following:
commit 737c0c6b7d591ac0fc969a7590e1691eeef0ce5e
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Fri Aug 27 02:13:57 2010 +0200
draw: disable SSE and PPC paths (use LLVM instead)
These paths don't support vertex clamping, and are anyway
obsoleted by LLVM.
If you want to re-enable them, add vertex clamping and test that it
works with the ARB_color_buffer_float piglit tests.
commit fed3486a7ca0683b403913604a26ee49a3ef48c7
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:27:38 2010 +0200
draw_llvm: respect vertex color clamp
commit ef0efe9f3d1d0f9b40ebab78940491d2154277a9
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:26:43 2010 +0200
draw: respect vertex clamping in interpreter path
Macro can lead to hard to debug list bugs. For instance consider
the following :
LIST_ADD(item, list->prev)
3 instruction of the macro became :
(list->prev)->next->prev = item
which is equivalent to :
list->prev = item
Thus list prev field changes and next instruction in the macro
(list->prev)->next = item
became :
item->next = item
And you endup with list corruption, other case lead to similar
list corruption. Inline function are not affected by this short
coming
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When the condition
min_index == 0 && sizeof(ib[0]) == sizeof(draw_elts[0])
was true, we were wrongly ignoring istart and processing indices 0.
Reorder some statements to make the code easier to understand.
Now that we purposefully generate delta that point outside of the target
buffer, the assertion has outlived its usefulness.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once more! This time without the unwarranted conversion from
drm_intel_bo_alloc_tiled.
Signed-off-by: [a very embarrassed] Chris Wilson <chris@chris-wilson.co.uk>
v2: Allocate the fences from a single shared buffer object.
v3: Allocate the r600_fence structs in blocks of 16.
Spin a few times before calling sched_yield in r600_fence_finish().
This should be the last bit of infrastructure changes before
generating GLSL IR for assembly shaders.
This commit leaves some odd code formatting in ir_to_mesa and brw_fs.
This was done to minimize whitespace changes / reindentation in some
loops. The following commit will restore formatting sanity.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This array is going to be used in the main compiler soon. Leaving
them uniforms.c caused problems for building the stand-alone compiler.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The hardware should be set according to this table:
FORMAT -> R300 COLORFORMAT
-------------------------
X16 -> UV88
X16Y16 -> ARGB8888
X32 -> ARGB8888
X32Y32 -> ARGB16161616
US_OUT_FMT must contain the real format.
I wasn't able to make B3G3R2 and L4A4 work, but those aren't important.
The vertex color clamp control is a property of an API,
a lot like gl_rasterization_rules.
The state should be set according to the API being implemented, for example:
OpenGL Compatibility: enabled by default
OpenGL Core: disabled by default
D3D11: always disabled
This patch also changes the way ARB_color_buffer_float is advertised.
If no SNORM or FLOAT render target is supported, fragment color clamping
is not required.
BTW this changes the gallium interface.
Some rather cosmetic changes by Marek.
Squashed commit of the following:
commit 513b37d484f0318311e84bb86ed4c93cdff71f13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:54 2010 +0200
mesa/st: respect fragment clamping in st_DrawPixels
commit 546a31e42cad459d7a7a10ebf77fc5ffcf89e9b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:28 2010 +0200
mesa/st: support fragment and vertex color clamping
commit c406514a1fbee6891da4cf9ac3eebe4e4407ec13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:56:37 2010 +0200
mesa/st: expose ARB_color_buffer_float if unclamping is supported
commit d0c5ea11b6f75f3da2f4ca989115f150ebc7cf8d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:53:41 2010 +0200
mesa/st: use unclamped colors
This assumes that Gallium is to be interpreted as given drivers the
responsibility to clamp these colors if necessary.
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it. (see the removed XXX comment. -Marek)
TODO: did I get the set of operations mandating it right?
commit 76bdfcfe3ff4145a1818e6cb6e227b730a5f12d8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:18:25 2010 +0200
gallium: add color clamping to the interface
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: fix getteximage so that it doesn't clamp values
mesa: update the compute_version function
mesa: add display list support for ARB_color_buffer_float
mesa: fix glGet query with GL_ALPHA_TEST_REF and ARB_color_buffer_float
commit b2f6ddf907935b2594d2831ddab38cf57a1729ce
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 31 16:50:57 2010 +0200
mesa: document known possible deviations from ARB_color_buffer_float
commit 5458935be800c1b19d1c9d1569dc4fa30a97e8b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:54:56 2010 +0200
mesa: expose GL_ARB_color_buffer_float
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
(I'll squash the st/mesa part to a separate commit. -Marek)
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it.
TODO: did I get the set of operations mandating it right?
commit 3a9cb5e59b676b6148c50907ce6eef5441677e36
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:09:41 2010 +0200
mesa: respect color clamping in texenv programs (v2)
Changes in v2:
- Fix attributes other than vertex color sometimes getting clamped
commit de26f9e47e886e176aab6e5a2c3d4481efb64362
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:05:53 2010 +0200
mesa: restore color clamps on glPopAttrib
commit a55ac3c300c189616627c05d924c40a8b55bfafa
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:04:26 2010 +0200
mesa: clamp color queries if and only if fragment clamping is enabled
commit 9940a3e31c2fb76cc3d28b15ea78dde369825107
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 00:00:16 2010 +0200
mesa: introduce derived _ClampXxxColor state resolving FIXED_ONLY
To do this, we make ClampColor call FLUSH_VERTICES with the appropriate
_NEW flag.
We introduce _NEW_FRAG_CLAMP since fragment clamping has wide-ranging
effects, despite being in the Color attrib group.
This may be easily changed by s/_NEW_FRAG_CLAMP/_NEW_COLOR/g
commit 6244c446e3beed5473b4e811d10787e4019f59d6
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:58:24 2010 +0200
mesa: add unclamped color parameters
Fixes piglit test glsl-vs-arrays-3 on Sandybridge, as well as garbage
rendering in 3DMarkMobileES 2.0's Taiji demo and GLBenchmark 2.0's
Egypt and PRO demos.
NOTE: This a candidate for stable release branches. It depends on
commit 9a21bc6401.
This was open-coded in three different places, and more are necessary.
Extract this into a function so it can be reused.
Unfortunately, not all variations were the same: in particular, one set
compression control and checked that the source register was not
ARF_NULL. This seemed like a good idea, so all cases now do so.
Eventually the miptree refcounting interface should be cleaned up.
The assymmetry dramatically increases the probability of bugs like
this. It should be made to like like libdrm refcounting or the
refcounting style used in other parts of Mesa.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=33046
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifically, this ensures things like the front buffer actually exist. This
fixes piglt fbo/fbo-sys-blit and fd.o bug 35483.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
GLSL 1.30 states clearly that only float and int are allowed, while the
GLSL ES specification's issues section states that sampler types may
take precision qualifiers.
Fixes compilation failures in 3DMarkMobileES 2.0 and GLBenchmark 2.0.
NOTE: This is a candidate for stable release branches.
Civilization 4's shaders make heavy use of gl_Color and don't use
perspective interpolation. This resulted in rivers, units, trees, and
so on being rendered almost entirely white. This is a regression
compared to the old fragment shader backend.
Found by inspection (comparing the old and new FS backend code).
References: https://bugs.freedesktop.org/show_bug.cgi?id=32949
NOTE: This is a candidate for the 7.10 branch.
Since GLSL IR allows multiple ir_variables to share the same name, we
need to generate unique names when printing the IR. Previously, we
always used %s@%p, appending the ir_variable's memory address.
While this worked, it had two drawbacks:
- When there aren't duplicates, the extra "@0x669a3e88" is useless
and makes the code harder to read.
- Real duplicates were hard to tell apart:
channel_expressions@0x6699e3c8 vs. channel_expressions@0x6699ddd8
We now append @2, @3, @4, and so on, but only where necessary to
distinguish duplicates. Since we only do this at print time, any
performance impact is irrelevant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
While making some other changes in this area I was finding it annoying
each of these functions took mostly the same set of parameters in
differing orders.
'i' is already used for the outer loop. This caused some problems
while doing other work in this area. No bug exists here... until you
want to use the outer loop counter in the inner loop.
We were using typecasts because the functions pointers are polymorphic
in the second argument type, which.
Surprisingly the wrong calling convention didn't cause crashes on Windows,
but it was causing certain registers to be trashed in MSVC optimized
builds, when processing callists in the ClearView RC Flight Simulator.
I think the code ends up a lot more legible this way, though we've
still got the overloads in the fs_inst as well (even though there's
only one caller left currently).
If set to year X, only report extensions up to that year. This is a
work-around for games that try to copy the extensions string to a fixed
size buffer and overflow. If a game was released in year X, setting
MESA_EXTENSION_MAX_YEAR to that year will likely fix the problem.
We now use a 4-bit writemask for all instruction types, which makes it
easier to write generic helper functions to manipulte writemasks.
NOTE: This is a candidate for the 7.10 branch.
The code no longer supports otherwise -- it relies on buffers being
uploaded via u_upload_mgr -- so make this clear.
Also, there's no need to flush after draws from user buffers, given all
user content should have been copied by then.
Unsigned long is 32bit on several platforms (e.g., Windows), yielding
1UL << 32 to be zero.
Note that BITFIELD64_BIT result is often assigned to variables of type
GLbitfield, instead of GLbitfield64. That's probably wrong and should be
addressed in a later change.
It should have been a tip when the spec says "However, implicitly
sized arrays cannot be assigned to. Note, this is a rare case that
*initializers and assignments appear to have different semantics*."
(empahsis mine)
Fixes bugzilla #34367.
NOTE: This is a candidate for stable release branches.
Early Z support is set in the DST_VARS command. Hence split up static
state emission to avoid reissuing to much on fragment shader changes,
especially the costly dst buffer relocations.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The optimization loop won't reinsert noise instructions or quadop
vectors, so we were traversing the tree for nothing. Lowering vector
indexing was in the loop after do_common_optimization() to avoid the
work if it ended up that the index was actually constant, but that has
been called already in the core.
Most of the time we don't have a non-uniform struct variable in the
shader, so this cuts the time spent in do_structure_splitting during
glean texCombine by about 2/3.
This fixes an issue where the .obj files wound up in the src/
directory rather than the build/ directory. That prevented
combined 32-bit and 64-bit builds from working.
Signed-off-by: Brian Paul <brianp@vmware.com>
Additionally, to discarding the whole buffer, use
PIPE_TRANSFER_DISCARD_RANGE in pipe_buffer_write when the
write covers only part of the buffer.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In memory mapping buffer objects make use of
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE and PIPE_TRANSFER_DISCARD_RANGE
when appropriate.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
It doesn't support them. Also, we shouldn't be
emitting CB_BLENDx_CONTROL on R600 as the regs don't
exist there, but I'm not sure of the best way to deal
with this in the current r600 winsys.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This packet is required when updating the DB, CB,
or STRMOUT base addresses on rv6xx for the surface
sync logic to work correctly.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This sort of worked because blend state setup cleared MULTIWRITE_ENABLE again,
but that's not something we want to depend on.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Disable Z_EXPORT / STENCIL_EXPORT / KILL_ENABLE again if a shader doesn't
use those. This is similar to 0a6f09a76a.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
The idea behind this is that anything touching registers should be in
r600_state.c or evergreen_state.c. This is also consistent with
evergreen_pipe_shader_vs().
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
We need more and more of these, and it is difficult and prone to version
incompatability issues trying to single out every one of them.
This mimicks what was done in SCons.
The assignment on line 368, `tex_swizzles[i] = SWIZZLE_NOOP`, is rendered
dead by the reassignment on line 392.
Signed-off-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is necessary for GLSL 1.30+ shadow sampling functions, which return
a single float rather than splatting the value to a vec4 based on
GL_DEPTH_TEXTURE_MODE.
This was probably missed when implementing luminance and luminance alpha
render targets.
_mesa_get_format_bits checks for both GL_*_BITS and GL_TEXTURE_*_SIZE.
This fixes:
main/framebuffer.c:892: _mesa_source_buffer_exists: Assertion `....' failed.
The r300 compiler can eliminate unused uniforms and remap uniform locations
if their number surpasses hardware limits, so the limit is actually
NumParameters + NumUnusedParameters. This is important for some apps
under Wine to run.
Wine sometimes declares a uniform array of 256 vec4's and some Wine-specific
constants on top of that, so in total there is more uniforms than r300 can
handle. This was the main motivation for implementing the elimination
of unused constants.
We should allow drivers to implement fail & recovery paths where it makes
sense, so giving up too early especially when comes to uniforms is not
so good idea, though I agree there should be some hard limit for all drivers.
This patch fixes:
- glsl-fs-uniform-array-5
- glsl-vs-large-uniform-array
on drivers which can eliminate unused uniforms.
Avoid setting the same gpu register several times in a r600_pipe_state.
Compute the final value of the register and set that one time. This avoids
some overhead in r600_context_pipe_state_set().
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Well, not sure what exactly it is, but it certainly doesn't contain
the control flow stack, but vertex data.
Not sure about size, I've only seen the first few KiB written, but
the binary driver seems to allocate more.
Depth values are also written before the shader is executed, so if
early tests are enabled, fragments that failed the alpha test were
modifying the depth buffer, but they shouldn't.
The kernel drm takes care of all coherency as long as we don't forget
to submit all outstanding commands in the batchbuffer ...
Also move batchbuffer initialization up because otherwise transfers
for some helper textures fail with a segmentation fault.
And kill the dead code, flushes should now be correct everywhere.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The docs say it can be set for direct texture lookups, but even that
causes problems.
This fixes the wireframe bug:
https://bugs.freedesktop.org/show_bug.cgi?id=32688
NOTE: This is a candidate for the 7.9 and 7.10 branches.
(1, -_, ...) was converted to (-1, ...) because of the negation
in the second component.
Masking out the unused bits fixes this.
Piglit:
- glsl-fs-texture2d-branching
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This gets one more piece of the pipeline onto the new codegen backend.
Once ARB_fragment_program can generate GLSL programs, we can nuke the
old backend.
This is like how we track FragmentProgram._Current for the computed
ARB fragment program for fixed function texenv, but this gives direct
access to the gl_shader_program for drivers to codegen from, skipping
ARB_fp.
This is a step towards providing a direct route for drivers accepting
GLSL IR for codegen. Perhaps more importantly, it runs the fixed
function fragment program through the GLSL IR optimization. Having
seen how easy it is to make ugly fixed function texenv code that can
do unnecessary work, this may improve real applicatinos.
It would be nice if we handled optimized uniform math like this in
some generic way, since people often end up doing uniform expressions
in shaders, but for now keep this hard-coded like it was in the
texenvprogram code.
For fixed function fragment processing in GLSL IR, we want to be able
to reference this state value. gl_* not explicitly permitted is
reserved, so using this variable name internally shouldn't be any
issue.
This is used in the upcoming fixed function shader_program generation,
and shader_program and ARB programs are together in this code until
both fragment and vertex ff get converted.
The drivers have been changed so that they behave as if all of the flags
were set. This is already implicit in most hardware drivers and required
for multiple contexts.
Some state trackers were also abusing the PIPE_FLUSH_RENDER_CACHE flag
to decide whether flush_frontbuffer should be called.
New flag ST_FLUSH_FRONT has been added to st_api.h as a replacement.
Since we're compiling/linking GLSL shaders we should check against
the shader uniform limits, not the legacy vertex/fragment program
parameter limits which are usually lower.
The gl_program_constants struct is for limits that are applicable to
any/all shader stages. Move the geometry shader-only fields into the
gl_constants struct.
Remove redundant MaxGeometryUniformComponents field too.
Need to flush rendering (or at least indicate that the rug might be getting
pulled out from underneath us) when a shader, buffer object or query object
is about to be deleted.
Also, this helps to tell the VBO module to unmap its current vertex buffer.
Benefits:
- spares us a relocation.
- needed for zone rendering (if that ever happens).
- just awesome.
v2: Rename the debug option. Completely disabling the blitter is
required for Y tiling to work, so this option will cover other
code paths in the future.
v3: Implement suggestions by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Flushing the batch/hw backend doesn't invalidate the derived state.
So kill the unnecessary function calls and add an assert in
emit_hardware_state for paranoia.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the new clear code this is possible (if the app call glClear
before drawing the first primitive).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The polygon stipple fallback does not have to be implemented in the
draw module (it doesn't need window coords, etc). Drivers can use
this utility and avoid sw vertex fallbacks if pstipple is enabled.
Note: this is WIP and not used by any driver yet.
The gc var would be NULL if called from line 238. Instead, get
the opcode from __glXSetupForCommand(dpy) as done in other places.
And remove the unused gc parameter.
Fixes a bug reported by "John Doe" on 3/9/2011.
NOTE: This is a candidate for the 7.10 branch.
This fd gets passed in from outside, closing it causes the X.org server
to crap out when the driver doesn't identify the chipset.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Improves performance of a hacked-up scissor-many (to reuse a small set
of scissors instead of blowing out the cache, and then to run 100x
more iterations so it actually took some time) by 3.6% +/- 1.2% (n=10)
Add linear probing on collisions.
Expand entry array by a fixed scale (currently 2) to help avoid
collisions.
Use a LRU approach to ensure that the number of entries stored in the
cache doesn't exceed the requested size.
Do that later on when we set up the hwtnl state instead.
This addresses a problem when we drop the uploaded copy due to a vb
size change, it will remain referenced in svga->curr.vb[], and the
new contents of the vb will never be uploaded.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The encoding/decoding algorithms are shared with RGTC.
Thanks to some magic with the base format, the RGTC texstore functions work
for LATC too.
swrast passes the related piglit tests besides two things:
- The alpha channel is wrong (it's always 1), however the incorrect alpha
channel makes some other tests fail too, so I guess it's unrelated to LATC.
- Signed LATC fetches aren't correct yet (signed values are clamped to [0,1]),
however RGTC has the same problem.
Further testing (with other of my patches) shows that hardware drivers
and softpipe work.
BTW, ETQW uses this extension.
st->user_vb[attr] was always pointing to the same user vb, regardless
of the value of attr. Together with reverting the temporary workaround
for bug 34378, and a fix in the svga driver, this fixes googleearth on svga.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The signature list in a function must contain only ir_function_signature nodes.
The target of an ir_call must be an ir_function_signature.
These were added while trying to debug Mesa bugzilla #34203.
Instead of using the current gl_fragment_program. These aren't necessarily
the same, for example when translate_program() is called by
i915ValidateFragmentProgram().
this doesn't bind to drivers yet, just enough to in theory make indirect
work against other servers.
I'm really not sure what the rules for adding extensions to the known_gl_extensions list as it looks to be missing a few. are these GL extensions that have GLX
protocol??
Signed-off-by: Dave Airlie <airlied@redhat.com>
Comments about the deleted stuff:
- openaren hang: likely caused by the vertex corruptions, fixed by Jakob.
- tiling: Y-tiling works with my hw-clear branch. X-tiling works as
merged to master a while ago (execbuf2 version).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ARB_instanced_arrays is a subset of D3D9.
ARB_draw_instanced is a subset of D3D10.
The point of this change is to allow D3D9-level drivers to enable
ARB_instanced_arrays without ARB_draw_instanced.
If an array redeclaration includes an initializer, the initializer
would previously be dropped on the floor. Instead, directly apply the
initializer to the correct ir_variable instance and append the
generated instructions.
Fixes bugzilla #34374 and piglit tests glsl-{vs,fs}-array-redeclaration.
NOTE: This is a candidate for stable release branches. 0292ffb8 and
8e6cb9fe are also necessary.
Removed sampler view support for USCALED/SSCALED, the texture unit
refuses to convert to non-normalized float. The enums are treated
like UNORM.
Removed duplicate format related headers.
The hw doesn't like it - demos/shadowtex is broken. The emitted shader
isn't totally empty though, the depth write fixup gets emitted instead.
Maybe that one is somewhat fishy, too?
Idea for this patch from Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is an awful hack and will hurt performance on Ironlake, but we're
at a loss as to what's going wrong otherwise. This is the only common
variable we've found that avoids the problem on 4 applications
(CelShading, gnome-shell, Pill Popper, and my GLSL demo), while other
variables we've tried appear to only be confounding. Neither the
specifications nor the hardware team have been able to provide any
enlightenment, despite much searching.
https://bugs.freedesktop.org/show_bug.cgi?id=29172
Tested by: Chris Lord <chris@linux.intel.com> (Pill Popper)
Tested by: Ryan Lortie <desrt@desrt.ca> (gnome-shell)
Because the format can be changed to UNORM in a surface.
This fixes:
state_tracker/st_atom_framebuffer.c:163:update_framebuffer_state:
Assertion `framebuffer->cbufs[i]->texture->bind & (1 << 1)' failed.
Computation of the delta of this array from the last had a silly little
bug and ignored any initial delta==0 causing grief in Nexuiz and
friends.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
There is a silicon bug which causes unpredictable behaviour if the
URB_FENCE command should cross a cache-line boundary. Pad before the
command to avoid such occurrences. As this command only applies to
gen4/5, do the fixup unconditionally as the specs do not actually state
for which chip it was fixed (and the cost is negligible)...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, the rule deleted by this commit was matched every single
time (being the longest match). If not skipping, it used REJECT to
continue on to the actual correct rule.
The flex manual advises against using REJECT where possible, as it is
one of the most expensive lexer features. So using it on every match
seems undesirable. Perhaps more importantly, it made it necessary for
the #if directive rules to contain a look-ahead pattern to make them
as long as the (now deleted) "skip the whole line" rule.
This patch introduces an exclusive start state, SKIP, to avoid REJECTs.
Each time the lexer is called, the code at the top of the rules section
will run, implicitly switching the state to the correct one.
Fixes piglit tests 16384-consecutive-chars.frag and
16385-consecutive-chars.frag.
The expected result has been out of sync with what glcpp produces for
some time; glcpp's actual result seems to be correct and is very close to
GCC's cpp. Updating this will make it easier to catch regressions in
upcoming commits.
PIPE_BIND_CONSTANT_BUFFER alone was OK for nv50/nvc0, but nv30 will need
to be able to set others on certain chipsets.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This commit is basically a copy-over of the fix
Chia-I Wu's commited to wayland:
http://cgit.freedesktop.org/wayland/wayland-demos/commit/?id=1b6c0ed95
"Workaround an xcb-dri2 bug.
xcb_dri2_connect_device_name generated by xcb-proto 1.6 is broken.
It only works when the length of the driver name is a multiple of 4."
This reverts commit 1f9a0a4e6e.
This caused trouble with Lightsmark w/ i965 driver and fbo/fbo-blit-d24s8
(see bug 34894). It's probably something simple but no time to debug now.
SNB has 64k urb space, we only use piece of them.
The more urb space we alloc,
the more concurrent vs threads we can run.
push the urb space usage to the limit.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
this fixes fbo-generatemipmap-formats rgtc and s3tc in NPOT mode
with softpipe.
r600g fails to even get level 0 correct so have to look into that
a bit further.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was always converting to 8-bit per channel unsigned formats,
which isn't suitable for RGTC signed formats, this special cases
those two formats and converts to floats for those.
Signed-off-by: Dave Airlie <airlied@redhat.com>
SNORM needs a bit of work in the state tracker in order for mipmap
generation to work I believe.
I'm also not sure that having unorm fetches for an snorm format is
sane.
With signed types we weren't hitting this test however the comment
stating this doesn't happen often doesn't apply when using signed
types since an all 0 block is quite common which isn't abs min or max.
this fixes the limits correctly again also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
if the values are all in the last dword, the high bits can be 0,
This fixes a valgrind warning I saw when playing with mipmaps.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Drivers can call this function as needed. It tells the VBO module to
always unmap the current glBegin/glEnd VBO when we flush. Otherwise
it's possible to be in a flushed state but still have the VBO mapped.
Among other benefits, parallel makes now work. Since many people have
parallel builds by default (via MAKEFLAGS environment variable), this
sames some irritation at release time...when there's usually not any
other irritation already.
In my last commit I introduced a build dependency upon a new libdrm.
Add the associated autoconf checks. As the headers are part of the core
libdrm, we need to bump that version and so may as well bump the chipset
specific versions simultaneously.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With relaxed relocation checking in the kernel, we can specify a
negative delta (i.e. pointing outside of the target bo) in order to fake
a range in a large buffer. We only then need to upload the elements used
and adjust the buffer offset such that they correspond with the indices
used in the DrawArrays.
(Depends on libdrm 0209428b3918c4336018da9293cdcbf7f8fedfb6)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... and take advantage of start_vertex_bias to trim to [min_index,
max_index] where possible (i.e. when we need to upload all arrays).
Fixes half_float_vertex(misc.fillmode.wireframe)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34595
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When doing copy swapbuffers using drm, throttle on outstanding copy operations.
Introduces a new environment variable, EGL_THROTTLE_FENCES that the
user can use to indicate the desired number of outstanding swapbuffers, or
disable throttling using EGL_THROTTLE_FENCES=0.
This can and perhaps should be extended to the pageflip case as well, since
with some hardware pageflips can be pipelined. In case the pageflip syncs, the
throttle operation will be a no-op anyway.
Update copyright notices.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Use the pageflip ioctl when available.
Otherwise, or when the backbuffer contents need to be preserved,
fall back to a copy operation.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The copy swap can be used when we need to preserve the contents of
the back buffer or when there is no way to do native page-flipping.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This makes it usable also for native helpers.
Also add inline functions to access the context and to
uninit the native display structure.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This is reliant on a drm patch that I posted on the list + a version bump.
These will appear in drm-next today.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the drm minor version is > 9 (i.e. whats in drm-next),
we enable s3tc + texture tiling by default now.
this changes R600_FORCE_TILING to R600_TILING which can
be set to false to disable tiling on working drm.
Signed-off-by: Dave Airlie <airlied@redhat.com>
nv50_resource is being called nv04_resource now temporarily, to avoid
a naming conflict with nouveau_resource from libdrm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Modified from original to remove chipset-specific code, and to be decoupled
from the mm present in said drivers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
By default the hardware rounds texcoords. However,
for point sampled textures, the expected behavior is
to truncate. When we have point sampled textures,
set the truncate bit in the sampler.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=25871
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
No idea why I didn't do it like this the first time, but share
the code like other portions of mesa do using _tmp.h suffix
and some #defines for the types.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The other draw stages like aaline and pstipple were already doing this.
If the driver used the aapoint stage but not the others it would crash
because of a null pipe->draw pointer.
It was only implemented in the swrast driver and probably not used by
any applications. A modern app would use a dependent/chained texture
lookup in the fragment shader.
when doing glCopyTex[Sub]Image() and checking the source buffer's
completeness.
We only need to determine FBO completeness when the status is indeterminate.
I removed the HiZ memory management, because the HiZ RAM is too small
and I also did it in hope that HiZ will be enabled more often.
This also sets aligned strides to HIZ_PITCH and ZMASK_PITCH.
This adds support for the RGTC unsigned and signed
texture storage and fetch methods.
the code is a port of the DXT5 alpha compression code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With an extremely dumb strategy. But it's the same i915c employs.
Also improve the hw_atom code slightly by statically specifying the
required batch space. For extremely variably stuff (shaders, constants)
it would probably be better to add a new parameter to the hw_atom->validate
function.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also contains the first few bits for hw state atoms.
v2: Implement suggestion by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Add the batch bo to the libdrm validation lost, for otherwise
libdrm won't take previously used buffers into account.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move it to i915_state_static.c This way i915_emit_state.c only emits
state and doesn't (re)calculate it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This came out of discussion at the office today, and we agreed that
solving this for indirect wasn't really interesting, though the
server-side change would be of a similar level of difficulty.
If another thread bound a context to the drawable then unbound it, the
driContextPriv would end up NULL.
With the previous two fixes, this fixes glx-multithread-makecurrent-2,
despite the issue not being about the multithreaded makecurrent.
The driver only has one reasonable place to look for its context to
flush anything, which is the current context. Don't bother it with
having to check.
This extension allows a client to bind one context in multiple threads
simultaneously. It is then up to the client to manage synchronization of
access to the GL, just as normal multithreaded GL from multiple contexts
requires synchronization management to shared objects.
The old value, BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE makes it sound like we're
doing a non-bias texture lookup. It has the same value as the new constant
BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE, so there should be no
functional changes.
There is an issue with gcc 4.6.0 that leads to segfault/assert with mesa
due to ureg_src size, reshuffling the structure member to better better
alignment work around the issue.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47893
7.9 + 7.10 candidate
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
so far only hw mipmap generation is testing on softpipe,
passes test added to piglit.
this requires another patch to mesa to let array textures mipmaps
even start to happen.
platform.system in SCons on Cygwin includes the OS version number.
Windows XP - CYGWIN_NT-5.1
Windows Vista - CYGWIN_NT-6.0
Windows 7 - CYGWIN_NT-6.1
Reduce all Cygwin platform variants to just 'cygwin' so anything
downstream can simply use 'cygwin' instead of the different full
platform names.
255.875 matches the hardware documentation. Presumably this was a typo.
NOTE: This is a candidate for the 7.10 branch, along with
commit 2bfc23fb86.
Reviewed-by: Eric Anholt <eric@anholt.net>
In the case where glBlitFramebuffer is being used to copy to a texture
without scaling it is faster if we can use the hardware to do a blit
rather than having to do a texture render. In most of the drivers
glCopyTexSubImage2D will use a blit so this patch makes it check for
when glBlitFramebuffer is doing a simple copy and then divert to
glCopyTexSubImage2D.
This was originally proposed as an extension to the common meta-ops.
However, it was rejected as using the BLT is only advantageous for Intel
hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33934
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Might be necessary if a block sneaks in somewhere, like a common
block for moves of phi sources after a loop break.
This is harmless and normally will be removed before emission.
In linear scan we can't allocate multiple values with different
live ranges at the same time to assign them consecutive regs.
Maybe we should just switch to graph coloring for all values ...
The svga_update_state() mechanism is inadequate as it will always end up
flushing the primitives before processing the SVGA_NEW_COMMAND_BUFFER
dirty state flag.
Fixes piglit/fbo-depth-sample-compare:
==14722== Invalid free() / delete / delete[]
==14722== at 0x4C240FD: free (vg_replace_malloc.c:366)
==14722== by 0x84FBBFD: intel_upload_unmap (intel_buffer_objects.c:695)
==14722== by 0x85205BC: brw_prepare_vertices (brw_draw_upload.c:457)
==14722== by 0x852F975: brw_validate_state (brw_state_upload.c:394)
==14722== by 0x851FA24: brw_draw_prims (brw_draw.c:365)
==14722== by 0x85F2221: vbo_exec_vtx_flush (vbo_exec_draw.c:389)
==14722== by 0x85EF443: vbo_exec_FlushVertices_internal (vbo_exec_api.c:543)
==14722== by 0x85EF49B: vbo_exec_FlushVertices (vbo_exec_api.c:973)
==14722== by 0x86D6A16: _mesa_set_enable (enable.c:351)
==14722== by 0x42CAD1: render_to_fbo (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42CEE3: piglit_display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42F508: display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== Address 0xc606310 is 0 bytes after a block of size 18,720 alloc'd
==14722== at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==14722== by 0x85202AB: copy_array_to_vbo_array (brw_draw_upload.c:256)
==14722== by 0x85205BC: brw_prepare_vertices (brw_draw_upload.c:457)
==14722== by 0x852F975: brw_validate_state (brw_state_upload.c:394)
==14722== by 0x851FA24: brw_draw_prims (brw_draw.c:365)
==14722== by 0x85F2221: vbo_exec_vtx_flush (vbo_exec_draw.c:389)
==14722== by 0x85EF443: vbo_exec_FlushVertices_internal (vbo_exec_api.c:543)
==14722== by 0x85EF49B: vbo_exec_FlushVertices (vbo_exec_api.c:973)
==14722== by 0x86D6A16: _mesa_set_enable (enable.c:351)
==14722== by 0x42CAD1: render_to_fbo (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42CEE3: piglit_display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42F508: display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34604
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This adds EXT_texture_array support to r600g, it passes the piglit
array-texture test but I suspect may not be complete.
It currently requires a kernel patch to fix the CS checker to allow
these, so you need to use R600_ARRAY_TEXTURE=true for now
to enable them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is because the HW doesn't always store a 1D array like a
2D texture, it more likely stores it like 2D texture (i.e.
alignments etc).
This means we upload each slice separately and let the driver
work out where to put it.
this might break nvc0 as I can't test it, I have only nv50 here.
Signed-off-by: Dave Airlie <airlied@redhat.com>
... and prefers a small batch whereas gen4+ prefer a large batch to
carry more state.
Tuning using openarena/padman indicate that a batch size of just 4096 is
best for those cases.
Bugzilla: https://bugs.freedesktop.org/process_bug.cgi
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The spec says that the offsets in the vertex-fetch instructions need to be byte-aligned and makes no specification with regard to the required alignment of the offset and stride in the vertex resource constant register.
However, testing indicates that all three values need to be DWORD aligned.
Ptr can be very well NULL, so when there are two arrays, with one having
offset 0 (and thus NULL Ptr), and the other having a non-zero offset,
the non-zero value is taken as minimum (because of !low_addr ? start ...).
On 32-bit systems, this somehow works. On 64-bit systems, it leads to crashes.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
255.875 matches the hardware documentation. Presumably this was a typo.
Found by inspection. Not known to fix any issues.
Reviewed-by: Eric Anholt <eric@anholt.net>
pixel_w is the final result; wpos_w is used on gen4 to compute it.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't safely use fixed size arrays since Gen6+ supports unlimited
nesting of control flow.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
The code that generates MATH instructions attempts to work around
the hardware ignoring source modifiers (abs and negate) by emitting
moves into temporaries. Unfortunately, this pass coalesced those
registers, restoring the original problem. Avoid doing that.
Fixes several OpenGL ES2 conformance failures on Sandybridge.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Single-operand math already had these workarounds, but POW (the only two
operand function) did not. It needs them too - otherwise we can hit
assertion failures in brw_eu_emit.c when code is actually generated.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_PointSize (VERT_RESULT_PSIZ) doesn't take up a message register,
as it's part of the header. Without this fix, writing to gl_PointSize
would cause the SF to read and use the wrong attributes, leading to all
kinds of random looking failure.
Reviewed-by: Eric Anholt <eric@anholt.net>
If two buffers had the same stride where one buffer is a user one and
the other is a vbo, it was considered to be one interleaved buffer,
resulting in incorrect rendering and crashes.
This patch makes sure that the interleaved buffer is either user or vbo,
not both.
Needs to track this ourself since because we get into a race condition with
the dri_util.c code on make current when rendering to the front buffer.
This is what happens:
Old context is rendering to the front buffer.
App calls MakeCurrent with a new context. dri_util.c sets
drawable->driContextPriv to the new context and then calls the driver make
current. st/dri make current flushes the old context, which calls back into
st/dri via the flush frontbuffer hook. st/dri calls dri loader flush
frontbuffer, which calls invalidate buffer on the drawable into st/dri.
This is where things gets wrong. st/dri grabs the context from the dri
drawable (which now points to the new context) and calls invalidate
framebuffer to the new context which has not yet set the new drawable as its
framebuffers since we have not called make current yet, it asserts.
When clearing a GL_LUMINANCE_ALPHA buffer, for example, we need to convert
the clear color (R,G,B,A) to (R,R,R,A). We were doing this for texture border
colors but not renderbuffers. Move the translation function to st_format.c
and share it.
This fixes the piglit fbo-clear-formats test.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
If finalizing a non-POW mipmapped texture with an odd-sized base texture
image we were allocating the wrong size of gallium texture (off by one).
Need to be more careful about computing the base texture image size.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=34463
Reducing the number of relocations has lots of nice knock-on effects,
not least including reducing batch buffer size, auxilliary array sizes
(vmalloced and copied into the kernel), processing of uncached
relocations etc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than waiting on the first batch after the last swapbuffers to be
retired, call into the kernel to wait upon the retirement of any request
less than 20ms old. This has the twofold advantage of (a) not blocking
any other clients from utilizing the device whilst we wait and (b) we
attain higher throughput without overloading the system.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the next vertex arrays are a (discontiguous) continuation of the
current arrays, such that the new vertices are simply offset from the
start of the current vertex buffer definitions we can reuse those
defintions and avoid the overhead of relocations and invalidations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using a temporary buffer for large discontiguous uploads into the common
buffer and a single buffered upload is faster than performing the
discontiguous copies through a mapping into the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Upload the non-vbo arrays into a single interleaved buffer object, and
so need to just emit a single vertex buffer relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the user passed in several arrays interleaved in the same vbo, only
emit a single vertex buffer and relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Track reuse of the vertex buffer objects and so minimise the number of
vertex buffers used by the hardware (and their relocations).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we now pack the indices into a common upload buffer, we can reuse a
single CMD_INDEX_BUFFER packet and translate each invocation with a
start vertex offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we use state relocations and we know that all the state belongs to
the same bo, we can drop the multiple references to the same bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was to avoid read back through the GTT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was to avoid read back through the GTT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In preparation for a greater change, use the color_calc_state_bo already
provisioned for this purpose.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than performing lots of little writes to update the common bo
upon each update, write those into a static buffer and flush that when
full (or at the end of the batch). Doing so gives a dramatic performance
improvement over and above using mmaped access.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than performing a blit to completely overwrite a busy bo, simply
discard it and create a new one with the fresh data.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dynamic arrays have the tendency to be small and so allocating a bo for
each one is overkill and we can exploit many efficiency gains by packing
them together.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dynamic draw buffers are used by clients for temporary arrays and for
uploading normal vertex arrays. By keeping the data in memory, we can
avoid reusing active buffer objects and reallocate them as they are
changed. This is important for Sandybridge which can not issue blits
within a batch and so ends up flushing the batch upon every update, that
is each batch only contains a single draw operation (if using dynamic
arrays or regular arrays from system memory).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fix a typo which meant that --enable-shared-glapi didn't actually cause a shared glapi to be built
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This behavior was last when moving the transfers to the contexts.
This fixes several piglit failures, which were reading the color renderbuffer
before the draw operations were emitted.
This is a temporary workaround. It fixes sauerbrauten with shaders enabled.
I guess we might be changing vertex attribs somewhere and not updating
the appropriate dirty flags, therefore we can't rely on them for now.
Or maybe we need to make this state dependent on some other flags too.
More info:
https://bugs.freedesktop.org/show_bug.cgi?id=34378
When we drop the in_swtnl_draw flag, we must force a rerun of
update_need_swtnl to reset the need_swtnl flag to its correct value outside
of a swtnl vbo draw.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
If we hit the pipe_get/put_tile() path for setting up the glCopyPixels
texture we were passing the wrong x/y position to pipe_get_tile().
The x/y position was already accounted for in the pipe_get_transfer()
call so we were effectively reading from 2*readX, 2*readY.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Addresses excessive TEMP allocation in vertex shaders where all CONSTs are
stored into TEMPS at the start, but copy propagation was failing due to
the presence of IFs.
We could do something about loops, but ifs are easy enough.
This enables the egl_dri2 driver to load swrast driver
for software rendering. It could be used when hardware
dri2 drivers are not available, such as in VM.
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
Some cards don't appear to work correctly with the UNORM type,
so switch to the integer type, however since gallium has no
integer types yet from what I can see we need to do a hack to
workaround it for the blitter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a workaround for a bug in libtxc_dxtn.
Fixes:
- piglit/GL_EXT_texture_compression_s3tc/fbo-generatemipmap-formats
Signed-off-by: Dave Airlie <airlied@redhat.com>
Arrays are zero based. If the highest element accessed is 6, the
array needs to have 7 elements.
Fixes piglit test glsl-fs-implicit-array-size-03 and bugzilla #34198.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Fixes regression: https://bugs.freedesktop.org/show_bug.cgi?id=34160
Commit e7c1f058d1 disabled constant-folding
when division-by-zero occured. This was a mistake, because the spec does
allow division by zero. (From section 5.9 of the GLSL 1.20 spec: Dividing
by zero does not cause an exception but does result in an unspecified
value.)
For floating-point division, the original pre-e7c1f05 behavior is
reinstated.
For integer division, constant-fold 1/0 to 0.
This reverts commit b3cf92aa91.
The reverted commit prevented constant-folding of reciprocal expressions
when the reciprocated expression was 0. However, since the spec allows
division by zero, constant-folding *is* permissible in this case.
From Section 5.9 of the GLSL 1.20 spec:
Dividing by zero does not cause an exception but does result in an
unspecified value.
Before populating the vertex buffer attribute pointer (VB->AttribPtr[]),
convert vertex data in GL_FIXED format to GL_FLOAT.
Fixes bug: http://bugs.freedesktop.org/show_bug.cgi?id=34047
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This is a multi-threading optimization which hides the kernel overhead
behind a thread. It improves performance in CPU-limited apps by 2-15%.
Of course you must have at least 2 cores for it to make any difference.
It can be disabled with:
export RADEON_THREAD=0
On r600, s3tc formats require a 1D tiled texture format,
so we have to do uploads using a blit, via the 64-bit and 128-bit formats
Based on the r600c code we use a 64 and 128-bit type to do the
blits.
Still requires R600_ENABLE_S3TC until the kernel fixes are in,
this has only been tested on evergreen where the kernel doesn't
yet get in the way.
the miptree setup and pitch storing didn't work so well for block
based things like compressed textures. The CB takes blocks, where
the texture sampler takes pixels, and transfers need bytes,
So now we store blocks/bytes and translate to pixels in the sampler.
This is necessary for s3tc to work properly.
If the underlying transfer had a stride wider for hw alignment reasons,
the mipmap generation would generate badly strided images.
this fixes a few problems I found while testing r600g with s3tc
Signed-off-by: Dave Airlie <airlied@redhat.com>
Because an app may do something like this:
while (!(ptr = bo_map(..., DONT_BLOCK))) {
/* Do some other work. */
}
And it would be looping endlessly if we didn't flush.
If EXT_draw_buffers2 is supported but ARB_draw_buffers_blend isn't
_mesa_BlendFuncSeparateEXT only sets up the blend equation and function for the
first render target. This patch makes sure that update_blend doesn't try to use
the data from other rendertargets in such cases.
Signed-off-by: Brian Paul <brianp@vmware.com>
The vertex arrays state should be set only when (_NEW_ARRAY | _NEW_PROGRAM)
is dirty. This assumes user buffer content is mutable, which will be
sorted out in the next commit. The following usage case should be much faster
now:
for (i = 0; i < 1000; i++) {
glDrawElements(...);
}
Or even:
for (i = 0; i < 1000; i++) {
glSomeStateChangeOtherThanArraysOrProgram(...);
glDrawElements(...);
}
The performance increase from this may be significant in some apps and
negligible in others. It is especially noticable in the Torcs game (r300g):
Before: 15.4 fps
After: 20 fps
Also less looping over attribs in st_draw_vbo yields slight speed-up
in apps with lots of glDraw* calls.
but really wtf? all these PCI IDs need to be ripped out of here, its totally
unscalable and the drivers already have this info so could export it some better way.
tested by Darxus on #wayland.
This an adds --enable-shared-dricore option to configure. When enabled,
DRI modules will link against a shared copy of the common mesa routines
rather than statically linking these.
This saves about 30MB on disc with a full complement of classic DRI
drivers.
v2: Only enable with a gcc-compatible compiler that handles rpath
Handle DRI_CFLAGS without filter-out magic
Build shared libraries with the full mklib voodoo
Fix typos
v3: Resolve conflicts with talloc removal patches
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Track variables, functions, and types during parsing. Use this
information in the lexer to return the currect "type" for identifiers.
Change the handling of structure constructors. They will now show up
in the AST as constructors (instead of plain function calls).
Fixes piglit tests constructor-18.vert, constructor-19.vert, and
constructor-20.vert. Also fixes bugzilla #29926.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This requires lexical disambiguation between variable and type
identifiers (as most C compilers do).
Signed-off-by: Keith Packard <keithp@keithp.com>
NOTE: This is a candidate for the 7.9 and 7.10 branches.
add support for the 32-bit types, also fixup the
export setting to handle types with channels > 11 bits properly
Signed-off-by: Dave Airlie <airlied@redhat.com>
Based on Dave's branch.
The majority of this commit is a cleanup, mainly renaming things.
There wasn't much code to import, just ioctl calls.
Also done:
- implemented unsynchronized bo_map (important optimization!)
- radeon_bo_is_referenced_by_cs is no longer a refcount hack
- dropped the libdrm_radeon dependency
I'm surprised that this has resulted in less code in the end.
Previously the SNE and SEQ instructions would calculate the partial
result to the destination register. This would cause problems if the
destination register was also one of the source registers.
Fixes piglit tests glsl-fs-any, glsl-fs-struct-equal,
glsl-fs-struct-notequal, glsl-fs-vec4-operator-equal,
glsl-fs-vec4-operator-notequal.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
If the formats don't match we need to update the surface with the new
format.
if we can render to SRGB formats, enable the extension
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a regression from commit 5cbff0932e.
The problem is *some* glDrawPixels fragment programs need to be deleted,
but not all. Use an explicit flag to indicate whether or not the program
needs to be deleted.
This should fix http://bugs.freedesktop.org/show_bug.cgi?id=34049
From section 5.9 of the GLSL 1.20 spec:
The operator modulus (%) is reserved for future use.
From section 5.8 of the GLSL 1.20 spec:
The assignments modulus into (%=), left shift by (<<=), right shift by
(>>=), inclusive or into ( |=), and exclusive or into ( ^=). These
operators are reserved for future use.
The GLSL ES 1.00 spec and GLSL 1.10 spec have similiar language.
Fixes bug:
https://bugs.freedesktop.org//show_bug.cgi?id=33916
Fixes Piglit tests:
spec/glsl-1.00/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.00/compiler/assignment-operators/modulus-assign-00.frag
spec/glsl-1.10/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.10/compiler/assignment-operators/modulus-assign-00.frag
spec/glsl-1.20/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.20/compiler/assignment-operators/modulus-assign-00.frag
Fixes a minor memory leak with the "engine" mesa demo.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Make sure we unreference the vertex buffer pointers in a local array.
This fixes huge vertex buffer / memory leaks in mesa demos "fire" and "engine".
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Relative addressing of constant buffers can't work properly through the
kcache, since you can only address within the currently locked kcache window.
Instead, this patch binds the constant buffer as a shader resource, and then
explicitly fetches the constant using a vertex fetch with fetch type
VTX_FETCH_NO_INDEX_OFFSET from the shader. There's probably still some room
for improvement, doing the fetch right before the instruction that needs the
value may not be quite optimal for example.
The r600_bc_alu_src structure is used in two different ways, as a vector and
for the individual channels of that same vector. This is somewhat fragile,
and probably confusing.
If the client hasn't done the initial wl_display_iterate() at the time
we initialize the display, we have to do that in platform_wayland.c.
Make sure we detect that correctly instead of dup()ing fd=0, and use
the sync callback to make sure we don't wait forever for authorization that
won't happen.
This code has originally matured in r300g and was ported to r600g several
times. It was obvious it's a code duplication.
See also comments in the header file.
The scheduler and the register allocator are not good enough yet to deal
with the effects of the register rename pass. This was causing a 50%
performance drop in Lightsmark. The pass can be re-enabled once the
scheduler and the register allocator are more mature. r300 and r400
still need this pass, because it prevents a lot of shaders from using
too many texture indirections.
NOTE: This is a candidate for the 7.10 branch.
This adds i965 support for GL_EXT_framebuffer_sRGB, it introduces a new
constant to say that the driver can support sRGB enabled FBOs since enabling
the extension doesn't mean the driver can actually support sRGB.
Also adds the suggested state flush in the core code suggested by Brian.
fix the ARB_fbo color encoding.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Querying index zero is not an error in OpenGL ES 2.0.
Querying an index larger than the value returned by
GL_MAX_VERTEX_ATTRIBS is an error in all APIs.
Fixes bugzilla #32375.
This patch cleans up many of the extra copies in GLSL IR introduced by
i965's scalarizing passes. It doesn't result in a statistically
significant performance difference on nexuiz high settings (n=3) or my
demo (n=10), due to brw_fs.cpp's register coalescing covering most of
those extra moves anyway. However, it does make the debug of wine's
GLSL shaders much more tractable, and reduces instruction count of
glsl-fs-convolution-2 from 376 to 288.
The type of u_get_transfer_vtbl of the usage argument in u_transfer.h is
unsigned and not enum pipe_transfer_usage. This patch changes the type
of usage to unsigned to match the prototype in the header file.
Standard library functions in C++ are in the std namespace. When using
C++-style header files for the standard library, some compilers, such as
Sun Studio, provide symbols only for the std namespace and not for the
global namespace.
This patch adds using statements for standard library functions. Another
option could have been to prepend standard library function calls with
'std::'.
This patch fixes several compilation errors with Sun Studio.
This just adds a flag to create the texture without doing any
flushing to it. Flushing occurs in the draw function. This avoids
unnecessary flushes when we end up rebinding a CB/DB/texture due
to the blitter just restoring state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before, the set_sampler_views() and restore_sampler_views() functions
used MAX2(old,new) to tell the driver how many samplers or sampler
views to set. This could result in cases such as:
pipe->set_fragment_sampler_views(pipe, 4, views={foo, bar, NULL, NULL})
Many/most gallium drivers would take this as-is and set
ctx->num_sampler_views=4 and ctx->sampler_views={foo, bar, NULL, NULL, ...}.
Later, loops over ctx->num_sampler_views would have to check for null
pointers. Worse, the number of sampler views and number of sampler CSOs
could get out of sync:
ctx->num_samplers = 2
ctx->samplers = {foo, bar, ...}
ctx->num_sampler_views = 4
ctx->sampler_views={Foo, Bar, NULL, NULL, ...}
So loops over the num_samplers could run into null sampler_views pointers
or vice versa.
This fixes a failed assertion in the SVGA driver when running the Mesa
engine demo in AA line mode (and possibly other cases).
It looks like all gallium drivers are careful to unreference views
and null-out sampler CSO pointers for the units beyond what's set
with the pipe::bind_x_sampler_states() and pipe::set_x_sampler_views()
functions.
I'll update the gallium docs to explain this as well.
This new interface could set up context for OpenGL,
OpenGL ES1 and OpenGL ES2. It will be used by egl_dri2
driver.
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
Leak was introduced when fixing strict aliasing violation in this code:
the reference counting was preserved, but the destructor call on zero
reference count was not.
Exactly one half would be the ideal, but this is a soft limit, and one
more byte over brings us to synchronous behavior.
Flushing when the referred GMR exceeds one third of the aperture gives us
statistically better performance.
Certain mingw32 cross compilers (e.g. RedHat's) defaults to use DLL gcc
runtime.
Given the main deliverable from this project are self-contained drivers,
which are loaded by any application, this dependency can cause havoc.
If we get a sw accessible buffer like the S8 texture we end up
doing depth tracking on it when there is no need since we won't
ever bind it to the hardware. This leads to a sw fallback in the
transfer destruction which leads to and endless recusion loop
of fail in transfer destroy.
Signed-off-by: Dave Airlie <airlied@redhat.com>
this adds a flag to keep track of whether the depth texture structure
is the flushed texture or not, so we can avoid doing flushes when
we do a hw rendering from one to the other.
it also renames flushed to dirty_db which tracks if the DB copy
has been dirtied by being bound to the hw.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This consolidates the code duplicated between the fragment sampler
and vertex sampler functions. Plus, it'll make adding support for
geometry shader samplers trivial.
Before we were looping to nr_samplers, which is the number of fragment
samplers, not vertex samplers.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
For example, this now raises an error:
#define XXX 1 / 0
Fixes bug: https://bugs.freedesktop.org//show_bug.cgi?id=33507
Fixes Piglit test: spec/glsl-1.10/preprocessor/modulus-by-zero.vert
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Avoid division-by-zero when constant-folding the following expression
types:
ir_unop_rsq
ir_binop_div
ir_binop_mod
Fixes bugs:
https://bugs.freedesktop.org//show_bug.cgi?id=33306https://bugs.freedesktop.org//show_bug.cgi?id=33508
Fixes Piglit tests:
glslparsertest/glsl2/div-by-zero-01.frag
glslparsertest/glsl2/div-by-zero-02.frag
glslparsertest/glsl2/div-by-zero-03.frag
glslparsertest/glsl2/modulus-zero-01.frag
glslparsertest/glsl2/modulus-zero-02.frag
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Do not constant-fold a reciprocal if any component of the reciprocated
expression is 0. For example, do not constant-fold `1 / vec4(0, 1, 2, 3)`.
Incorrect, previous behavior
----------------------------
Reciprocals were constant-folded even when some component of the
reciprocated expression was 0. The incorrectly applied arithmetic was:
1 / 0 := 0
For example,
1 / vec4(0, 1, 2, 3) = vec4(0, 1, 1/2, 1/3)
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This was my mistake when converting from talloc to ralloc. I was
confused because the other calls in the function are to asprintf_append
and the original code used str as the context rather than NULL.
Fixes bug #33823.
Previously a register would be marked as available if any component
was written. This caused shaders such as this:
0: TEX TEMP[0].xyz, INPUT[14].xyyy, texture[0], 2D;
1: MUL TEMP[1], UNIFORM[0], TEMP[0].xxxx;
2: MAD TEMP[2], UNIFORM[1], TEMP[0].yyyy, TEMP[1];
3: MAD TEMP[1], UNIFORM[2], TEMP[0].zzzz, TEMP[2];
4: ADD TEMP[0].xyz, TEMP[1].xyzx, UNIFORM[3].xyzx;
5: TEX TEMP[1].w, INPUT[14].xyyy, texture[0], 2D;
6: MOV TEMP[0].w, TEMP[1].wwww;
7: MOV OUTPUT[2], TEMP[0];
8: END
to produce incorrect code such as this:
BEGIN
DCL S[0]
DCL T_TEX0
R[0] = MOV T_TEX0.xyyy
U[0] = TEXLD S[0],R[0]
R[0].xyz = MOV U[0]
R[1] = MUL CONST[0], R[0].xxxx
R[2] = MAD CONST[1], R[0].yyyy, R[1]
R[1] = MAD CONST[2], R[0].zzzz, R[2]
R[0].xyz = ADD R[1].xyzx, CONST[3].xyzx
R[0] = MOV T_TEX0.xyyy
U[0] = TEXLD S[0],R[0]
R[1].w = MOV U[0]
R[0].w = MOV R[1].wwww
oC = MOV R[0]
END
Note that T_TEX0 is copied to R[0], but the xyz components of R[0] are
still expected to hold a calculated value.
Fixes piglit tests draw-elements-vs-inputs, fp-kill, and
glsl-fs-color-matrix. It also fixes Meego bugzilla #13005.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Also return it as the correct type. Previously the whole array would
be returned and each element would be expanded to a vec4.
Fixes piglit test getuniform-01 and bugzilla #29823.
Passing ralloc_vasprintf_append a 0-byte allocation doesn't work. If
passed a non-NULL argument, ralloc calls strlen to find the end of the
string. Since there's no terminating '\0', it runs off the end.
Fixes a crash introduced in 14880a510a.
If we see a MACRO bit on r600g its 2D tiled,
if don't see a MACRO bit and we do see a MICRO bit then its 1D tiled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the tile type for the buffer is 1 then its been bound to the
DB at some point, we need to decompress it, otherwise its only
been bound as texture/cb so don't do anything.
This fixes 5 piglit tests here on r600g.
this just adds the ioctl interface and sets the tile type
and array mode in the correct place.
This seems to bring eg 1D tiling to the same level, and issues
as on r600. No idea how to address 2D yet.
Previously we'd happily compile GLSL 1.30 shaders on any driver. We'd
also happily compile GLSL 1.10 and 1.20 shaders in an ES2 context.
This has been a long standing FINISHME in the compiler.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Rather than passing "True", pass a bitfield describing the particular
variant's features - either projection or offset.
This should make the code a bit more readable ("Proj" instead of "True")
and make it easier to support offsets in the future.
This annotation is for an "in" function parameter for which it is only legal
to pass constant expressions. The only known example of this, currently,
is the textureOffset functions.
This should never be used for globals.
Having these as actual integer values makes it difficult to implement
the texture*Offset built-in functions, since the offset is actually a
function parameter (which doesn't have a constant value).
The original rationale was that some hardware needs these offset baked
into the instruction opcode. However, at least i965 should be able to
support non-constant offsets. Others should be able to rely on inlining
and constant propagation.
Since the introduction of ir_var_system_value, system variables would be
printed as "temporary" and temporaries would result in out-of-bounds
array access, showing up as garbage in printed IR.
SVGA3D only supports SGN for vertex shaders, and this requires two additional
temporary registers for intermediate results.
For fragment shaders, lower to two CMPs and one ADD.
The extra length is the size of the request *minus* the size of the
VendorPrivate header, not the addition.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
xGLXChangeDrawableAttributesSGIXReq follows the GLXVendorPrivate header
with a drawable, number of attributes, and list of (type, value)
attribute pairs. Don't forget to put the number of attributes in there.
I don't think this can ever have worked.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Like on some r5xx, there are multiple DB backends on the r600,
we need to add up the query results from each of these to get the
final correct value.
So far I'm not 100% sure how to calculate the num_db, value
setting it to 4 should be harmless enough until we do.
This fixes occulsion_query piglit test on my rv740.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Although CUBE is a reduction inst, it writes to more than just PV.X
so we need to keep the dst channel.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This only works on r600/r700 so far, evergreen doesn't appear
to have the multiwrite enable bit in the color control, so we
may have to actually do a shader rewrite on EG hardware.
remove some duplicate code reg defines also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I know Jerome will probably rewrite the way depth textures work sometime
soon. For the time being this should at least make common depth texture usage
for shadowing work properly though.
When the paint is opaque (currently, solid color with alpha 1.0f), no
blending is needed for VG_BLEND_SRC_OVER. This eliminates the serious
performance hit introduced by 859106f196
for a common scenario.
Swizzles are now defined everywhere as a field with 12 bits that contains
4 channels worth of meaningful information. Any channel that is unused is
set to RC_SWIZZLE_UNUSED. This change is necessary because rgb instructions
and alpha instructions were initializing channels that would never be used
(channel 3 for rgb and channels 1-3 for alpha) with 0 (aka RC_SWIZZLE_X).
This made it impossible to use generic helper functions for swizzles,
because sometimes a channel value of 0 meant unused and other times it
meant RC_SWIZZLE_X.
All hacks that tried to guess how many channels were relevant have
also been removed.
1) Only translate the [min_index, max_index] range.
2) Upload translated vertices via the uploader.
3) Rename valid_vertex_buffer[] to real_vertex_buffer[]
Only upload the [min_index, max_index] range instead of [0, userbuf_size].
This an important optimization.
Framerate in Lightsmark:
Before: 22 fps
After: 75 fps
The same optimization is already in r300g.
I can't see a performance difference with this code, which means all
the driver-specific code removed in this commit was unnecessary.
Now we use u_upload_mgr in a slightly different way than we did before it got
dropped. I am not restoring the original code "as is" due to latest
u_upload_mgr changes that r300g performance benefits from.
This also fixes:
- piglit/fp-kil
When an app loads libEGL.so dynamically with RTLD_LOCAL, loading DRI
drivers would fail because of missing glapi symbols. This commit makes
egl_dri2 load libglapi.so with RTLD_GLOBAL to export glapi symbols for
future symbol resolutions.
The same trick can be found in GLX. However, egl_dri2 can only do so
when --enable-shared-glapi is given. Because, otherwise, both libGL.so
and libglapi.so define glapi symbols and egl_dri2 cannot tell which
library to load.
When the user sets EGL_DRIVER to egl_dri2 (or egl_glx), make sure the
built-in driver is used. The user might leave the outdated egl_dri2.so
(or egl_glx.so) on the filesystem and we do not want to load it.
makedepend would crash when a source includes a header indirectly, such
as
#define HEADER "some-header.h"
#include HEADER
Do not define HEADER (makedepend would detects this as an incomplete
include) and add the dependency manually in the Makefile.
This should hopefully fix bug #33374.
For 1D/2D texture arrays use the pipe_resource::array_size field.
In OpenGL 1D arrays texture use the height dimension as the array
size and 2D array textures use the depth dimension as the array size.
Gallium uses a special array_size field instead. When setting up
gallium textures or comparing Mesa textures to gallium textures we
need to be extra careful that we're comparing the right fields.
The new st_gl_texture_dims_to_pipe_dims() function maps OpenGL
texture dimensions to gallium texture dimensions and simplifies
this quite a bit.
This reverts commit d3df641f0a.
The original commit had sat unpushed on my machine for months. By the
time I found it again, I had forgotten that we had decided not to use
this change after all, (the relevant test was removed long ago).
The GLSL specification is vague here, (just says "as is standard for
C++"), though the C specifications seem quite clear that this should
be an error.
However, an existing piglit test (CorrectPreprocess11.frag) expects
this to be a warning, not an error, so we change this, and document in
README the deviation from the specification.
So that 'foo' can be found in: OPTION=prefixfoosuffix,foo
Also allow that debug options can be separated by a non-alphanumeric characters
instead of just commas.
This drops the memblock manager for ZMASK. Instead, only one zbuffer can be
compressed at a time. Note that this does not necessarily have to be slower.
When there is a large number of zbuffers, compression might be used more often
than it was before. It's also easier to debug.
How it works:
1) 'clear' turns the compression on.
2) If some other zbuffer is set or the currently-bound zbuffer is used
for texturing, the driver decompresses it and then turns the compression off.
Notes:
- The ZMASK clear has been refactored, so that only one packet3 is used to clear
ZMASK.
- The 8x8 compression mode is disabled. I couldn't make it work without issues.
- Also removed driver-specific stuff from u_blitter.
Driver status:
- RV530 and R580 appear to just work (finally).
- RV570 should work, but there may be an issue that we don't correctly
calculate the number of dwords to clear, resulting in a partially
uninitialized zbuffer.
- RS690 misrenders as if no ZMASK clear happened. No idea what's going on.
- RV350 may even hardlock. This issue was already present and this patch doesn't
fix it.
I think we are still missing some hardware info we need to make the zbuffer
compression work fully.
Note that there is also an issue with HiZ, resulting in a sort of blocky
zigzagged corruption around some objects.
From the AMD_conservative_depth spec:
If gl_FragDepth is redeclared in any fragment shader in a program, it
must be redeclared in all fragment shaders in that program that have
static assignments to gl_FragDepth. All redeclarations of gl_FragDepth in
all fragment shaders in a single program must have the same set of
qualifiers.
When AMD_conservative_depth is enabled:
* Let 'layout' be a token.
* Extend the production rule of layout_qualifier_id to process the tokens:
depth_any
depth_greater
depth_less
depth_unchanged
Let's assume there are two options with names such that one is a substring
of another. Previously, if we only specified the longer one as a debug option,
the shorter one would be considered specified as well (because of strstr).
This commit fixes it by checking that each option is surrounded by commas.
(a regexp would be nicer, but this is not a performance critical code)
Update the max_array_access of a global as functions that use that
global are pulled into the linked shader.
Fixes piglit test glsl-fs-implicit-array-size-01 and bugzilla #33219.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Previously only global arrays with implicit sizes would be patched.
This causes all arrays that are actually accessed to be sized.
Fixes piglit test glsl-fs-implicit-array-size-02.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
emit_adjusted_wpos() needs separate x,y translation values. If we
invert Y, we don't want to effect X.
Part of the fix for http://bugs.freedesktop.org/show_bug.cgi?id=26795
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The set of internalFormat parameters accepted by glRenderBufferStorage
depends on the EXT vs. ARB version of framebuffer_object. The later
added support for GL_ALPHA, GL_LUMINANCE, etc. formats. Note that
these formats might be legal but might not be supported. That should
be checked with glCheckFramebufferStatus().
This reverts commit 65c41d55a0.
There really are quite a few differences in the set of internal
formats allowed by glTexImage and glRenderbufferStorage.
Per the spec, all OpenVG handles are 32-bit. We can't just cast them
to/from integers on 64-bit systems.
Start fixing that mess by introducing a set of handle/pointer conversion
functions in handle.h. The next step is to implement a handle/pointer
hash table...
We were sending too long requests for GLXChangeDrawableAttributes,
GLXGetDrawableAttributes, GLXDestroyPixmap and GLXDestroyWindow.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
FLT_TO_INT is a vector instruction, despite what the (current) documentation
says. FLT_TO_INT_FLOOR and FLT_TO_INT_RPI aren't explicitly mentioned in the
documentation, but those are vector instructions too.
largely a merge of the previously discussed origin/gallium-resource-sampling
but updated.
the idea is to allow arbitrary binding of resources, the way opencl, new gl
versions and dx10+ require, i.e.
DCL RES[0], 2D, FLOAT
LOAD DST[0], SRC[0], RES[0]
SAMPLE DST[0], SRC[0], RES[0], SAMP[0]
Contrary what the name may suggest, LLVM's opaque types are used for
recursive types -- types whose definition refers itself -- so opaque
types correspond to pre-declaring a structure in C. E.g.:
struct node;
struct link {
....
struct node *next;
};
struct node {
struct link link;
}
Void pointers are also disallowed by LLVM. So the suggested way of creating
what's commonly referred as "opaque pointers" is using byte pointer (i.e.,
uint8_t * ).
Fixes the build when selecting driver=osmesa and building static libraries.
Otherwise, mklib tries to add the ‘-ltalloc’ object to the archive, which
obviously fails.
Clients which statically link to osmesa will need to link to libtalloc also,
as specified in the Libs.private of osmesa.pc.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=33360
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
I couldn't make it work.
GB_TILE_CONFIG.Z_EXTENDED, which enables per-pixel Z clamping, and
VAP_CLIP_CNTL.CLIP_DISABLE, which disables clipping, do help, but they
also add regressions like random graphics corruptions in some games.
But we have to cheat and peek at the GENERIC semantic indices the
state tracker uses for TEXn.
Only outputs from 0x300 to 0x37c can be replaced, and so we have to
know on shader compilation which ones to put there in order to keep
doing separate shader objects properly.
At some point I'll probably create a patch that makes gallium not
force us to discard the information about what is a TexCoord.
The _NEW_ACCUM flag was only set when changing the accumulation
buffer clear color and never used anywhere. Reclaim that dirty bit.
Clean up the definitions of the other dirty bit flags.
Only a few texture object parameters can effect texture completeness:
min level, max level, minification filter. Don't mark the texture
incomplete for other texture object state changes.
We are not required to do the linear->sRGB conversion if ARB_framebuffer_sRGB
is unsupported. However I think the conversion should work in hw except
for blending, which matches the D3D9 behavior.
The rvalue of the returned value can be NULL if the shader says
'return foo();' and foo() is a function that returns void.
Existing GLSL specs do *NOT* say that this is an error. The type of
the return value is void. If the return type of the function is also
void, then this should compile without error. I expect that future
versions of the GLSL spec will fix this (wink, wink, nudge, nudge).
Fixes piglit test glsl-1.10/compiler/expressions/return-01.vert and
bugzilla #33308.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Before, we were converting between linear/sRGB in glReadPixels,
glDrawPixels, glAccum, etc if the renderbuffer was an sRGB texture.
Those all need to operate on pixel values as-is without conversion.
Also, when setting up render-to-texture, if the texture is sRGB the
pipe_surface view must be linear RGB. This will change when we
support GL_ARB_framebuffer_sRGB.
This fixes http://bugs.freedesktop.org/show_bug.cgi?id=33353
When we read/write image tiles we need to use the format specified
in the pipe_surface, not the pipe_transfer format (which comes from
the underlying texture/resource format).
This comes up when rendering to sRGB surfaces (via OpenGL render to
texture). Ignoring the new GL_ARB/EXT_framebuffer_sRGB extension
for now, when we render to a sRGB surface we need to treat it like
a regular, linear colorspace RGB surface. Before, when we read/wrote
tiles to sRGB surfaces we were inadvertantly doing the color space
conversion.
The new function, pipe_get_tile_rgba_format(), no longer takes a
swizzle (we weren't actually using it anywhere). Rename it to
indicate that the format is passed explicitly.
GLES can be enabled by running scons with
$ scons gles=yes
When gles=yes is given, the build is changed in three ways. First,
libmesa.a will be built with FEATURE_ES1 and FEATURE_ES2. This makes
DRI drivers and libEGL support and advertise GLES support. Second, GLES
libraries will be created. They are libGLESv1_CM, libGLESv2, and
libglapi. Last, libGL or opengl32 will link to libglapi. This change
is required as _glapi_* will be declared as __declspec(dllimport) in
libmesa.a on windows. libmesa.a expects those symbols to be defined in
another DLL. Due to this change to GL, GLES support is marked
experimental.
Note that GLES requires libxml2-python to generate some of its sources.
The original allocations use regs->regs as the context, so talloc will
happily ignore the context given here. Change it to match to clarify
that it isn't changing.
Improves the cases when:
* an explicit assignment references the read-only variable
* an 'out' or 'inout' function parameter references the read-only variable
This adds the get/enable enums and internal gl_config storage
for this extension.
In theory this is all that is needed to enable this extension
from what I can see, since its not mandatory to implement the
features if you don't advertise the visuals or the fb configs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The loop to choose a pixel format for the window was incrementing
'i' after we succeeded in creating the window so if we chose format[0]
for graw_create_window_and_screen() we were putting format[1] in
the pipe_resource template for creating the render target.
This only worked because of the order of the elements in the formats[]
array.
The graw_xlib.c code now properly compares the requested gallium pixel
format against the visual's color layout.
Update all the graw demos to fix the off-by-one-i error.
ideally this should be set once in the beginning of CS but there's
no way to change values there while in the middle of rendering.
For now reemitting SQ setup seems to work probably due to
r700WaitForIdleClean after each render
currently does not to try to decrease values once increased
fixes hangs in glsl-vs-vec4-indexing-temp-src-in-nested-loop-combined
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined for my rv740
maybe more for other chips
When --enable-shared-glapi is specified, libGL will share libglapi with
OpenGL ES instead of defining its own copy of glapi. This makes sure an
app will get only one copy of glapi in its address space.
The new option is disabled by default. When enabled, libGL and libglapi
must be built from the same source tree and distributed together. This
requirement comes from the fact that the dispatch offsets used by these
libraries are re-assigned whenever GLAPI XMLs are changed.
For GLX, indirect rendering for has_different_protocol() functions is
tricky. A has_different_protocol() function is assigned only one
dispatch offset, yet each entry point needs a different protocol opcode.
It cannot be supported by the shared glapi. The fix to this is to make
glXGetProcAddress handle such functions specially before calling
_glapi_get_proc_address.
Note that these files are automatically generated/re-generated
src/glx/indirect.c
src/glx/indirect.h
src/mapi/glapi/glapi_mapi_tmp.h
Move _glapi_* symbols from libGLESv1_CM.so and libGLESv2.so to
libglapi.so. This makes sure an app will get only one copy of glapi in
its address space.
Note that with this change, libGLES* and libglapi must be built from the
same source tree and distributed together. This requirement comes from
the fact that the dispatch offsets used by these libraries are
re-assigned whenever GLAPI XMLs are changed.
In bridge mode, mapi no longer implements glapi.h. It becomes a user of
glapi.h. Imagine an app that uses both libGL.so and libGLESv2.so.
There will be two copies of glapi in the app's memory. It is possible
that _glapi_get_dispatch does not return what _glapi_set_dispatch set,
if they access different copies of the global variables. The solution
to this situation to build either one of the libraries as a bridge to
the other. Or build both libraries as bridges to another shared
glapi library.
When MAPI_MODE_GLAPI is defined, u_current_table is renamed to
_glapi_Dispatch or _glapi_tls_Dispatch. The ASM dispatchers should not
use hardcoded name.
The new implementation is based on mapi. No new script is needed. As
noted in sources.mk, the way to use it is to compile MAPI_GLAPI_SOURCES
with MAPI_MODE_GLAPI defined.
Fix GLAPIPrinter, ES1APIPrinter, and ES2APIPrinter to output files that
are ready for compilation. Since gl_and_es_API.xml is based on
gl_API.xml, the hidden and handcode attributes of entries have to be
overridden for ES1APIPrinter and ES2APIPrinter.
gl_and_es_API.xml defines OpenGL ES 1.1 and 2.0 API as well as OpenGL
API. It consists of gl_API.xml and the newly added es_EXT.xml,
ARB_get_program_binary.xml, OES_single_precision.xml, and
OES_fixed_point.xml.
This fixes a bunch of unnecessary barriers due to the scheduler not
knowing what that arbitrary register description refers to when trying
to reason about its dependencies.
The result is rescheduling in the convolution kernel shader in
Lightsmark, which results in avoiding register spilling and increasing
the performance of the first scene from 6-7 fps midway through the
panning to 11fps. The register spilling was a regression from Mesa
7.9 to Mesa 7.10.
Improves performance of my GLSL demo by 5.1% (+/- 1.4%, n=7). It also
reschedules the giant multiply tree at the end of
glsl-fs-convolution-1 so that we end up not spilling registers,
producing the expected level of performance.
Can happen during swrast fallbacks if a buffer is somehow bound as
a render target and a texture.
Fixes gnome-shell on nv20, and gets it mostly working on nv10.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
sw clears were being used and not getting the correct offsets in the span
code.
also not emitting correct offsets for CB draws to texture levels.
(I've no idea why I'm playing with r100).
This is a candidate for 7.9 and 7.10
According to R700 ISA we have only two channels for cfile constants.
This patch makes piglit tests "glsl1-constant array with constant
indexing" happy on RV710.
Fixes the following Piglit tests:
glslparsertest/shaders/array2.frag
glslparsertest/shaders/dataType6.frag
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This fixes a potential failure when a begin/end_query is the first
thing to happen after flushing the scene.
NOTE: This is a candidate for the 7.10 and 7.9 branches.
This revealed a bug in ra_get_spill_benefit where we only considered
the benefit of the first adjacency we were to remove, explaining some
of the ugly spilling I've seen in shaders. Because of the reduced
spilling, it reduces the runtime of glsl-fs-convolution-1 36.9% +/-
0.9% (n=5).
This was recommended in the original paper, but I figued "make it run"
before "make it fast". Now we make it fast. Reduces the runtime of
glsl-fs-convolution-1 by 12.7% +/- 0.6% (n=5).
Our use of the register allocator in i965 is somewhat unusual.
Whereas most architectures would have a smaller set of registers with
fewer register classes and reuse that across compilation, we have 1,
2, and 4-register classes (usually) and a variable number up to 128
registers per compile depending on how many setup parameters and push
constants are present. As a result, when compiling large numbers of
programs (as with glean texCombine going through ff_fragment_shader),
we spent much of our CPU time in computing the q[] array. By keeping
a separate list of what the conflicts are for a particular reg, we
reduce glean texCombine time 17.0% +/- 2.3% (n=5).
We don't expect this optimization to be useful for 915, which will
have a constant register set, but it would be useful if we were switch
to this register allocator for Mesa IR.
Fixes texrect-many regression with ff_fragment_shader -- as we added
refs to the subsequent texcoord scaling paramters, the array got
realloced to a new address while our params[] still pointed at the old
location.
The driver was saying that independend blend functions was not supported,
but it really was. The driver was using the per-target independend blend
factors but the state tracker was only setting the 0th one (per the
Gallium spec).
Fixes a piglit fbo-drawbuffers2-blend regression.
See https://bugs.freedesktop.org/show_bug.cgi?id=33215
The removed semantic check also exists in ast_type_specifier::hir(), which
is a more natural location for it.
The check verified that precision statements are applied only to types
float and int.
* Check that precision qualifiers only appear in language versions 1.00,
1.30, and later.
* Check that precision qualifiers do not apply to bools and structs.
Fixes the following Piglit tests:
* spec/glsl-1.30/precision-qualifiers/precision-bool-01.frag
* spec/glsl-1.30/precision-qualifiers/precision-struct-01.frag
* spec/glsl-1.30/precision-qualifiers/precision-struct-02.frag
Change default value to ast_precision_none, which denotes the absence of
a precision of a qualifier.
Previously, the default value was ast_precision_high. This made it
impossible to detect if a precision qualifier was present or not.
The check is performed only in GLSL versions >= 1.30.
From section 4.3.4 of the GLSL 1.30 spec:
"It is an error to use centroid in in a vertex shader."
Fixes Piglit test
spec/glsl-1.30/compiler/storage-qualifiers/vs-centroid-in-01.vert
The check is performed only in GLSL versions >= 1.30.
Fixes the following Piglit tests:
* spec/glsl-1.30/compiler/interpolation-qualifiers/fs-smooth-02.frag
* spec/glsl-1.30/compiler/interpolation-qualifiers/vs-smooth-01.vert
... and 'centroid varying'. The check is performed only in GLSL
versions >= 1.30.
From page 29 (page 35 of the PDF) of the GLSL 1.30 spec:
"interpolation qualifiers may only precede the qualifiers in, centroid
in, out, or centroid out in a declaration. They do not apply to the
deprecated storage qualifiers varying or centroid varying."
Fixes Piglit test
spec/glsl-1.30/compiler/interpolation-qualifiers/smooth-varying-01.frag.
If an interpolation qualifier is present, then the method returns that
qualifier's string representation. For example, if the noperspective bit
is set, then it returns "noperspective".
This implements the extension by choosing a different set of texture
fetch functions when the texture parameter changes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a literal slot isn't used it should be set
to 0 instead of an uninitialized value. Also the
channels for pre R700 trig functions were incorrect.
And most important literals were not counted against ndw,
resulting in an invalid force_add_cf detection.
We don't have all of the features of this extension hooked up yet, but
the consensus yesterday was that since those features are things that
we should also be supporting in our ES2 implementation, claiming ES2
here too doesn't make anything worse and will make incremental
improvement through piglit easier.
This code should never have been triggered, but I often did anyway
when I disabled optimization passes during debugging, then spent my
time debugging that this code doesn't work.
The comment of "this is just like teximages except for..." is a pretty
good clue that we're handling this wrong. By just using the teximage
code, we catch a bunch of cases we'd missed, like GL_RED and GL_RG.
This catches more opportunities than the prog_optimize.c code on
openarena's fixed function shaders turned to GLSL, mostly due to
looking at multiple source instructions for copy propagation
opportunities. It should also be much more CPU efficient than
prog_optimize.c's code.
The usage of macro V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_FLT_TO_INT_FLOOR was
introduced by commit 323ef3a1f0 but the
macro is undefined. Disable this case to fix the build for now.
Add a bit in struct gl_extensions for OES_standard_derivatives, and enable
the bit by default. Advertise the extension only if the bit is enabled.
Previously, OES_standard_derivatives was advertised in GLES2 contexts
if ARB_framebuffer_object was enabled.
The specs that add 'layout' require the use of 'in' or 'out'.
However, a number of implementations, including Mesa, shipped several
of these extensions allowing the use of 'varying' and 'attribute'.
For these extensions only a warning is emitted.
This differs from the behavior of Mesa 7.10. Mesa 7.10 would only
accept 'attribute' with 'layout(location)'. This behavior was clearly
wrong. Rather than carrying the broken behavior forward, we're just
doing the correct thing.
This is related to (piglit) bugzilla #31804.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
All of the extensions that add the 'layout' keyword also enable (and
required) the use of 'in' and 'out' with shader globals.
This is related to (piglit) bugzilla #31804.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
When use_spoken is true, istart (the first vertex of this segment) is
replaced by i0 (the spoken vertex of the fan). There are still icount
vertices.
Thanks to Brian Paul for spotting this.
Without this, X doesn't start with UMS on r300g.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Paulo Zanoni <pzanoni@mandriva.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The idea is to be able to match a driver using the following order
try egl_gallium with hw renderer
try egl_dri2
try egl_gallium with sw renderer
try egl_glx
given the module list
egl_gallium
egl_dri2
egl_glx
For that, UseFallback initialization option is added. The module list
is matched twice: with the option unset and with the option set. In the
first pass, egl_gallium skips its sw renderer and egl_glx rejects to
initialize since UseFallback is not set. In the second pass,
egl_gallium skips its hw renderer and egl_dri2 rejects to initialize
since UseFallback is set. The process stops at the first driver that
initializes the display.
Reorder/rename and document the fields that should be set by the driver during
initialization. Drop the major/minor arguments from drv->API.Initialize.
This makes it unnecessary to pass _mesa_glsl_parse_state around
everywhere, making at least the prototypes a lot easier to read.
It's also more C++-ish than a pile of static C functions.
All of these functions used to take s_list pointers so they wouldn't all
need SX_AS_LIST conversions and error checking. However, the new
pattern matcher conveniently does this for us in one centralized place.
So there's no need to insist on s_list. Switching to s_expression saves
a bit of code and is somewhat cleaner.
Previously, the IR reader was riddled with code that:
1. Checked for the right number of list elements (via a linked list walk)
2. Retrieved references to each component (via ->next->next pointers)
3. Downcasted as necessary to make sure that each sub-component was the
right type (i.e. symbol, int, list).
4. Checking that the tag (i.e. "declare") was correct.
This was all very ad-hoc and a bit ugly. Error checking had to be done
at both steps 1, 3, and 4. Most code didn't even check the tag, relying
on the caller to do so. Not all callers did.
The new pattern matching module performs the whole process in a single
straightforward function call, resulting in shorter, more readable code.
Unfortunately, MSVC does not support C99-style anonymous arrays, so the
pattern must be declared outside of the match call.
stObj->pt is null when a TFP texture is passed to st_finalize_texture,
and with the changes introduced in the above commit this resulted in a
new texture being created and the existing image being copied into it.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Fixes a PT_NOT_PRESENT error cause by:
- allocating in VRAM
- emitting GART relocs to 0x17bc/0x17c0, moving the buffer
- telling the bufmgr that the buffer should be in VRAM when we use it,
but not correcting the value sent to 0x17bc/0x17c0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes a failed assertion when a renderbuffer ID that was gen'd but not
previously bound was passed to glFramebufferRenderbuffer(). Generate
the same error that NVIDIA does.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This fixes a problem when glDrawBuffers(GL_NONE). The fragment program
was writing to color output[0] but OutputsWritten was 0. That led to a
failed assertion in the Mesa->TGSI translation code.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The extension string in GLES1 contexts always advertised
GL_OES_point_sprite. Now advertisement depends on ARB_point_sprite being
enabled.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Change all OES extension strings that depend on ARB_framebuffer_object to
instead depend on EXT_framebuffer_object.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Add GL_OES_stencil8 to ES2.
Remove the following:
GL_OES_compressed_paletted_texture : ES1
GL_OES_depth32 : ES1, ES2
GL_OES_stencil1 : ES1, ES2
GL_OES_stencil4 : ES1, ES2
Mesa advertised these extensions, but did not actually support them.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Place GL, GLES1, and GLES2 extensions in a unified extension table. This
allows one to enable, disable, and query the status of GLES1 and GLES2
extensions by name.
When tested on Intel Ironlake, this patch did not alter the extension
string [as given by glGetString(GL_EXTENSIONS)] for any API.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In particular, variables cannot be redeclared invariant after being
used.
Fixes piglit test invariant-05.vert and bugzilla #29164.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
With the change to not reset baselevel, this GL_LINEAR filtering was
resulting in generating mipmaps off of the base level instead of the
next higher detail level. Fixes fbo-generatemipmap-filtering.
Reported by: Neil Roberts <neil@linux.intel.com>
The ad-hoc placement of recalculation somewhere between when they got
invalidated and when they were next needed was confusing. This should
clarify what's going on here.
Update SConscripts to re-enable or add support for EGL on windows and
x11 platforms respectively. targets/egl-gdi is replaced by
targets/egl-static, where "-static" means pipe drivers and state
trackers are linked to statically by egl_gallium, and egl_gallium is a
built-in driver of libEGL. There is no more egl_gallium.dll on Windows.
This greatly improves codegen for programs with flow control by
allowing coalescing for all instructions at the top level, not just
ones that follow the last flow control in the program.
tgsi_helper_copy is used on several occasions to copy a temporary result
into the real destination register to emulate writemasks for OP3 and
reduction operations. According to R600 ISA that's unnecessary.
This patch fixes this use for MAD, CMP and DP4.
Fixes piglit tests glsl-1.20/compiler/qualifiers/in-01.vert and
glsl-1.20/compiler/qualifiers/out-01.vert and bugzilla #32910.
NOTE: This is a candidate for the 7.9 and 7.10 branches. This patch
also depends on the previous two commits.
When GCC encounters a division by zero in a preprocessor directive, it
generates an error. Since the GLSL spec says that the GLSL
preprocessor behaves like the C preprocessor, we should generate that
same error.
It's worth noting that I cannot find any text in the C99 spec that
says this should be an error. The only text that I can find is line 5
on page 82 (section 6.5.5 Multiplicative Opertors), which says,
"The result of the / operator is the quotient from the division of
the first operand by the second; the result of the % operator is
the remainder. In both operations, if the value of the second
operand is zero, the behavior is undefined."
Fixes 093-divide-by-zero.c test and bugzilla #32831.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
In _token_list_equal_ignoring_space(token_list_t*, token_list_t*), add
a guard that prevents dereferncing a null token list.
This fixes test src/glsl/glcpp/tests/092-redefine-macro-error-2.c and
Bugzilla #32695.
When attaching a small mipmap level to an FBO, the original gen4
didn't have the bits to support rendering to it. Instead of falling
back, just blit it to a new little miptree just for it, and let it get
revalidated into the stack later just like any other new teximage.
Bug #30365.
If we hit this path, we're level 1+ and the base level got allocated
as a single level instead of a full tree (so we don't match
intelObj->mt). This tries to recover from that so that we end up with
2 allocations and 1 validation blit (old -> new) instead of
allocations equal to number of levels and levels - 1 blits.
We don't need to worry about levels other than MaxLevel because we're
minifying -- the lower levels (higher detail) won't contribute to the
result. By changing BaseLevel, we forced hardware that doesn't
support BaseLevel != 0 to relayout the texture object.
This reverts commit 7ce6517f3a.
This reverts commit d60145d06d.
I was wrong about which generations supported baselevel adjustment --
it's just gen4, nothing earlier. This meant that i915 would have
never used the mag filter when baselevel != 0. Not a severe bug, but
not an intentional regression. I think we can fix the performance
issue another way.
Most _3DSTATE defines contain the command type, sub-type, opcode, and
sub-opcode (i.e. 0x7905). These, however, contain only the sub-opcode
(i.e. 0x05). Since they are inconsistent with the rest of the code and
nothing uses them, simply delete them.
The _3DOP and _3DCONTROL defines seemed similar, and were also unused.
From reading EXT_texture_sRGB and EXT_framebuffer_sRGB and interactions
with FBO I've found that swrast is converting the sRGB values to linear for
blending when an sRGB texture is bound as an FBO. According to the spec
and further explained in the framebuffer_sRGB spec this behaviour is not
required unless the GL_FRAMEBUFFER_SRGB is enabled and the Visual/config
exposes GL_FRAMEBUFFER_SRGB_CAPABLE_EXT.
This patch fixes swrast to use a separate Fetch call for FBOs bound to
SRGB and avoid the conversions.
v2: export _mesa_get_texture_dimensions as per Brian's comments.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With core mesa doing runtime API checks, GLES overlay is no longer
needed. Make --enable-gles-overlay equivalent to --enable-gles[12].
There may still be places where compile-time checks are done. They
could be fixed case by case.
This is a step forward for compatibility with really old GLX. But the
real reason for making this change now is so that we can make egl_glx a
built-in driver without having to link to libGL.
In preparation for making egl_dri2 built-in. It also handles
symbol lookup error: /usr/local/lib/egl/egl_dri2.so: undefined symbol:
_glapi_get_proc_address
more gracefully.
If you want to enable noop set GALLIUM_NOOP=1 as an env variable.
You need first to enable noop wrapping for your driver see change
to src/gallium/targets/dri-r600/ in this commit as an example.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Add release function for texture_from_pixmap extension.
Some platform need to release texture image for texture_from_pixmap
extension, add this interface for those platforms.
https://bugs.freedesktop.org/show_bug.cgi?id=32912
The fix is to call update_derived_state before user buffer uploads.
I've also moved some code around.
Unfortunately, there are still some ZMASK-related bugs which cause
misrendering, i.e. flushing doesn't always work and glean/fbo fails.
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implemention.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but it
accounts any newly-referenced domains.
cs_validate is then called, which just checks:
used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes CS and tries do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
There really shouldn't be any difference between the two for us.
Fixes a bug where Z16 renderbuffers would be untiled on gen6, likely
leading to hangs.
By relying on just intel_span_supports_format, some formats that
aren't supported pre-gen4 were not reporting FBO incomplete. And we
also complained in stderr when it happened on i915 because draw_region
gets called before framebuffer completeness validation.
ARB_fbo no longer disallows mismatched width/height on attachments
(shouldn't be any problem), mixed format color attachments (we only
support 1), and L/A/LA/I color attachments (we already reject them on
965 too). It requires Gen'ed names (driver doesn't care), and adds
FramebufferTextureLayer (we don't do texture arrays). So it looks
like we're already in the position we need to be for this extension.
Bug #27468, #32381.
In general, we have to negate in immediate values we pass in because
the src1 negate field in the register description is in the bits3 slot
that the 32-bit value is loaded into, so it's ignored by the hardware.
However, the src0 negate field is in bits1, so after we'd negated the
immediate value loaded in, it would also get negated through the
register description. This broke this VP instruction in the position
calculation in civ4:
MAD TEMP[1], TEMP[1], CONST[256].zzzz, CONST[256].-y-y-y-y;
Bug #30156
This only uploads the [min_index, max_index] range instead of [0, userbuf size],
which greatly speeds up user buffer uploads.
This is also a prerequisite for atomizing vertex arrays in st/mesa.
Fix the error in uniform row calculating, it may alloc one line
more which may cause out of range on memory usage, sometimes program
aborted when free the memory.
NOTE: This is a candidate for 7.9 and 7.10 branches.
Signed-off-by: Brian Paul <brianp@vmware.com>
This provides an upload facility for the constant buffers since Marek's
constants in user buffers changes.
gears at least work on my evergreen now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should make it easier to cross-reference the code and hardware
documentation, as well as clear up any confusion on whether constants
like CMD_3D_WM_STATE mean WM_STATE (pre-gen6) or 3DSTATE_WM (gen6+).
This does not rename any pre-gen6 defines.
The draw module set new state that didn't require swtnl which caused need_swtnl to
be unset. This caused the call from to svga_update_state(svga, SVGA_STATE_SWTNL_DRAW)
from the vbuf backend to overwrite the vdecls we setup there to be overwritten with
the real buffers vdecls.
Previously the 'STDGL invariant(all)' pragma added in GLSL 1.20 was
simply ignored by the compiler. This adds support for setting all
variable invariant.
In GLSL 1.10 and GLSL ES 1.00 the pragma is ignored, per the specs,
but a warning is generated.
Fixes piglit test glsl-invariant-pragma and bugzilla #31925.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
GLSL 1.10 and 1.20 allow any sort of sampler array indexing.
Restrictions were added in GLSL 1.30. Commit f0f2ec4d added support
for the 1.30 restrictions, but it broke some valid 1.10/1.20 shaders.
This changes the error to a warning in GLSL 1.10, GLSL 1.20, and GLSL
ES 1.00.
There are some spurious whitespace changes in this commit. I changed
the layout (and wording) of the error message so that all three cases
would be similar. The 1.10/1.20 and 1.30 text is the same. The only
difference is that one is an error, and the other is a warning. The
GLSL ES 1.00 wording is similar but not quite the same.
Fixes piglit test
spec/glsl-1.10/compiler/constant-expressions/sampler-array-index-02.frag
and bugzilla #32374.
It's no longer needed because the upload buffer remains mapped while the CS
is being filled (openarena, ut2004 and others that this code was for do not
use VBOs by default).
The overhead of resource_create, transfer_inline_write, and resource_destroy
to upload constant data is very visible with some apps in sysprof, and
as such should be eliminated.
My approach uses a user buffer to pass a pointer to a driver. This gives
the driver the freedom it needs to take the fast path, which may differ
for each driver.
This commit addresses the same issue as Jakob's one that suballocates out
of a big constant buffer, but it also eliminates the copy to the buffer.
- Added a parameter to specify a minimum offset that should be returned.
r300g needs this to better implement user buffer uploads. This weird
requirement comes from the fact that the Radeon DRM doesn't support negative
offsets.
- Added a parameter to notify a driver that the upload flush occured.
A driver may skip buffer validation if there was no flush, resulting
in a better performance.
- Added a new upload function that returns a pointer to the upload buffer
directly, so that the buffer can be filled e.g. by the translate module.
The map/unmap overhead can be significant even though there is no waiting on busy
buffers. There is simply a huge number of uploads.
This is a performance optimization for Torcs, a car racing game.
Directly include mtypes.h if a file uses a gl_context struct. This
allows future removal of headers that are not strictly necessary but
indirectly include mtypes.h for a file.
BaseLevel/MaxLevel are mostly used for two things: clamping texture
access for FBO rendering, and limiting the used mipmap levels when
incrementally loading textures. By restricting our mipmap trees to
just the current BaseLevel/MaxLevel, we caused reallocation thrashing
in the common case, for a theoretical win if someone really did want
just levels 2..4 or whatever of their texture object.
Bug #30366
This has always been ugly about our texture code -- object base/max
level vs intel object first/last level vs image level vs miptree
first/last level. We now get rid of intelObj->first_level which is
just tObj->BaseLevel, and make intelObj->_MaxLevel clearly based off
of tObj->_MaxLevel instead of duplicating its code (incorrectly, as
image->MaxLog2 only considers width/height and not depth!)
It was quite a mess by trying to do NULL renderbuffers and real
renderbuffers in the same function. This clarifies the common case of
real renderbuffers.
This makes
fbo-generatemipmap-formats GL_EXT_texture_sRGB-s3tc
match
fbo-generatemipmap-formats GL_EXT_texture_compression_s3tc
and swrast in bad DXT1_RGBA alpha=0 handling, but it means we won't
unpack and repack someone's textures into uncompressed SARGB8 format.
It's just LUMINANCE, not LUMINANCE_ALPHA. Fixes
fbo-generatemipmap-formats GL_EXT_texture_sRGB-s3tc assertion failure
when it tries to pack the L8 channels into LUMINANCE_ALPHA and wonders
why it's trying to do that.
From the r600 ISA:
Each ALU clause can lock up to four sets of constants
into the constant cache. Each set (one cache line) is
16 128-bit constants. These are split into two groups.
Each group can be from a different constant buffer
(out of 16 buffers). Each group of two constants consists
of either [Line] and [Line+1] or [line + loop_ctr]
and [line + loop_ctr +1].
For supporting more than 64 constants, we need to
break the code into multiple ALU clauses based
on what sets of constants are needed in that clause.
Note: This is a candidate for the 7.10 branch.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
rc_inst_can_use_presub() wasn't checking for too many RGB sources in
Alpha instructions or too many Alpha sources in RGB instructions.
Note: This is a candidate for the 7.10 branch.
Perform this check in ast_declarator_list::hir().
From section 4.3.6 of the GLSL 1.30 spec:
"If a vertex output is a signed or unsigned integer or integer
vector, then it must be qualified with the interpolation
qualifier
flat."
Allow redeclaration of the following built-in variables with an
interpolation qualifier in language versions >= 1.30:
* gl_FrontColor
* gl_BackColor
* gl_FrontSecondaryColor
* gl_BackSecondaryColor
* gl_Color
* gl_SecondaryColor
See section 4.3.7 of the GLSL 1.30 spec.
We were looking at the current draw buffer instead to see whether the
depth/stencil combination matched. So you'd get told your framebuffer
was complete, until you bound it and went to draw and we decided that
it was incomplete.
The _ColorDrawBuffers is a piece of computed state that gets for the
current draw/read buffers at _mesa_update_state time. However, this
function actually gets used for non-current draw/read buffers when
checking if an FBO is complete from the driver's perspective. So,
instead of trying to just look at the attachment points that are
currently referenced by glDrawBuffers, look at all attachment points
to see if they're driver-supported formats. This appears to actually
be more in line with the intent of the spec, too.
Fixes a segfault in my upcoming fbo-clear-formats piglit test, and
hopefully bug #30278
Until we know how hw converts quads to polygon in beginning of
3D pipeline, for now unconditionally use last vertex convention.
Fix glean/clipFlat case.
When figuring out whether a renderbuffer should be used to set the
visual bits of an FBO, we were missing important baseformats like
GL_RED, GL_RG, and GL_LUMINANCE.
st/egl should be enabled with --enable-openvg even the driver is xlib or
osmesa. Also, GLX_DIRECT_RENDERING should not be defined because libdrm
is not checked.
I think was used long ago, when we actually read the builtins into the
shader's instruction stream directly, rather than creating a separate
shader and linking the two. It doesn't seem to serve any purpose now.
The initial values in the grctx are 0x0000 anyway, and re-binding them
all to 0x0000 destroys some init done by the nouveau drm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This reverts commit de6fd527a5.
Revert this workaround as it seems the real trouble is caused by
lineloop, which doesn't require GS convert on sandybridge actually.
Gen4 and Gen5 hardware can have a maximum supported nesting depth of 16.
Previously, shaders with control flow nested 17 levels deep would
cause a driver assertion or segmentation fault.
Gen6 (Sandybridge) hardware no longer has this restriction.
Fixes fd.o bug #31967.
This adds a new optional max_depth parameter (defaulting to 0) to
lower_if_to_cond_assign, and makes the pass only flatten if-statements
nested deeper than that.
By default, all if-statements will be flattened, just like before.
This patch also renames do_if_to_cond_assign to lower_if_to_cond_assign,
to match the new naming conventions.
This gets my vote for most pointless extension of all time, I'm guessing
some driver could possibly optimise for this instead of counting it might
just get a true/false, but I'm not really sure.
need this to eventually advertise 3.3 despite its total uselessness.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rename MAPI_GLAPI_SOURCES to MAPI_UTIL_SOURCES. Rename macro
MAPI_GLAPI_CURRENT to MAPI_MODE_UTIL. Update the comments to make it
clear that mapi may be used in two ways and how.
These mistakenly computed 't' instead of t * t * (3.0 - 2.0 * t).
Also, properly vectorize the smoothstep(float, float, vec) variants.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
No need to register function prototypes in the module now that we call
the C function pointer directly -- less LLVM objects lying around.
Limited testing with lp_test_format.
Even though a bound texture stays bound when calling set_fragment_sampler_views,
it must be assigned a new cache region depending on the occupancy of other
texture units.
This fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=28800
Thanks to Álmos <aaalmosss@gmail.com> for finding the bug in the code.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
We need to keep using the pipe_get_tile_swizzle() even though there's
no swizzling because we need to explicitly pass in the surface format.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=32459
The only mismatch between the two is that we have to clear the
destination's alpha to 1.0. Fixes WOW performance on my Ironlake,
from a few frames a second to almost playable.
Before, we were going off of a couple of known (hopeful) matches
between internalFormats and the cpp of the read buffer. Instead, we
can now just look at the gl_format of the two to see if they match.
We should avoid bad blits that might have been possible before, but
also allow different internalFormats to work without having to
enumerate each one.
The blit that follows appears in the command stream so it's serialized
with previous rendering. Any queued vertices in the tnl layer were
already flushed up in mesa/main/.
Create a constant int pointer to the C function, then cast it to the
function's type. This avoids using trampoline code which seem to be
inadvertantly freed by LLVM in some situations (which leads to segfaults).
The root issue and work-around were found by José.
NOTE: This is a candidate for the 7.10 branch
Michel Hermier reported libdrm segfault (and kernel crash) on nv40 using
gallium :
http://www.mail-archive.com/nouveau@lists.freedesktop.org/msg06563.html
It turns out these were caused by some missing WAIT_RING (or wrong
computation of the WAIT_RING sizes). Unlike all other libdrm_nouveau users,
nvfx gallium tried to use a mininum calls of WAIT_RING, one WAIT_RING could
apply to many methods for different code paths and spread across several
functions. This made it too tricky to find out what the missing or wrong
WAIT_RING was.
By restoring BEGIN_RING, we force one WAIT_RING per method, and it's much
easier to check if the free size required in the pushbuffer is correct. As
curro said, "let's keep it simple for the maintainers until the big
bottlenecks are gone"
Benchmarked on nv35 with openarena, nexuiz and ut2004 and no performance
regression.
The core of this patch was made with Coccinelle, with minor manual fixes
made on top.
Tested-by: Michel Hermier <hermier@frugalware.org>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
This is the hack for input interactivity of frontbuffer rendering
(like we do for backbuffer at intelDRI2Flush()) by waiting for the n-2
frame to complete before starting a new one. However, for an
application doing multiple contexts or regular rebinding of a single
context, this would end up lockstepping the CPU to the GPU because
every unbind was considered the end of a frame.
Improves WOW performance on my Ironlake by 48.8% (+/- 2.3%, n=5)
Given a dispatch slot, entry_get_public returns the address of the
corresponding public entry point. There may be more than one of them.
But since they are all equivalent, it is fine to return any one of them.
With entry_get_public, the address of any public entry point can be
calculated at runtime when an assembly dispatcher is used. There is no
need to have a mapping table in such case. This omits the unnecessary
relocations from the binary.
Hidden entries are just like normal entries except that they are not
exported. Since it is not always possible to hide them, and two hidden
aliases can share the same entry, the name of hidden aliases are mangled
to '_dispatch_stub_<slot>'.
Split out function name generation from _c_decl to _c_function, and use
it everywhere. Add an optional 'export' argument to _cdecl. It is
prepended to the returned string.
Really no idea why I didn't see this before, but these values were opposite
the register spec.
this seems to fix rv530 HiZ on my laptop, will reenable in next commit.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For GL fragColor semantics we need to tell the pipe drivers that the fragment
shader color result is to be replicated to all bound color buffers, this
adds the basic TGSI + documentation.
v2: fix missing comma pointed out by Tilman on mesa-dev.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If 'start' is odd, render the first triangle with indices embedded
in the command stream, which adds 3 to 'start' and makes it even.
Then continue with the fast path.
This is a bandaid on the problem that if some formats were not renderable
(like luminance_alpha), st/mesa fell back to some RGBA format, so basically
some non-renderable formats were actually not used at all. This is only
a problem with hardware drivers, softpipe can render to anything.
Instead, require only RGB8/RGBA8 to be renderable.
Radeon GPUs can do this. R600 can even do render-to-texture.
Packing and extracting aren't implemented, but we shouldn't hit them (I think).
Tested with swrast, softpipe, and r300g.
Just like everywhere else, I never trust my constant uploads to
correctly put constants in the right places, even though that's so
rarely where the issue is.
It's mostly like gen4 message descriptor setup, except that the sizes
of type/control changed to be like gen5. Fixes 21 piglit cases on
gm45, including the regressions in bug #32311 from increased VS
constant buffer usage.
Fixes this GCC warning.
lp_bld_const.h: In function 'lp_build_const_int_pointer':
lp_bld_const.h:137: warning: cast from pointer to integer of different size
Determine header present for fb write by msg length is not right
for SIMD16 dispatch, and if there're more output attributes, header
present is not easy to tell from msg length. This explicitly adds
new param for fb write to say header present or not.
Fixes many cases' hang and failure in GL conformance test.
Set window_bit only when the visual id is greater than zero. Correct
visual types. Skip slow configs as they are not relevant. Finally, do
not return duplicated configs.
All single-buffered configs were ignored before to make sure
EGL_RENDER_BUFFER is settable for window surfaces. It is better to
allow single-buffered configs and set EGL_WINDOW_BIT only for
double-buffered ones. This way there can be single-buffered pixmaps.
The SNB alt-mode math does the denorm and inf reduction even for a
"raw MOV" like we do for g0 message header setup, where we are moving
values that aren't actually floats. Just use UD type, where raw MOVs
really are raw MOVs.
Fixes glxgears since c52adfc2e1, but no
piglit tests had regressed(!)
Can't get away from referencing upload buffer as after flush a vertex buffer
using the upload buffer might still be active. Likely need to simplify the
pipe_refence a bit so we don't waste so much cpu time in it.
candidates for 7.10 branch
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Instead of when we read texture tiles. Now swizzling happens after
the shadow depth compare step. This fixes the piglit glsl-fs-shadow2d*
tests (except for proj+bias because of a GLSL bug).
We need to swizzle after the shadow comparison so that the GL_DEPTH_MODE
functionality is handled properly.
This fixes all the piglit glsl-fs-shadow2d*.shader_test cases, except
for glsl-fs-shadow2dproj-bias.shader_test which fails because of a
bug in the GLSL compiler (fd.o 32395).
Note the support for non float vertex draw likely regressed need to
find what we want to do there.
candidates for 7.10 branches
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is still awful, but my ability to care about reworking the old
backend so we can just get a temporary value into a POW is awfully low
since the new backend does this all sensibly.
Fixes:
fp1-LIT test 1
fp1-LIT test 3 (case x < 0)
fp1-POW test (exponentiation)
fp-lit-mask
commit 4f106f44a32eaddb6cf3fea6ba5ee9787bff609a
Author: Brian Paul <brianp@vmware.com>
Date: Mon Dec 13 14:06:08 2010 -0700
st/mesa: reorganize vertex program translation code
Now it looks like the fragment and geometry program code.
Also remove the serial number fields from programs. It was used to
determine when new translations were needed. Now the variant key is
used for that. And the st_program_string_notify() callback removes all
variants when the program's code is changed.
commit e12d6791c5e4bff60bb2e6c04414b1b4d1325f3e
Author: Brian Paul <brianp@vmware.com>
Date: Mon Dec 13 13:38:12 2010 -0700
st/mesa: implement geometry shader varients
Only needed in order to support per-context gallium shaders.
commit c5751c673644808ab069259a852f24c4c0e92b9d
Author: Brian Paul <brianp@vmware.com>
Date: Sun Dec 12 15:28:57 2010 -0700
st/mesa: restore glDraw/CopyPixels using new fragment program variants
Clean up the logic for fragment programs for glDraw/CopyPixels. We now
generate fragment program variants for glDraw/CopyPixels as needed which
do texture sampling, pixel scale/bias, pixelmap lookups, etc.
commit 7b0bb99bab6547f503a0176b5c0aef1482b02c97
Author: Brian Paul <brianp@vmware.com>
Date: Fri Dec 10 17:03:23 2010 -0700
st/mesa: checkpoint: implement fragment program variants
The fragment programs variants are per-context, as the vertex programs.
NOTE: glDrawPixels is totally broken at this point.
commit 2cc926183f957f8abac18d71276dd5bbd1f27be2
Author: Brian Paul <brianp@vmware.com>
Date: Fri Dec 10 14:59:32 2010 -0700
st/mesa: make vertex shader variants per-context
Gallium shaders are per-context but OpenGL shaders aren't. So we need
to make a different variant for each context.
During context tear-down we need to walk over all shaders/programs and
free all variants for the context being destroyed.
The new GLSL compiler doesn't support geom shaders yet so disable the
GL_ARB_geometry_shader4 extension. Undo this when geom shaders work again.
NOTE: This is a candidate for the 7.10 branch.
This is the same as what the array dereference handler does.
Fixes piglit test glsl-link-struct-array (bugzilla #31648).
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The hardware supports zero stride just fine. This is a port
of 2af8a19831 from r300g.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The RS690 memory controller prefers things to be on a different
boundary than the discrete GPUs, we had an attempt to fix this,
but it still failed, this consolidates the stride calculation
into one place and removes the really special case check.
This fixes gnome-shell and 16 piglit tests on my rs690 system.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When the driver is the last reference to libEGL.so, unloading it will
cause libEGL.so to be unmapped and give problems. Disable the unloading
for now. Still have to figure out the right timing to unload drivers.
The hardware apparently does support a zero stride, so let's use it.
This fixes missing objects in ETQW, but might also fix a ton of other
similar-looking bugs.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
If a source operand has a non-native swizzle (e.g. the KIL instruction
cannot have a swizzle other than .xyzw), the lowering pass uses one or more
MOV instructions to move the operand to an intermediate temporary with
native swizzles.
This commit fixes that the presubtract information was lost during
the lowering.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
This fixes broken rendering of trees in ETQW. The trees still disappear
for an unknown reason when they are close.
Broken since:
2ff9d4474b
r300/compiler: make lowering passes possibly use up to two less temps
NOTE: This is a candidate for the 7.10 branch.
do_assignment may apply implicit conversions to coerce the base type
of initializer to the base type of the variable being declared. Fixes
piglit test glsl-implicit-conversion-02 (bugzilla #32287). This
probably also fixes bugzilla #32273.
NOTE: This is a candidate for the 7.9 branch and the 7.10 branch.
There are exceptions to the table for depth texturing or rendering to
not-quite-supported formats thanks to the non-orthogonal component
selection for surface formats, but it's still a lot simpler.
Fixes piglit valgrind glsl-array-bounds-04 failure (FDO bug 29946).
NOTE:
This is a candidate for the 7.10 branch.
This is a candidate for the 7.9 branch.
The current state is allowed to be undefined past DrawElements et al.
Consequently omit that copying at least in the display list code.
This pays us some percents performance.
Signed-off-by: Brian Paul <brianp@vmware.com>
assert(current_save_state < MAX_META_OPS_DEPTH) did not compile.
Rename current_save_state to SaveStackDepth to be more consistent with
the style of the other fields.
VS places color attributes together so that SF unit can fetch the right
attribute according to object orientation. This fixes light issue in
mesa demo geartrain, projtex.
_mesa_meta_CopyPixels results in nested meta operations on Sandybridge.
Previoulsy the second meta operation overrides all states saved by the
first meta function.
When the application is not linked to any libGL*.so, loading st_GL.so
would give
/usr/local/lib/egl/st_GL.so: undefined symbol: _glapi_tls_Context
In that case, load libGL.so and try again. This works because
util_dl_open loads with RTLD_GLOBAL.
Fix "clear" OpenGL ES 1.1 demo.
Fixes piglit glx-shader-sharing crash.
When shaders are shared by multiple contexts, the shader's draw context
pointer may point to a previously destroyed context. Dereferencing the
context pointer will lead to a crash.
In this case, simply removing the flushing code avoids the crash (the
exec and sse shader paths don't flush here either).
There's a deeper issue here, however, that needs examination. Shaders
should not keep pointers to contexts since contexts might get destroyed
at any time.
NOTE: This is a candidate for the 7.10 branch (after this has been
tested for a while).
System values are shader inputs which don't necessarily change from
vertex to vertex or fragment to fragment. gl_InstanceID and
gl_FrontFacing are examples.
Known issues in the ARB_color_buffer_float implementation:
- Rendering to multiple render targets, some fixed-point, some floating-point, with FIXED_ONLY fragment clamping and polygon smooth enabled may write incorrect values to the fixed point buffers (depends on spec interpretation)
- For fragment programs with ARB_fog_* options, colors are clamped before fog application regardless of the fragment clamping setting (this depends on spec interpretation)
<li>Assorted Mesa/Gallium state tracker bug fixes</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=26795">Bug 26795</a> - gl_FragCoord off by one in Gallium drivers.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=29164">Bug 29164</a> - [GLSL 1.20] invariant variable shouldn't be used before declaration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=29927">Bug 29927</a> - [glsl2] fail to compile shader with constructor for array of struct type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=30156">Bug 30156</a> - [i965] After updating to Mesa 7.9, Civilization IV starts to show garbage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31923">Bug 31923</a> - [GLSL 1.20] allowing inconsistent centroid declaration between two vertex shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32214">Bug 32214</a> - [gles2]no link error happens when missing vertex shader or frag shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32375">Bug 32375</a> - [gl gles2] Not able to get the attribute by function glGetVertexAttribfv</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32541">Bug 32541</a> - Segmentation Fault while running an HDR (high dynamic range) rendering demo</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32569">Bug 32569</a> - [gles2] glGetShaderPrecisionFormat not implemented yet</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32831">Bug 32831</a> - [glsl] division by zero crashes GLSL compiler</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32910">Bug 32910</a> - Keywords 'in' and 'out' not handled properly for GLSL 1.20 shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33316">Bug 33316</a> - uniform array will be allocate one line more and initialize it when it was freed will abort</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33386">Bug 33386</a> - Dubious assembler in read_rgba_span_x86.S</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33388">Bug 33388</a> - Dubious assembler in xform4.S</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33433">Bug 33433</a> - Error in x86-64 API dispatch code.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33507">Bug 33507</a> - [glsl] GLSL preprocessor modulus by zero crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33508">Bug 33508</a> - [glsl] GLSL compiler modulus by zero crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33916">Bug 33916</a> - Compiler accepts reserved operators % and %=</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34030">Bug 34030</a> - [bisected] Starcraft 2: some effects are corrupted or too big</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34047">Bug 34047</a> - Assert in _tnl_import_array() when using GLfixed vertex datatypes with GLESv2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34114">Bug 34114</a> - Sun Studio build fails due to standard library functions not being in global namespace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34179">Bug 34179</a> - Nouveau 3D driver: nv50_pc_emit.c:863 assertion error kills Compiz</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34198">Bug 34198</a> - [GLSL] implicit sized array with index 0 used gets assertion</li>
<li><ahref="https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/691653">Ubuntu bug 691653</a> - compiz crashes when using alt-tab (the radeon driver kills it) </li>
<li><ahref="https://bugs.meego.com/show_bug.cgi?id=13005">Meego bug 13005</a> - Graphics GLSL issue lead to camera preview fail on Pinetrail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34370">Bug 34370</a> - [GLSL] "i<5 && i<4" in for loop fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34374">Bug 34374</a> - [GLSL] fail to redeclare an array using initializer</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=35073">Bug 35073</a> - [GM45] Alpha test is broken when rendering to FBO with no color attachment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=35483">Bug 35483</a> - util_blit_pixels_writemask: crash in line 322 of src/gallium/auxiliary/util/u_blit.c</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33512">Bug 33512</a> - [SNB] case ogles2conform/GL/gl_FragCoord/gl_FragCoord_xy_frag.test and gl_FragCoord_w_frag.test fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34280">Bug 34280</a> - r200 mesa-7.10 font distortion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34321">Bug 34321</a> - The ARB_fragment_program subset of ARB_draw_buffers not implemented</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36410">Bug 36410</a> - [SNB] Rendering errors in 3DMMES subtest taiji</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36527">Bug 36527</a> - [wine] Wolfenstein: Failed to translate rgb instruction.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36651">Bug 36651</a> - mesa requires bison and flex to build but configure does not check for them</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=30694">Bug 30694</a> - wincopy will crash on Gallium drivers when going to front buffer</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=30787">Bug 30787</a> - Invalid asm shader does not generate draw-time error when used with GLSL shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31101">Bug 31101</a> - [glsl2] abort() in ir_validate::visit_enter(ir_assignment *ir)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31193">Bug 31193</a> - [regression] aa43176e break water reflections</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31194">Bug 31194</a> - The mesa meta save/restore code doesn't ref the current GLSL program</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31514">Bug 31514</a> - isBuffer returns true for unbound buffers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31560">Bug 31560</a> - [tdfx] tdfx_tex.c:702: error: ‘const struct gl_color_table’ has no member named ‘Format’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31617">Bug 31617</a> - Radeon/Compiz: 'failed to attach dri2 front buffer', error case not handled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31650">Bug 31650</a> - [GLSL] varying gl_TexCoord fails to be re-declared to different size in the second shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31673">Bug 31673</a> - GL_FRAGMENT_PRECISION_HIGH preprocessor macro undefined in GLSL ES</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31690">Bug 31690</a> - i915 shader compiler fails to flatten if in Aquarium webgl demo.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31832">Bug 31832</a> - [i915] Bad renderbuffer format: 21</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31985">Bug 31985</a> - [GLSL 1.20] initialized uniform array considered as "unsized"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31987">Bug 31987</a> - [gles2] if input a wrong pname(GL_NONE) to glGetBoolean, it will not case GL_INVALID_ENUM</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32311">Bug 32311</a> - [965 bisected] Array look-ups broken on GM45</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32520">Bug 32520</a> - [gles2] glBlendFunc(GL_ZERO, GL_DST_COLOR) will result in GL_INVALID_ENUM</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32825">Bug 32825</a> - egl_glx driver completely broken in 7.9 branch [fix in master]</li>
</ul>
<h2>Changes</h2>
<p>The full set of changes can be viewed by using the following GIT command:</p>
<pre>
git log mesa-7.9..mesa-7.9.1
</pre>
<p>Alex Deucher (5):
<ul>
<li>r100: revalidate after radeon_update_renderbuffers</li>
<li>r600c: add missing radeon_prepare_render() call on evergreen</li>
<li>r600c: properly align mipmaps to group size</li>
<li>gallium/egl: fix r300 vs r600 loading</li>
<li>r600c: fix some opcodes on evergreen</li>
</ul></p>
<p>Aras Pranckevicius (2):
<ul>
<li>glsl: fix crash in loop analysis when some controls can't be determined</li>
<li>glsl: fix matrix type check in ir_algebraic</li>
</ul></p>
<p>Brian Paul (27):
<ul>
<li>swrast: fix choose_depth_texture_level() to respect mipmap filtering state</li>
<li>st/mesa: replace assertion w/ conditional in framebuffer invalidation</li>
<li>egl/i965: include inline_wrapper_sw_helper.h</li>
<li>mesa: Add missing else in do_row_3D</li>
<li>mesa: add missing formats in _mesa_format_to_type_and_comps()</li>
<li>mesa: handle more pixel types in mipmap generation code</li>
<li>mesa: make glIsBuffer() return false for never bound buffers</li>
<li>mesa: fix glDeleteBuffers() regression</li>
<li>swrast: init alpha value to 1.0 in opt_sample_rgb_2d()</li>
<li>meta: Mask Stencil.Clear against stencilMax in _mesa_meta_Clear</li>
<li>st/mesa: fix mapping of zero-sized buffer objects</li>
<li>Assorted Mesa/Gallium state tracker bug fixes</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=26795">Bug 26795</a> - gl_FragCoord off by one in Gallium drivers.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=29164">Bug 29164</a> - [GLSL 1.20] invariant variable shouldn't be used before declaration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=29927">Bug 29927</a> - [glsl2] fail to compile shader with constructor for array of struct type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=30156">Bug 30156</a> - [i965] After updating to Mesa 7.9, Civilization IV starts to show garbage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31923">Bug 31923</a> - [GLSL 1.20] allowing inconsistent centroid declaration between two vertex shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32214">Bug 32214</a> - [gles2]no link error happens when missing vertex shader or frag shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32375">Bug 32375</a> - [gl gles2] Not able to get the attribute by function glGetVertexAttribfv</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32541">Bug 32541</a> - Segmentation Fault while running an HDR (high dynamic range) rendering demo</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32569">Bug 32569</a> - [gles2] glGetShaderPrecisionFormat not implemented yet</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32831">Bug 32831</a> - [glsl] division by zero crashes GLSL compiler</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32910">Bug 32910</a> - Keywords 'in' and 'out' not handled properly for GLSL 1.20 shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33316">Bug 33316</a> - uniform array will be allocate one line more and initialize it when it was freed will abort</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33386">Bug 33386</a> - Dubious assembler in read_rgba_span_x86.S</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33388">Bug 33388</a> - Dubious assembler in xform4.S</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33433">Bug 33433</a> - Error in x86-64 API dispatch code.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33507">Bug 33507</a> - [glsl] GLSL preprocessor modulus by zero crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33508">Bug 33508</a> - [glsl] GLSL compiler modulus by zero crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33916">Bug 33916</a> - Compiler accepts reserved operators % and %=</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34047">Bug 34047</a> - Assert in _tnl_import_array() when using GLfixed vertex datatypes with GLESv2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34114">Bug 34114</a> - Sun Studio build fails due to standard library functions not being in global namespace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34198">Bug 34198</a> - [GLSL] implicit sized array with index 0 used gets assertion</li>
<li><ahref="https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/691653">Ubuntu bug 691653</a> - compiz crashes when using alt-tab (the radeon driver kills it) </li>
<li><ahref="https://bugs.meego.com/show_bug.cgi?id=13005">Meego bug 13005</a> - Graphics GLSL issue lead to camera preview fail on Pinetrail</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.