If we happened to allocate a texture result (or other vector) to the
highest hardware register slot, and we were in 16-wide, we would
under-count the registers used and potentially wrap around to g0 if
that allocation crossed a 16-register block boundary. Bad rendering
and hangs ensued.
Tested-by: Ian Romanick <idr@freedesktop.org>
When the fill mode is PIPE_POLYGON_MODE_LINE we were basically
converting the polygon into triangles, then drawing the outline of all
the triangles. But we really only want to draw the lines around the
perimeter of the polygon, not the interior lines.
NOTE: This is a candidate for the 7.10 branch.
In gen6 and above, clip distances 0-3 are written to message register
3's xyzw components, and 4-7 to message register 4's xyzw components.
Therefore when when writing the clip distances we need to examine the
lower 2 bits of the clip distance index to see which component to
write to.
emit_vertex_write() was examining the lower 3 bits, causing clip
distances 4-7 not to be written correctly.
Fixes piglit test vs-clip-vertex-01.shader_test
Evergreen+ don't support multi-writes so we need to emulate
it in the shader. Fixes the following piglit tests:
fbo-drawbuffers-fragcolor
ati_draw_buffers-arbfp-no-option
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
In intel_draw_buffer, there exists a workaround to prevent
_mesa_update_framebuffer from creating a swrast depth wrapper when
using separate stencil. This commit fixes the workaround, which was
incomplete for s8z24 texture renderbuffers.
Fixes fbo-blit-d24s8 on gen5 with separate stencil manually enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since all infrastructure is now in place to support packed
depth/stencil renderbuffers when using separate stencil, there is no
need for special cases when separate stencil is enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Also, in order to coerce intel_update_tex_wrapper_regions() to
allocate the hiz region, alter intel_update_tex_wrapper_regions() to
examine the renderbuffer format instead of the texture image format.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... and into new function intel_update_tex_wrapper_regions.
This prevents code duplication in the next commit.
Also add a note explaining that the hiz region is broken for mipmapped
depth textures.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... when using separate stencil.
Define function intel_tex_image_x8z24_create_renderbuffers and call it
in intelTexImage after the miptree has been created and filled with data.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because they will be needed by intel_tex_image_s8z24_create_renderbuffers.
Redeclared functions are:
intel_alloc_renderbuffer_storage
intel_renderbuffer_set_draw_offsets
Signed-off-by: Chad Versace <chad@chad-versace.us>
Redeclare as non-static because
intel_tex_image_s8z24_create_renderbuffers will use it.
Remove the 'wrapper' parameter, because there is no wrapper for
intel_texture_image.depth_rb and stencil_rb.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields depth_rb and stencil_rb, and put hooks in place to
release the renderbuffers in intelFreeTextureImageData and
intelTexImage.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Otherwise we can end up creating RGBA render targets (which are BGRA on the
hardware), and then we bind them as RGBA textures (which are RGBA on the
hardware). This generates software fallbacks every time we bind the frame as
a texture.
Commit 1a339b6c71 caused us to take
a different path through the glCopyTexSubImage() code. The
pipe_get_transfer() call neglected to pass the texture's level, face
and slice info. So we were always transferring from the 0th mipmap
level even when the source renderbuffer was a non-zero mipmap level
in a texture.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38649
NOTE: This is a candidate for the 7.10 branch.
Once again, assuming the compiler is clever works out so poorly. The
generated code initialized the structure on the stack, then did a
lookup into it. This was a performance regression from
70c6cd39bd.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's common in applications just before the advent of
EXT_separate_shader_objects to have multiple linked shaders with the
same VS or FS. While we aren't detecting those at the Mesa level, we
can detect when our compiled output happens to match an existing
compiled program.
This patch was created after noting the incredible amount of compiled
program data generated by Heroes of Newerth. It reduces the program
data in use at the start menu (replayed by apitrace) from 828kb to
632kb, and reduces CACHE_NEW_WM_PROG state flagging by 3/4. It
doesn't impact our rate of hardware state changes yet, because things
depending on CACHE_NEW_WM_PROG also depend on BRW_NEW_FRAGMENT_PROGRAM
which is still being flagged.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch add the support for 24bpp in the dri/swrast implementation.
Signed-off-by: Marc Pignat <marc@pignat.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Use a typed struct to describe the native buffer and let the backends
map the native buffer to winsys_handle for
resource_from_handle/resource_to_handle.
When shared glapi is not enabled, there are two glapi providers and we
cannot decide which one to link to at build time. It results in
unresolved symbols in st/mesa. This commit makes st/mesa a loadable
module when shared glapi is not enabled, and hopes that the apps will
link to one of the glapi providers (GL or GLES).
Build pipe drivers here instead of using those built by the
soon-to-be-removed targets/egl.
[with an update by Benjamin Franzke to use --{start|end}-group]
Should unify this too, but will delay that until the planned
libdrm_nouveau/winsys changes which are likely to cause major
changes to this bo validation code too.
In fixed function, stride == 0 (e.g. glColor4f() outside of the draw
call) would get turned into uniform inputs, which is why it was
ignored originally in this test. For shaders, drivers end up seeing a
need to upload stride == 0 data, and get confused by needing to upload
when vbo_all_varyings_in_vbos() returned true. In the 965 driver
case, it wouldn't bother to compute the min/max index, and uploaded
nothing if the min/max wasn't known.
We've talked about removing the ff stride=0-into-uniforms code, so
this check shouldn't be missed once that's gone.
Fixes ARB_vertex_buffer_object/mixed-immediate-and-vbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37934
Reviewed-by: Brian Paul <brianp@vmware.com>
We would still want to consider that data as being in a VBO even if we
managed to produce this case, which as far as I know we can't.
Reviewed-by: Brian Paul <brianp@vmware.com>
All the packets chosen before came from grepping the pdf for
nonpipelined, and these two came from grepping for non.pipelined. We
could stand a review by looking at all packets emitted and identifying
what kind they are.
Previously, the builtins in OES_texture_3D.{frag,vert} were only
compiling properly as a consequence of bug 38015, which allows
unsupported extensions to be enabled. This fix eliminates the builtin
compiler's reliance on bug 38015, so that bug 38015 can be fixed.
Fixes broken glTexImage2D with format=GL_RGBA since
1a339b6c71
The origin for this behaviour is that r600_is_format_supported
checks only against r600_state_inline.h tables not evergreens.
If possible, we want to match the hardware format to what the app uses. By
doing so, we avoid the need for pixel conversions and therefore greatly speed
up texture uploads.
evergreen+ stores depth and stencil separately so when we
allocate a depth/stencil fbo, make sure we allocate enough
memory for both depth and stencil buffers.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Now all infrastructure is in place to support s8_z24 non-texture
renderbuffers for gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Hiz buffer allocation can only occur if the 'else' branch has been taken,
so move the hiz buffer allocation into the 'else' branch.
Having the hiz buffer allocation dangling outside of the if-tree was just
damn confusing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following fields:
intel_renderbuffer.wrapped_depth;
intel_renderbuffer.wrapped_stencil
If the intel_context is using separate stencil and the renderbuffer has
a packed depth/stencil format, then wrapped_depth and wrapped_stencil are
the real renderbuffers.
Alter the following functions to accomodate the wrapped buffers:
intel_delete_renderbuffer
intel_draw_buffer
intel_get_renderbuffer
intel_renderbuffer_map
intel_renderbuffer_unmap
Subsequent commits allocate renderbuffer storage for wrapped_depth and
wrapped_stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit b5c847c7ca erroneously disabled
support for S8_Z24 texture format when the context required separate
stencil (intel_context.must_use_separate_stencil).
But the GL spec requires implementations to support GL_DEPTH24_STENCIL8.
So we better find a way to fake it...
From page 180 (196 of pdf) of the OpenGL 3.0 spec:
In addition, implementations are required to support the following
sized internal [texture] formats.
[...]
- Combined depth+stencil formats: DEPTH32F_STENCIL8 and and
DEPTH24_STENCIL8.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
this drop a bunch of unnecessary checks (i.e. should be trapped
at gallium level), and also removes the switch statement in favour
of some calculated values for the vgt values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
the attached patch should be an improvement over Vadim Girlin's patch
fixing LIT instruction for r600g (commit
2fe39b46e7).
Instructions used in tgsi_lit have been reordered to always write to a
dst channel after the same channel in src has been read (so if src ==
dst, input values are not overwritten before being used).
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to bind to our context before calling __glXSetCurrentContext or
messing with the gc rect in order to properly handle error conditions.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In applegl, GLX advertises the same extensions provided by OpenGL.framework
even if such extensions are not provided by glapi. This allows a client
to get access to such API.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
FramebufferTextureLayer is an alias of FramebufferTextureLayerEXT, so
FramebufferTextureLayerARB needs to be listed as an alias of
FramebufferTextureLayerEXT rather than FramebufferTextureLayer.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Previously it was up to the driver or later code generator to reject
these shaders. It turns out that nobody did this.
This will need changes to support geometry shaders.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37743
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since switching to hidden visibility on gcc, GLw apps were failing to
link. Use the GLAPI definition to use default visibility where necessary.
$ nm lib/libGLw.so | grep DrawingArea
0000000000004020 T GLwCreateMDrawingArea
0000000000003430 T GLwDrawingAreaMakeCurrent
0000000000003410 T GLwDrawingAreaSwapBuffers
0000000000204c60 D glwDrawingAreaClassRec
0000000000204d48 D glwDrawingAreaWidgetClass
00000000002053c0 D glwMDrawingAreaClassRec
00000000002054e0 D glwMDrawingAreaWidgetClass
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Tested-by: justin <jlec@gentoo.org>
This was spectacularly unsafe. On my system, address 0 happens to be
the hardware status page for the render ring, and the first quadword
of that happens to contain nothing we ever look at, but I sure didn't
look forward to having to debug some day when, for example, the kernel
happened to bind the ringbuffer before binding the hwsp.
That flag was leftover from gen4, where brw_curbe.c is choosing ranges
of the CURBE space for constants to live in, and the unit state tells
where to load them from. That's not the case on gen6 -- we don't set
this flag (since constants aren't in the URB), nor do we have any
state like that to upload.
If --disable-gallium is passed, llvm-config isn't checked for, so mark
it explicitly as absent, through LLVM_CONFIG=no.
Passing --disable-gallium would result in:
| ../configure: line 9739: --version: command not found
| ../configure: line 9740: --cppflags: command not found
| ../configure: line 9741: --libs: command not found
| ../configure: line 9743: --ldflags: command not found
With this commit, one gets that instead:
| configure: error: LLVM is required to build Gallium R300 on x86 and x86_64
Signed-off-by: Cyril Brulebois <kibi@debian.org>
This removes all the --enable-gallium-$driver options and --disable-gallium.
Gallium can be disabled by --with-gallium-drivers= (without parameters).
Default is:
--with-gallium-drivers=r300,swrast
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
There is an obvious redundancy:
--with-driver=dri VS --with-state-trackers=dri
--with-driver=xlib VS --with-state-trackers=glx
--enable-openvg VS --with-state-trackers=vega
--enable-egl VS --with-state-trackers=egl
This patch adds two new options for the remaining state trackers:
--enable-xorg
--enable-d3d1x
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Our hardware doesn't have a sample_d_c message, so we have to do a
regular sample_d and emit instructions to manually perform the
comparison.
This requires a state dependent recompile whenever the sampler's compare
mode or function change. This adds the per-sampler comparison functions
to brw_wm_prog_key, but only sets them when the sampler's compare mode
is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it available earlier, which will soon be necessary.
(Separating code motion from actual changes.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is somewhat ugly, but I couldn't think of a nicer way to handle the
interleaved coordinate/derivative parameter loading.
Ironlake and Sandybridge will still hit an assertion in visit().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prior to this patch, it would attempt to optimize and allocate registers
for the program even if it failed to compile. This seems wasteful.
More importantly, the "message length > 11" failure seems to choke the
instruction scheduler, making it somehow use an undefined value and
segmentation fault.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
The _ColorDrawBuffers[] wouldn't get updated despite us having updated
what it depends on (Attachments[]->Renderbuffer). Other callers of
_mesa_remove_attachment are already flagging _NEW_BUFFERS for other
reasons. The specific bug report that led to this fix (and
the fbo-finish-deleted testcase) was fixed by
23b6f9606d, though.
Reviewed-by: Brian Paul <brianp@vmware.com>
This loop is trying to see if all the buffers to be uploaded happen to
be the same increment from the start of the 3DSTATE_VERTEX_BUFFERS
currently loaded in the hardware. However, we might be at a smaller
offset than the previous set of VERTEX_BUFFERS, so we can't reuse
because that packet made the first entry be its starting offset (you
can't access outside the given bounds).
Fixes piglit ARB_vertex_buffer_object/elements-negative-offset.
Current LIT implementation uses dst components for storing temp
results, possibly overwriting still needed values (depends on the
swizzles).
This patch uses temp reg for one of such cases (found in etqw) and
fixes "LIT R.z, R.xyzz".
Tested on evergreen. Fixes some etqw-demo rendering glitches when
"Lighting" is set to "High" in the settings.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current dri context unbind logic will leak drawables until the process
dies (they will then get released by the GEM code). There are two ways to fix
this: either always call driReleaseDrawables every time we unbind a context
(but that costs us round trips to the X server at getbuffers() time) or
implement proper drawable refcounting. This patch implements the latter.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
When emitting either a hiz or stencil buffer, the 'separate stencil
enable' and 'hiz enable' bits are set in 3DSTATE_DEPTH_BUFFER. Therefore
we must emit both 3DSTATE_HIER_DEPTH_BUFFER and 3DSTATE_STENCIL_BUFFER.
Even if there is no stencil buffer, 3DSTATE_STENCIL_BUFFER must be
emitted; failure to do so causes a hang on gen5 and a stall on gen6.
This also fixes a silly, obvious segfault that occured when a hiz buffer
xor separate stencil buffer existed.
Fixes the piglit tests below on Gen5 when hiz and separate stencil are
manually enabled:
fbo-alphatest-nocolor
fbo-depth-sample-compare
fbo
hiz-depth-read-fbo-d24-s0
hiz-depth-stencil-test-fbo-d24-s0
hiz-depth-test-fbo-d24-s0
hiz-stencil-read-fbo-d0-s8
hiz-stencil-test-fbo-d0-s8
fbo-missing-attachment-clear
fbo-clear-formats
fbo-depth-*
Changes piglit test result from crash to fail:
hiz-depth-stencil-test-fbo-d0-s8
Signed-off-by: Chad Versace <chad@chad-versace.us>
[airlied: final chunk of Mike's patch from bug 37476
this uses a loop to emit the GRADIENTS and does a check to
see if we need to fetch to a temporary register. It also
increases the context src gpr to 4 which is needed here.]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches (and don't forget
to re-run "make builtins" after cherry-picking.)
Mike had actually done a lot of the TXD support in a patch in bug
37476 which I see now, I'll add the bits of his work that I didn't think
to add to my work.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This at least passes the piglit arb_shader_texture_lod-texgrad test,
the AMD shader analyzer seems to multiply the V component by an unspecified
constant value no idea why.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This sets the base level as the zero level, which fixes
piglit/texturing/tex-miplevel-selection*.
The r600 hardware ignores the BASE_LEVEL field in some cases, so we can't
use it.
Evergreen might need this too.
Commit 56ef62d988
"glsl: Generate readable unique names at print time."
changed ir_print_visitor to not generate @0x1234567 suffixes except
where necessary. So there's no need to manually remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change to _glapi_create_table_from_handle causes it to fill the dispatch
table with NoOps for unimplemented functionality. This matches what is done
in indirect_init.c and also allows us to enable logging (when built with
-DDEBUG and the MESA_DEBUG or LIBGL_DEBUG environment variables are set) to
catch cases where clients are trying to use these unimplemented extentions.
Additionally, this fixes some gcc -pedantic warnings.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Check that the difference in array pointers/offsets from the 0th
array are less than the stride, for both VBOs and user-space arrays.
Previously, we were only doing this for the later.
This tightens up the interleaved array test and fixes a problem with
the llvmpipe driver where we were creating way too many vertex fetch
variants only because the pipe_vertex_element::src_offset values were
changing frequently. This change results in a 5x speed-up for one of
the viewperf tests.
Also, clean up the function to make it easier to understand.
The code was playing fast and loose with rowstrides, which meant that
if a driver chose anything different for its alignment requirements,
the generated mipmaps came out garbage. Unlike the uncompressed case,
we can't generate mipmaps directly into image->Data, so by using
TexImage2D we cut out most of the weird logic that existed to generate
in-place into ->Data. The up/downside is that the driver recovery
code for the fact that _mesa_generate_mipmaps whacked ->Data has to be
turned off for compressed now.
Fixes 6 piglit tests about compressed mipmap gen.
The path taken is wildly different based on this (do we generate from
a temporary image, or from level-1's data), and we appear to have
stride bugs in the compressed case that are tough to disentangle.
This just duplicates the code for the moment, the followon commit will
do the actual changes. Only real code change here is handling
maxLevel in one common place.
This is effectively just "round up when dividing by 4" compared to the
previous code. Fixes the broken stripe at the top of
fbo-generatemipmap-formats GL_EXT_texture_compression_rgtc.
We don't care just about the internalFormat/cpp/compressed, but about
the specific format chosen. We have no support for format
translations as part of texture validation, and furthermore it has
restrictions in the GL specification. However, we should be making
consistent decisions for this check anyway.
Generally image uploads to a the region occur at TexImage time, but
that's not the case for fallback _mesa_generate_mipmap(), and in this
path we were forgetting to align the width when dividing height. We
were just leaving out parts of the compressed block at 2x2 and 1x1
levels.
Fixes gen-compressed-teximage.
Copy-and-paste from the bgra cases. The C paths attempt to avoid
copying the 'x' channel, but it's harmless, you might as well. Good for
about 5% in glxgears (740 to 780 fps).
Signed-off-by: Adam Jackson <ajax@redhat.com>
glReadPixels() was performing RGB -> L conversion differently from the
glTexImage() style conversion appropriate for glCopyTexImage().
Fixes gles2conform copy_texture.
We were mapping the renderbuffer once, then walking over all the
buffers to map just the texture ones using the other texture mapping
function that handled the x/y offset to the image in the region. But
then we would go and overwrite *those* mappings with the original
mappings for depth/stencil, which was wrong.
Instead, just walk over the attachments once and map the attachments.
Wasn't that easy?
This is already pointing at 0 or Height - 1 and with an appropriate
pitch, so no need to recompute those values per customization of the
spans code. Cuts 3 out of 21kb of the compiled size.
Reviewed-by: Chad Versace <chad@chad-versace.us>
The "newImage" isn't particularly new -- it might be the same texture
that was attached to the same attachment point before. This function
also gets called when just rebinding back to an FBO with a texture
attachment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
It was originally located in the region because the tracking of
depth/color buffers was on the regions, and getting back to the irb
would have been tricky. Now, we're keying off of the renderbuffer in
more places, which means we can move these fields where they belong.
This could fix potential rendering failure with a single texture
having multiple images attached to different renderbuffers across
shareCtx (as far as I can tell, this was the only failure we could
cause, since anything else should trigger intel_render_texture in
between, for example a BindFramebuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
gc->vtable->destroy is always set and is used unconditionally
in other places, so don't bother checking for it first.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
This stuff is really for software rendering, it's not core Mesa.
A small step toward pushing the FetchTexel() stuff down into swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
x86_64_entry_start needs to be declared static in the C code,
in order to have the correct address in entry_get_public
(seems not to be needed on x86).
The compiler needs to lookup a local not a global object.
Otherwise addresses needed for _glapi_proc_address will be computed
from some random offset (0x6400229a61058b48 in my case).
This sadly requires work in the VS to rescale them, because the
hardware doesn't support this format natively.
Fixes arb_es2_compatibility-fixed-type and gtf/fixed_data_type.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The function was named "find_unconditional_discard", but didn't
actually check that the discard statement found was unconditional.
Fixes piglit glsl-fs-discard-04.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A trivial fix for error: format not a string literal and no format
arguments with compiling with -Werror=format-security flags.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If either depth or stencil buffer has packed depth/stencil format, then do
not use separate stencil.
Before this commit, emit_depthbuffer() incorrectly assumed that the
texture's stencil renderbuffer wrapper was a *separate* stencil buffer,
because the depth and stencil renderbuffer wrappers are distinct for
depth/stencil textures (that is, depth_irb != stencil_irb).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38134
Signed-off-by: Chad Versace <chad@chad-versace.us>
CONFIG regs (byte offsets 0x8000-0xac00) are single state and the pipeline
must be flushed and hw idle when they are changed. Border color regs
are in the CONFIG range and this is why a flush is required when changing
them. CONTEXT regs (byte offset 0x28000+) are multi-state and those do
not require flushes when changing them.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This is just like PointSprite overrides, but it's always on for that
attribute.
Fixes glsl-fs-pointcoord, gtf/point_sprites.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
If the wrap R (3rd) mode is set to CLAMP or CLAMP_TO_BORDER and the texture
isn't 3D, r300 always samples the border color regardless of texture
coordinates.
I HATE THIS HARDWARE.
NOTE: This is a candidate for the 7.10 branch.
Ideally we'd have a compiler and register spilling and all that
but this is good enough for now to avoid the gpu hang in piglit,
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined
on r600/r700 cards.
based on r600c patch
Andre Maasikas <amaasikas@gmail.com>
r600c: bump sq gpr resources if a shader needs more than default
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to vol2a.07, it only applies from Cantiga to Sandybridge.
I found this in my ringbuffers while investigating various GPU hangs.
While it may not have been the cause, it seemed wise to remove it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Thanks to Chad's hard work implementing separate stencil and HiZ
support, this is entirely straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to call add_validated_bo to do proper aperture space accounting.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen7 doesn't support packed depth/stencil, the stencil buffer
can't possibly be relevant for determining the depth format.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes mesa more consistent with glibtool and XCode where the
generated file matches the dylib id rather using an extra symlink
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
When it is sensible to do so,
1) intelCreateBuffer() now attaches separate depth and stencil
buffers
to the framebuffer it creates.
2) intel_update_renderbuffers() requests for the framebuffer
a separate stencil buffer (DRI2BufferStencil).
The criteria for "sensible" is:
- The GLX config has nonzero depth and stencil bits.
- The hardware supports separate stencil.
- The X driver supports separate stencil, or its support has not yet
been determined.
If the hardware supports hiz too, then intel_update_renderbuffers()
also requests DRI2BufferHiz.
If after requesting DRI2BufferStencil we determine that X driver did not
actually support separate stencil, we clean up the mistake and never ask
for DRI2BufferStencil again.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Assert that the GLX config has an expected depth/stencil bit combination:
one of d24/s8, d16/s0, d0/s0. These are the only depth/stencil
configurations that we advertise.
Remove the check for software stencil, because given the assertions'
constraints the check always fails.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Extract the code that queries DRI2 to obtain the DRIdrawable's buffers
into intel_query_dri2_buffers_no_separate_stencil().
Extract the code that assigns the DRI buffer's DRM region to the
corresponding renderbuffer into
intel_process_dri2_buffer_no_separate_stencil().
Rationale
---------
The next commit enables intel_update_renderbuffers() to query for separate
stencil and hiz buffers. Without separating the separate-stencil and
no-separate-stencil paths, intel_update_renderbuffers() degenerates into
an impenetrable labyrinth of if-trees.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields below to intel_screen. The expression in parens is the
value to which intelInitScreen2() currently sets the field.
GLboolean hw_has_separate_stencil (true iff gen >= 7)
GLboolean hw_must_use_separate_stencil (true iff gen >= 7)
GLboolean hw_has_hiz (always false)
enum intel_dri2_has_hiz dri2_has_hiz (INTEL_DRI2_HAS_HIZ_UNKNOWN)
The analogous fields in intel_context now inherit their values from
intel_screen.
When hiz and separate stencil become completely implemented for a given
chipset, then the respective fields need to be enabled.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... which indicates if the X driver supports DRI2BufferHiz and
DRI2BufferStencil.
I'm placing this in its own commit due to the large comment block.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since the stencil buffer is interleaved, the generic Mesa renderbuffer
accessors do not suffice. Custom span functions are necessary.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When emitting 3DSTATE_DEPTH_BUFFER, also emit 3DSTATE_HIER_DEPTH_BUFFER if
there is a hiz buffer. Ditto for 3DSTATE_STENCIL_BUFFER and a separate
stencil buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
glapidispatch.h was located in glapi and shared with mesa core. Because
the way it was shared, mesa core must include it indirectly via
main/dispatch.h.
Now that it is no longer needed by glapi and is located in core mesa,
merging it with main/dispatch.h to avoid wrong uses.
Generate different glapidispatch.h's for GL and GLES. For GLES, we want
a local remap table.
This reverts commit 5af46e8360. The
commit will break GL remap table setup when main/glapidispatch.h is
regenerated.
See piglit dlist-fdo31590.c test and
http://bugs.freedesktop.org/show_bug.cgi?id=31590
In this case we had node->prim_count=1 but node->count==0 because the
display list started with glBegin() but had no vertices. The call to
glEvalCoord1f() triggered the DO_FALLBACK() path. When replaying the
display list, the old condition basically no-op'd the call to
vbo_save_playback_vertex_list call(). That led to the invalid operation
error being raised in glEnd().
NOTE: This is a candidate for the 7.10 branch.
Previously, we were errantly drawing some interior edges of clipped
polygons and quads. Also, we were introducing extra edges where
polygons intersected the view frustum clip planes.
The main problem was that we were ignoring the edgeflags encoded in
the primitive header's 'flags' field which are set during polygon/quad
->tri decomposition. We need to observe those during clipping. Since
we can't modify the existing vert's edgeflag fields, we need to store
them in a parallel array.
Edge flags also need to be handled differently for view frustum planes
vs. user-defined clip planes. In the former case we don't want to draw
new clip edges but in the later case we do. This matches NVIDIA's
behaviour and it just looks right.
Finally, note that the LLVM draw code does not properly set vertex
edge flags. It's OK on the regular software path though.
When GLX_INDIRECT_RENDERING is defined, some symbols are used in
libglapi.a but are not defined. Define them through the help of
glapitemp.h.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This updates the apple dispatch table to match the current glapi.
Aliases are still not handled very well.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
With this change, Apple's libGL is now using glapi rather than implementing
its own dispatch. In this implementation, two dispatch tables are created:
__ogl_framework_api always points into OpenGL.framework.
__applegl_api is the vtable that is used. It points into OpenGL.framework
or to local implementations that override / interpose this in OpenGL.framework
The initialization for __ogl_framework_api was copied from XQuartz with some
modifications and probably still needs further edits to better deal with
aliases.
This is a good step towards supporting both indirect and direct rendering
on darwin.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In starting the migration to using mapi, rename __gl_api to
__ogl_framework_api since it is a vtable for OpenGL.framework
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
So only with kernel version 2.7 can this work, thanks to Alex
for pointing that out. Also add a workaround for a hw bug.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Evergreen can do this as well as cayman, so we should enable it.
This fixes a gpu lockup with
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test
I need to add a better workaround for r600/r700.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We weren't emitting the SQ setup regs at all which really is
fail.
When a state is always enabled we need to add it to the dirty list
as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since resources don't generally vary in size, this splits
the emit path, it also takes into a/c that texture and vertex resources
have different number of relocs, and avoids emitting the extra
reloc for vertex resources.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Exit this loop early to avoid pointless iterations later.
Move the resource bos to the first two regs, it actually
doesn't matter which regs we use for this in resource land.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_framebuffer_object spec (and later specs) say:
"If a buffer is specified in <mask> and does not exist in both
the read and draw framebuffers, the corresponding bit is silently
ignored."
Check for color, depth, and stencil that the source and destination
FBOs have the specified buffers. If the buffer is missing, remove the
bit from the blit request mask and continue.
Fixes the crash in piglit test 'fbo-missing-attachment-blit from', and
fixes 'fbo-missing-attachment-blit es2 from'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
In an ES2 context (or if GL_ARB_ES2_compatibility) is supported, the
framebuffer can be complete with some attachments be missing. In this
case the _ColorDrawBuffers pointer will be NULL.
Fixes the crash in piglit test fbo-missing-attachment-clear.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
query->num_results already has the size in dwords of the query
buffer. There no need to multiply again. We were reading past
the end of the buffer, resulting in reading garbage.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=37028
agd5f: clarify the comment.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
According to the hw documentation, the driver needs to:
- allocate 128 bits for each possible DB
- clear the 128 bits for each possible DB
- write 1 to bits 127 and 63 for upper DBs that don't
exist on a particular asic
Previously we were only doing these steps if the
asic had less than the max possible DBs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
With complex shaders there are often "holes" in the fs inputs, and we only
have 8 tex coorsd to map those to. To fix this, we remap fs inputs to [0..8].
This lets us to run many more GLSL programs.
At the end of flushing we were scanning over 450 blocks
with generally about 50 enabled. This reduces the scanning
to just the list of enabled blocks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There isn't much point taking the overhead of range/block lookups on resources
we aren't going to be getting resource registers at wierd offsets.
Signed-off-by: Dave Airlie <airlied@redhat.com>
resource setting could be a fair bit more lightweight,
this patch just separates the resource structs from the standard
reg tracking structs in the driver, later patches will improve
the winsys.
Signed-off-by: Dave Airlie <airlied@redhat.com>
we don't need to loop over all the registers unless we have
some bos in the block, also avoid setting the ctx flags,
and move the optional stuff down below this chunk.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reference implementation which produces high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
Signed-off-by: Brian Paul <brianp@vmware.com>
We implement line stipples, just not *quite* correctly. We have a
piglit testcase to use when we want to fix it, if we do. Until then,
don't lie to our test suites.
We do have hardware antialised lines. If we care, we should actually
fix them to be conformant (or as close as possible) instead of using
this knob to fool testcases using swrast.
For some interesting reading on the state of GL_*_SMOOTH across
several drivers, see:
http://homepage.mac.com/arekkusu/bugs/invariance/HWAA.html
From my reading of the GL 2.1 spec, no antialiasing is strictly
conformant for polygon smoothing. Yes, it's absurd, but then,
hardware doesn't support this so maybe it's not so absurd.
ir_print_visitor::visit(ir_constant *) was failing to index properly
into ir->type->fields.structure, so the first field name was being
reprinted for every field in the structure.
Signed-off-by: Brian Paul <brianp@vmware.com>
ast_expression::print() had an incorrect index into the subexpressions
array, so (a ? b : c) was being incorrectly rendered as (a ? b : b).
Signed-off-by: Brian Paul <brianp@vmware.com>
This makes this function not be an always miss for the branch predictor.
Noticed using cachegrind, makes a minor difference to gears numbers on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are handled separately in the winsys, so don't need the calculations
done at this point. this manifested as a crash in point-sprite,
Thanks to XoD on #radeon for pointing it out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The flush function, when asked for, should not return a NULL fence.
NULL can only be returned if fences are not implemented, and st/mesa
doesn't call any of the fence functions if it receives a NULL fence
(because some drivers don't even set the fence hooks).
ARB_sync is exposed if fence_finish is set.
Mesa now limits, by default, the max number of texture levels to 15 so we
can now support the architectural maximum for gen4-6 of 14.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This modifies the VGT state and move the SPI setup to its own discrete state.
It then just sets the SPI state up and the VGT state up once and modifies
them thereafter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This splits the initialisation and the setting of values in the resource
buffers. We only should end up initialising once and updateing with new values
when needed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the overhead of working out the range/block to state build time,
it also allows the compiler to use constants for a lot of things instead
of working them out each time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the functions down the file, and also adds a ctx parameter.
This is precursor patch just moving stuff around and getting it ready.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drop the r600_draw_vbo CPU usage on a run of nexuiz from 1.40% to 0.72%
in sysprof for me on my Fusion APU.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This range was 76 dwords long, the 75th dword changes, the first 60 or so
don't. split the block so it emits less often.
Signed-off-by: Dave Airlie <airlied@redhat.com>
glx code hasn't lived under xserver/GL for a long time now.
Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
OpenGL 4.0 Compatibility, page 449:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, no
framebuffer is bound to target. In this case querying pname FRAMEBUFFER_-
ATTACHMENT_OBJECT_NAME will return zero, and all other queries will generate
an INVALID_OPERATION error.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This avoids the extra CMP and the predication on SEL, so in addition
to one less instruction, it makes scheduling less constrained.
Improves glbenchmark Egypt performance 0.6% +/- 0.2% (n=3). Reduces
FS instruction count across affected shaders in shader-db by 1.3%
without regressing any.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces compiled size of brw_wm_surface_state.o another 1.9%.
Overall, this brw_wm_surface_state reduction series cuts
firefox-talos-gfx runtime by 0.68% +/- 0.42% (n=6).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It turns out that gcc is just awful at generating code for
brw_structs.h style state setup, and using bitshifting on u32s
generates better code while being similarly readable (and more
verifiable compared to the specs, using the INTEL_MASK macro).
It's only used in the old fragment program path, to avoid projection
when w is always 1. We do want to do this in the new path pre-gen6
too, but we'll probably do it through the ir.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Oddly, this increases compiled code size. (marking the 'if' as likely
also increases code size, but not as much).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Interestingly, the compiler wasn't doing this for us at -O2, so we
were doing the computation for every non-_ReallyEnabled unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- all asics need to emit CONTEXT_CONTROL
- all r6xx asics need to emit 3D_START_CMDBUF
The ddx and r600c already do this. r600g should as well.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
We are getting inconsistent methods for endian detection (same answer when
it works, just doesn't work on some platforms) depending on whether __GLIBC__
is defined, which of course depends on include ordering before p_config.h
Just make p_config.h include limits.h to solve this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
On my original R600 card this at least lets gnome shell run for a while longer
and the piglit r300-readcache test case works a lot more reliably.
Still a few more stability issues running a piglit test run though.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec doesn't state it should be an error, but. We have this piglit test
useprogram-inside-begin that passes with this commit. No idea what's correct.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
The conditional rendering should be able to kill CopyPixels.
I assume the render condition has no effect on resource_copy_region.
This fixes piglit:
- NV_conditional_render/copypixels
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Always default to DEFAULT_*_FORMATS for mandatory GL formats.
(st_choose_format must not fail for those)
Use DEFAULT_RGBA when alpha is required instead of RGB.
Use DEFAULT_RGB otherwise.
These are more or less the remaining differences between the old code and
the new one.
Reviewed-by: Brian Paul <brianp@vmware.com>
The problem is: The second time the function is called with a new
internal format, strb->format is usually not PIPE_FORMAT_NONE.
RenderbufferStorage(... GL_RGBA8 ...);
RenderbufferStorage(... GL_RGBA16 ...); // had no effect on the format
Broken with: fd6f2d6e57
Test: piglit/fbo-storage-completeness
NOTE: This is a candidate for the 7.10 branch.
(if fd6f2d6e57 is cherry-picked as well)
Reviewed-by: Brian Paul <brianp@vmware.com>
Lowered indirect addressing can create lots of immediates.
Fixes piglit/glsl-fs-uniform-array-7 on r300g.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
From now on, depth test is always enabled in hardware.
If depth test is disabled in Gallium, the hardware Z function is set to ALWAYS.
If there is no zbuffer set, the colorbuffer0 memory is set as a zbuffer
to silence the CS checker.
This fixes piglit:
- occlusion-query-discard
- NV_conditional_render/bitmap
- NV_conditional_render/drawpixels
- NV_conditional_render/vertex_array
We want to check for Success, otherwise it will fail even with the right visual.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
When using _mesa_layout_parameters, all params copied in the 'layout'
output in the PASS 1 don't modify StateFlags (because they are simply
memcpy'ed).
This patch fixes the problem, assuring output gl_prog_param_list
StateFlags field is the same as the input one.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
At glLinkShaders time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
We now have:
brw_fs.cpp handles calling out to everything and optimization.
brw_fs_visitor.cpp handles translating to our LIR.
brw_fs_emit.cpp handles emitting from our LIR to native code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's an assumption here that fixed GRFs will never intersect with
the allocated GRFs. That's true today, though it might change some
day if we decide to register-allocate the regs containing push
constants once they're dead.
This fixes a regression in 0f7325b890 in
Lightsmark from the texture instructions now containing g0 references
instead of having that be implied. Performance is improved 15.2% +/-
3.6% (n=3).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968
emit_add_a16 was using the incorrect source.
This caused adds in the form of:
add u16 $a0 s32 $a1 u32 0x00000200
to have a source AREG of $a0 instead of $a1.
Fixes World of Warcraft in OpenGL and D3D without GLSL.
They were occupying whole 32-bit words, despite being only 10 or so
bits. Reduces code size slightly (80/3300 bytes).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the GL 2.1 spec:
"Required perspective-correct interpolation for all fragment
attributes except depth in sections 3.4.1 and 3.5.1, effectively
making GL PERSPECTIVE CORRECT HINT a no-op."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
First, FBO read/draw == NULL validation happens in mesa core not
intelReadBuffers -> intel_draw_buffers. Second, that condition is no
longer tested for in our driver since ARB_ES2_compatibility was added.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, the driver is likely to draw the flushed vertices to the
new drawbuffer instead of the old one, missing the point of the flush.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the ARB_ES2_compatibility spec:
"(8) How should we handle draw buffer completeness?
RESOLVED: Remove draw/readbuffer completeness checks, and treat
drawbuffers referring to missing attachments as if they were NONE."
Fixes arb_es2_compatibility-drawbuffers when the short-circuit for
ARB_ES2_compatibility in the previous commit is dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
glDrawBuffers pointing at an unattached buffer is supposed to be
incomplete without ARB_ES2_compatibility. The testcase to catch the
bug of not implementing that bit of the spec was tricked by this
missing piece of state update.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we use FBOs to access mipmap levels with glRead/Draw/CopyPixels()
we need to be sure to access the correct mipmap level/face/slice.
Before, we were just passing zero in quite a few places.
This fixes the new piglit fbo-mipmap-copypix test.
NOTE: This is a candidate for the 7.10 branch.
V_SQ_CF_WORD1_SQ_CF_INST_HALT is 0x1f on both
evergreen and cayman.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The logic of intel_draw_buffers() expected that stencil buffers were
always combined depth/stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When a texture is attached to multiple FBO's, a separate renderbuffer
wrapper is created for each attachment. This necessitates storing the hiz
region for these renderbuffers in the texture itself instead of the
renderbuffer wrapper.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Before this commit, the renderbuffer's region was updated in
intel_renderbuffer_texture(). This commit moves the update into
intel_update_wrapper(), which is a more logical location for updates.
This is in preparation for the next commit, which allocates and
updates the texture's hiz region in intel_update_wrapper(). Having the two
region updates located in the same function makes good form.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A hiz surface must be supplied to the hardware when rendering to a depth
buffer with hiz. There are three potential places to store that surface:
1. Allocate a larger intel_region for the depthbuffer, and let the
region's tail be the hiz surface.
2. Allocate a separate intel_region for hiz, and store it as
brw_context state.
3. Allocate a separate intel_region for hiz, and store it in
intel_renderbuffer.
We choose method 3.
Method 1 has not been chosen due to future complications it might cause
when requesting a DRI drawable's depth buffer attachment from X.
Method 2 has not been chosen because storing the hiz region apart from
the depth region makes lazy hiz/depth resolves difficult to implement.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Given a format, is_hiz_depth_format() indicates if HiZ can be enabled on
a depthbuffer of that format.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... in intel_alloc_renderbuffer_storage(). The stencil buffer has quirky
pitch requirements, so its region allocation is a special case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When hardware supports separate stencil, enable support for separate
depth/stencil texture formats in the table
intel_context.ctx.TextureFormatsSupported. If the hardware must use
separate stencil, then disable support for combined depth/stencil formats.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Prefer MESA_FORMAT_X8_Z24 over MESA_FORMAT_S8_Z24 for textures with
internal format GL_DEPTH_COMPONENT*.
i965 needs MESA_FORMAT_X8_Z24 for HiZ and separate stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following flags:
intel_context.has_separate_stencil
intel_context.must_use_separate_stencil
intel_context.has_hiz
The flags are currently set to false, and will be enabled for a given
chipset once the feature is completely implemented.
Since it may be some time before these features are completed, their
values can be overridden with environment variables INTEL_HIZ and
INTEL_SEPARATE_STENCIL. Valid values for these environment variables are
"0" and "1".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because that's not a safe thing to do. The request buffer is shared
storage among all threads, and after UnlockDisplay the 'req' pointer may
point into someone else's request.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Cayman is the RadeonHD 69xx series of GPUs. This adds support for
3D acceleration to the r600g driver.
Major changes:
Some context registers moved around - mainly MSAA and clipping/guardband related.
GPR allocation is all dynamic
no vertex cache - all unified in texture cache.
5-wide to 4-wide shader engines (no scalar or trans slot)
- some changes to how instructions are placed into slots
- removal of END_OF_PROGRAM bit in favour of END flow control clause
- no vertex fetch clause - TC accepts vertex or texture
Signed-off-by: Dave Airlie <airlied@redhat.com>
These don't need one, and I was seeing 0xff being returned and set in
the GPU registers with some tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of using a giant switch statement with lots of code, use a
table to convert GL format enums to pipe formats.
Tested by running the old code next to the new and asserting that
the return value was the same for piglit tests.
We're doing a linear search, but if that ever appears to be too slow
the table could easily be sorted or hashed.
Certain applications (e.g., Bernina My Label, and the Windows
implementation of Processing language) destroy the device context used when
creating the frame-buffer, causing presents to fail because we were still
referring to the old device context internally.
This change ensures we always use the same HDC passed to the ICD
entry-points when available, or our own HDC when not available (necessary
only when flushing on single buffered visuals).
Since the SET_xxx and GET_xxx macros used to initialize the remap_table
have been replaced by inline functions, the missing late macro expansion
leads to driDispatchRemapTable not being redefined to remap_table, which
in turn causes the remap_table not to be setup properly.
This commit fixes the issue by moving the table redefinition after the
definition of driDispatchRemapTable but in front of the inline function
definitions.
Despite that negative values aren't sensible here, making this unsigned
is dangerous. Consider get_pointer_generic, which computes a value of
the form:
void *base + (int x * int stride + int y) * unsigned bpp
The usual arithmetic conversions will coerce the (x*stride + y)
subexpression to unsigned. Since stride can be negative, this is
disastrous.
Fixes at least the following piglit tests on Ironlake:
fbo/fbo-blit-d24s8
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
According to OpenGL 3.1 chapter 2.1.5 the representation without zero
should only be used for vertex attribute values, but not for textures
or frame-buffers.
The coordinate offsets set in the m1 header are for textureOffset;
they have nothing to do with textureGrad (TXD).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The same as 3e43adef95 but for Gen7.
This doesn't quite fix GL_ARB_depth_texture/fbo-clear-formats; there's
still a 1 pixel wide black line on the right edge of the smaller squares.
The results were entirely wrong before, and are at least close now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Since wayland 4bde293ff8109d55eeaee8732f5a6ee0c8cd4bd9 we cant
lookup visuals, as we dont receive the visual token events.
The format for pixmap-images thus has to default to argb for now.
GLES uses GL_APIENTRYP instead of GLAPIENTRYP, which breaks with the
latest API table generation code. This fixes the issue by emitting a
definition for GL_APIENTRYP when generating the GLES files.
This prevents the error
prog: for the -disable-mmx option: may only occur zero or one times!
when creating a new context after XCloseDisplay with DRI drivers linked
with a shared LLVM 2.8 library.
In particular, this fixes the case where a vertex shader only uses
generic vertex attributes (non-0th). Before, we were no-op'ing the
glDrawArrays/Elements().
This fixes the new piglit pos-array test.
NOTE: This is a candidate for the 7.10 branch.
Previously, always did unorm8->float/nonlinear-to-linear conversion (using
lookup table), then convert back to nonlinear (using the expensive math
func pow among others), and finally convert back to int (assuming caller
wants unorm8), because the float texture fetch function is used for getting
the actual texel values. This should probably all be changed at some point,
but for now simply enable the memcpy path also for srgb formats (but if for
instance swizzling is required, still the whole conversion will be done).
Clip distance is calculated each time vertex position is written
which is suboptiomal is some cases but very safe.
User clip planes are an obsolete feature anyway.
Every time number of clip planes increases, the vertex program
is recompiled.
That ensures no overhead in normal case (no user clip planes)
and reasonable overhead otherwise.
Fixes 3D windows in compiz, and reflection effect in neverball.
Also fixes compiz expo plugin when windows were dragged and each
window shown 3 times.
This was going to get in the way of separate depth/stencil (which
wants to know about both, and whether they are the same rb), and also
wasn't a sufficient flag for the fix in the following commit.
In the 16-wide rework, I missed that we were setting some things to be
SIMD16 mode (corresponding to their setup in emit_texture_gen4()).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These fields are documented to be in the payload, and though the FB
write docs say they *aren't* in the payload, for all other fields the
payload and header is structured so that no overwriting is required
except for non-default options.
It turns out there's nothing in the hardware preventing this. It
appears that it ought to work on pre-gen6 as well, but just produces
GPU hangs.
Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by
2.6% +/- 0.6% (n=3).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of gen6, alt-mode (which we use) MOVs of floats are not raw --
they'll modify infs/nans. This broke discard and alpha test in
16-wide, where apparently the upper 8 bits of the pixel enables being
set were causing the whole value to get trashed upon being moved.
Treating the values as UD instead of float makes sure they get
preserved. While I'm here, replace the two 8-wide moves of the halves
of the header with a single compressed move.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36648
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is part of fixing fbo-alphatest-nocolor -- a regression in
35e8fe5c99 after the initial regression,
that had us using a garbage BLEND_STATE[0] (in particular, the alpha
test enable) if no color buffer was bound.
I thought I was thwarted initially when I couldn't do conditional mod
on a MOV, and couldn't use two immediate constants in one instruction.
But g0 != g0 is also a way to produce a failing comparison.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anisotropic filtering extension for swrast intended to be used by osmesa
to create high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
A 2nd implementation using footprint assembly is also provided.
Signed-off-by: Brian Paul <brianp@vmware.com>
Correctly links against selinux library when MESA is built with --enable-selinux option.
Fixes bug #36333 in Freedesktop bugzilla
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function was taking a lot more CPU than required due to it memsetting
a bunch of memory that didn't require it from what I can see.
We should only memset here when we are about to fill out the sampler,
otherwise we end up doing a bunch of memsets for everytime this function
is called, basically setting 0 memory to 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The change for GPU hanging in 13bab58f04
fell back even when rb == NULL, which is wrong for GLES2 and caused
segfaulting in GLES2 conformance. For the GPU hang case (where the
broken 2D driver failed to allocate a BO for the window system
renderbuffer), it also would assertion fail/segfault immediately after
the fallback setup when the renderbuffer map failed.
Fixes GLES2 conformance packed_depth_stencil.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Texture LOD Bias is now S4.8 instead of S4.6;
Min LOD, and Max LOD are now U4.8 instead of U4.6.
Fixes piglit test tex-miplevel-selection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The data port messages for this are rather different. For now, fail to
compile rather than hanging the GPU.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On gen4/5, the RNDZ and RNDE instructions return floor(x), but set special
"round increment bits" in the flag register; a predicated ADD (+1) fixes
the result.
The documentation still lists '.r' as existing, and says that the
predicated add is necessary, but it apparently lies. According to the
simulator, BRW_CONDITIONAL_R (7) is not a valid conditional modifier
and the RNDZ and RNDE instructions simply produce the correct value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The MATH instruction cannot handle source modifiers, even on Gen7.
So, apply this workaround for Sandybridge on Ivybridge as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglit test glsl-fs-loop-continue.shader_test on Ivybridge.
According to the documentation, the CONT instruction's UIP field should
point to the WHILE instruction on both Sandybridge and Ivybridge.
The previous code made UIP point to the implicit DO instruction, which
seems incorrect. I'm not sure how it could have worked on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's IF instruction doesn't support conditional modifiers.
It also introduces UIP, which must point to the ENDIF instruction.
ELSE and ENDIF remain the same except that JIP moves from dst to src1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge puts the shadow comparator first, then lod/bias, and finally
the coordinate---unlike previous generations which always reserved four
slots for the coordinate at the beginning.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Most of this code copied from brw_wm_sampler_state.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm still not happy with the amount of code duplication here, but it
will have to do for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, Ivybridge seems to ignore the newly supplied data, giving us
rubbish for vertices.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This shouldn't be done using MRFs, but until I have a proper solution
for dealing with MRFs, this allows my hack to keep working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The message header is still incorrect, but this is a start.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's SEND instruction uses GRFs instead of MRFs. Unfortunately,
a lot of our code explicitly uses MRFs, and rewriting it would take a
fair bit of effort. In the meantime, use a hack:
- Change brw_set_dest, brw_set_src0, and brw_set_src1 to implicitly
convert any MRFs into the top 16 GRFs.
- Enable gen6_resolve_implied_move on Ivybridge: Moving g0 to m0
actually moves it to g111 thanks to the previous hack.
It remains to officially reserve these registers so the allocator
doesn't try to reuse them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This also disables the HiZ and separate stencil buffers. We still need
to implement stencil.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we currently only support sampling in the fragment shader, we only
bother to emit the PS variant. In the future we'll need to emit others.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This may not be necessary, but it seems like a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge can update each stage's binding table pointer independently,
so we want separate dirty bits. Previous generations can simply
subscribe to all three dirty bits and emit as usual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_vs_state.c; reuses create_vs_constant_bo from there.
The 3DSTATE_VS command is identical but 3DSTATE_CONSTANT_VS is not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
SF and CLIP viewport state has been combined into SF_CLIP_VIEWPORT;
SF_CLIP and CC state pointers can now be uploaded independently.
Some portions of the hardware documentation refer to separate upload
commands for SF and CLIP; these are outdated and incorrect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_clip_state.c.
This enables early culling and sets the necessary fields. Otherwise, it
is entirely the same, so I doubt this patch is strictly necessary for a
functional driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The state itself still seems to be the same; the only change is that
each part (CC, BLEND, DEPTH_STENCIL) can now be uploaded independently.
Thus, we still rely on the code in gen6_cc.c to set up the state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_wm_state.c.
The main change from Sandybridge seems to be that 3DSTATE_WM was split
into two separate state packet commands: 3DSTATE_WM and 3DSTATE_PS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_sf_state.c.
The main change from Sandybridge seems to be that 3DSTATE_SF was split
into two separate state packet commands: 3DSTATE_SF and 3DSTATE_SBE
("setup backend"). The bit-offsets are even the same - only the DWords
numbers have shuffled around a bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently this always reserves 16kB for push constants, regardless of
how much space is needed, and partitions it evenly betwen the VS and FS.
This is probably not ideal, but is straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, gen7_atoms is a verbatim copy of gen6_atoms; future commits
will update it to contain gen7-specific state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, IS_GEN7, IS_IVYBRIDGE, IS_IVB_GT1, and IS_IVB_GT2 all return
false. This allows me to write the code for them before actually adding
the PCI IDs and thus enabling the hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation uses the term "vertex URB entries", the code talks
about "entry size", and so on. Also, handles are just "pointers" to
entries (actually small integers).
Also rename max_gs_handles to max_gs_entries.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary motivation for this is to better support Ivybridge control
flow. Ivybridge IF instructions need to point to the first instruction
of the ELSE block -and- the ENDIF instruction; the existing code only
supported back-patching one instruction ago.
A second goal is to simplify and centralize the back-patching, hopefully
clarifying the code somewhat.
Previously, brw_ELSE back-patched the IF instruction, and brw_ENDIF
back-patched the previous instruction (IF or ELSE). With this patch,
brw_ENDIF is responsible for patching both the IF and (optional) ELSE.
To support this, the control flow stack (if_stack) maintains pointers to
both the IF and ELSE instructions. Unfortunately, in single program
flow (SPF) mode, both were emitted as ADD instructions, and thus
indistinguishable.
To remedy this, this patch simply emits IF and ELSE, rather than ADDs;
brw_ENDIF will convert them to ADDs (the SPF version of back-patching).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This hides the IF stack and back-patching of IF/ELSE instructions from
each of the code generators, greatly simplifying the interface.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This would be so much easier if we were using C++; we could simply use
constructors and destructors. Instead, we have to update all the
callers.
While we're at it, ralloc various brw_wm_compile fields rather than
explicitly calloc/free'ing them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Sandybridge, we don't need to break down primitives. There's no need
to bother setting up brw_compile and such if it's not going to be used;
bail as early as possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
ctx->Light.ProvokingVertex depends on _NEW_LIGHT.
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it symmetric with brw_set_dest, which is convenient, and will
also allow for assertions to be made based off of intel->gen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglits fragment-and-vertex-texturing test on llvmpipe for me.
I've no idea if someone had another plan for this that is smarter than what
I've done here, but what I've basically done is
split fragment and vertex sampler and sampler_view setup function, factor
out the common chunks of both.
side-cleanups:
drop st->state.sampler_list - unused
don't update border color if we have no border color.
should fix https://bugs.freedesktop.org/show_bug.cgi?id=35849
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm hard pressed to think of any reason a gallium thread would want to
receive a signal, especially considering its probably loaded as a library
and you don't want the threads interfering with the main threads signal
handling.
This solves a problem loading llvmpipe into the X server for AIGLX,
where the X server relies on the SIGIO signal going to the main thread,
but once llvmpipe loads the SIGIO can end up in any of its threads.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nothing special, just changing conditions for when HiZ can be enabled and
when HiZ memory becomes invalid.
I was thinking about it again and realized it had not been quite right.
This is actually just the message descriptor for Gen6+ dataport access;
it has nothing to do with the render cache. Access to the sampler cache
and constant cache also would use this struct; rename for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are documented on page 245 of IHD_OS_Vol4_Part2.pdf (the public
Sandybridge documentation/SEND instruction description).
Somebody had the bright idea to reuse gen4/5 defines labelled READ/WRITE
which just happened to be the same values as Render Cache/Sampler Cache.
It turns out that this field has nothing to do with READ/WRITE on
Sandybridge, but rather represents which data port to direct it to.
This was especially confusing in brw_set_dp_read_message, which
used "BRW_MESSAGE_TARGET_DATAPORT_WRITE." In a read function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For example, an indirect load like "ld b128 $r0q c0[$r0]" seems to
overwrite the address register before finishing the load, but only
if there are a lot of threads running.
Visible as displaced geoemtry in Unigine Heaven.
According to my documentation this is actually "Media Block Write" on
Gen4-5; there has never been a "DWord Block Write."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
only allocate the blocks ptr in the range if we ever have one,
otherwise don't bother wasting the memory.
valgrind glxinfo
before:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
after:
==5227== in use at exit: 419,754 bytes in 706 blocks
==5227== total heap usage: 3,452 allocs, 2,746 frees, 3,140,531 bytes allocate
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops 6k of the text segment, a minor drop in the ocean, however
it also makes the code a lot cleaner and removes a lot of duplicated
information, hopefully making it more maintainable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This table covered a large range unnecessarily, reduce the address
range covered, use the fact that the bottom two bits aren't significant,
and remove unused fields from the range struct. It also drops the hash_size/shift in context in favour of a define, which should make doing the math
a bit less CPU intensive.
valgrind glxinfo
Before:
==320== in use at exit: 419,754 bytes in 706 blocks
==320== total heap usage: 3,691 allocs, 2,985 frees, 7,272,467 bytes allocated
After:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently r600g always maps every bo, this is quite pointless as it wastes
VM and on 32-bit with wine running VM space is quite useful.
So with this patch we don't create the mappings until first use, without
tiling enabled this probably won't make a major difference on its own,
but with tiled staged uploads it should avoid keeping maps for most of the
textures unnecessarily.
v2: add bo data ptr check
Signed-off-by: Dave Airlie <airlied@redhat.com>
typedef void (GLAPIENTRYP _GLUfuncptr)(); causes the following warning:
function declaration isn't a prototype.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
GetVertexAttrib*{,ARB} is no longer aliased to the NV calls.
This fixes tracing yofrankie with apitrace, given it requires accurate
results from GetVertexAttribiv*.
NOTE: This is a candidate for the stable branches.
Mesa already supports this because of NV_fragment_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
If state tracker asked us to map resource directly and we can't
do it (because of tiling), return NULL instead of doing full transfer
- state tracker should handle it and fallback to some other method
or repeat transfer without PIPE_TRANSFER_MAP_DIRECTLY.
It greatly improves performance of xorg state tracker on nv50+,
because its fallback (DFS/UTS) is much faster than full transfer.
Eliminates unaligned accesses on strict architectures. Spotted by Jay
Estabrook.
Signed-off-by: Matt Turner <mattst88@gmail.com>
NOTE: This is a candidate for the 7.10 branch.
GLSL stopped using:
BRA, EXP, LOG, LRP, NRM3, NRM4, XPD.
GLSL started using:
KIL, SCS, SSG, SWZ.
(omg why SWZ? isn't proc_src_register flexible enough?)
GLSL doesn't use these opcodes some Radeons do support:
ARR, DP2A, DST, LRP, XPD.
These opcodes are now unused:
AND, NOT, NRM3, NRM4, OR, XOR.
(plus maybe the NV extensions which are unused by Gallium)
In addition to that, we don't use two-dimensional indirect addressing,
which the Mesa IR can do.
PIPE_ARCH_UNKNOWN_ENDIAN is used no where else. All #else branches of
ifdef PIPE_ARCH_LITTLE assume big-endian. Not #error'ing out here
only serves to allow bad things to happen.
Signed-off-by: Matt Turner <mattst88@gmail.com>
1/ln(2) is equivalent to log2(e), so define it as such.
log2(e) = ln(e)/ln(2) = 1/ln(2)
Worst of all, the definitions for M_LOG2E and ONE_DIV_LN2
(right beside each other!) weren't the same.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Setting SOURCE_FORMAT to EXPORT_NORM is an optimization.
Leaving SOURCE_FORMAT at 0 will work in all cases, but is less
efficient. The conditions for the setting the EXPORT_NORM
optimization are as follows:
R600/RV6xx:
BLEND_CLAMP is enabled
BLEND_FLOAT32 is disabled
11-bit or smaller UNORM/SNORM/SRGB
R7xx/evergreen:
11-bit or smaller UNORM/SNORM/SRGB
16-bit or smaller FLOAT
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Hopefully we can find out the proper fix for this, but for now
this makes the fbo mipmap tests pass on my rv670 (x2 card).
Signed-off-by: Dave Airlie <airlied@redhat.com>
r6xx asics have some problems with the surface
sync logic for the CB and DB. It's recommended
to use the event write interface for flushing
the DB/CB caches rather than the sync packets.
A single event write flush flushes all dst
caches, so we only need one for all CBs and DB.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=35312
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems more in line with what the documentation suggests we should be
doing. It doesn't fix the rv635 regression, though I thought it might,
so it means I've no idea whats actually going wrong there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
We only handle a 32 bit swap count, so use the new structure definitions.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After sending the GLXChangeDrawableAttributes request, we also set a
local set of attributes on the DRI drawable. But in the indirect case
this array won't be present, so skip the setting in that case to avoid a
crash.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We use a hidden window for pbuffer contexts, but Windows limits window
sizes to the desktop size by default. This means that creating a big
pbuffer on a small resolution single monitor would truncate the pbuffer
size to the desktop.
This change overrides the windows maximum size, allow to create windows
arbitrarily large.
Pointed out by clang:
src/gallium/auxiliary/draw/draw_context.h:251:41: warning: implicit conversion
from enumeration type 'enum pipe_cap' to different enumeration type
'enum pipe_shader_cap' [-Wconversion]
return tgsi_exec_get_shader_param(param);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~
Otherwise there would be no way to know whether the state has been changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts commit 1dc204d145.
MC_COORD_TRUNCATE is for MPEG and produces quite an interesting behavior
on regular textures. Anyway that commit broke filtering in demos/cubemap.
The new allocator uses ra and does swizzle packing.
Also, a data structure (struct rc_variable) and associated functions have
been added for generating UD and DU chains.
This function can be used to avoid creating single register classes for
input/payload registers. This makes optimistic coloring less likely
to fail.
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction scheduler will sometimes leave orphaned sources when
converting instructions from RGB to Alpha. If one of these orphaned
sources has an index greater than the maximum temporary register index,
then the compiler will incorrectly report "Too many hardware temporaries
used". The dead sources pass cleans up these orphaned sources.
GL_FIXED should not be accepted in the other gl*Pointer calls in OpenGL.
There is a new piglit for this: arb_es2_compatibility-fixed-type.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were accidentally leaving blending enabled for LogicOp GL_COPY,
which ARB_color_buffer_float/GL_RGBA32F-render (and friends) caught.
Additionally, the GL spec says that no LogicOp should be done to
floating-point targets, and the GPU gets really angry even if you say
to LogicOp GL_COPY to float.
As we expanded the usage of the state cache, it grew extra
functionality. However, with the recent state streaming rework, we're
back to the state cache being used only for shader kernels, which is
the piece of GPU state that's actually expensive to compute again from
scratch, since it involves compiling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the dynamic state is streamed through the top of the
batchbuffer, we can cut out many of our relocations to that state by
using the base address.
Improves 3DMMES taiji performance 3.3% +/- 0.4% (n=15).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Overall, across this series since the last set of numbers, gen6 3DMMES
taiji performance has dropped 0.8% +/- 0.3% (n=15), probably due to
the increased reissuing of state from some of the state objects that
otherwise never changed, and increased occurrence of the per-batch
overhead as we've increased how much we put in the batch BO without
increasing the batch BO's size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplers are about to become streamed for gen6 performance, which
would cause this unit to blow out the state cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is in a way a revert of f5bb775fd1.
The tiny win that had will be overwhelmed by the win of using the gen6
dynamic state base address.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves 3DMMES taiji demo performance by 5.1% +/- 1.9% (n=15), by
reducing CPU time spent thrashing around those tiny little constant BOs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The payload regs can go all the way up to register 60+, so just give
them 8 bits to be addressed by instead of 3-4 (which made source_w_reg
of 8 end up 0). There's no reason to aggressively pack these fields,
as they are just used as compiler information, where being easier to
access is probably more important than shaving a byte or two off of
the structure.
Fixes piglit fragcoord_w.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36649
I was promoting to float for ARB_color_buffer_float unclamped, which
failed when ARB_texture_float wasn't present. Since the metaops don't
need results outside of [0,1] when not drawing to a floating point
destination, they can just use a fixed point texture when floating
point destinations are impossible.
Fixes regression in fdo23670-depth_test when --enable-texture-float is
not present.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36473
Since commit de579a1 "Include GIT SHA1 in GL version string"
$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/mesa/main/git_sha1.h
nothing added to commit but untracked files present (use "git add" to track)
Add git_sha1.h to .gitignore so git knows not to warn it is present but untracked
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Also use MAX3 and incorporate Ian's suggestion in texformat.c.
I don't think wrapping u_format_rgb9e5.h in another header and thus making it
more complicated is worth it.
swrast support done.
There is no renderbuffer support in swrast, because it's not required
by the extension.
Reviewed-by: Brian Paul <brianp@vmware.com>
I was wondering why I had been getting GL_RGBA for GL_RGB9_E5.
Instead of setting GL_RGBA and CHAN_TYPE for most types,
use the helper functions to obtain the info.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we run out of bin memory and do an early return from
lp_setup_begin_query() we'd omit setting the setup->active_query
pointer. Then, when lp_setup_end_query() was later called, the
assertion for setup->active_query == pq would fail. Moving the
assigment in lp_setup_begin_query() avoids that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Including windows.h was ineffective on MSVC because we define the NOGDI macro,
which skips the wingdi.h include.
Unsetting NOGDI is also a bad idea because it causes all sort of symbol
clashes with SGI code.
The real problem is that WINGDAPI was not being defined, also due to NOGDI,
so simply define it to blank if not done already. This seems to make
everybody happy.
The default value is 64 but drivers usually advertise more, like 4096.
Allows ARB vp/fp programs to use more parameters.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This reverts commit 50ade6ea69.
Fixes jerky rendering again on apps that don't block on the GPU per
frame and are GPU bound (e.g. 3DMMES on Ironlake). The whole point of
this complicated throttle scheme is to wait on frame n-1 to have
started rendering before starting frame n's rendering. Otherwise, the
GPU-bound app will race ahead and call the GL to draw many
nearly-identical frames, then >0ms later get stuck waiting for them
(all dispatched at about the same time) to retire, then render a new
batch of nearly-identical frames.
If GL_RGB16F or GL_RGB32F is specified let's try the 3-component float
texture formats before trying the 4-component ones. Before this,
GL_RGB16/32F were treated the same as GL_RGBA16/32F.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The actual code that needs this include is just using
"if defined (PIPE_OS_UNIX)", and the two conditions should match.
This should also make the file compile under Hurd.
This is more painful than instruction scheduling, as we have to
compare two MRF writes to see if they coincide, and have to handle
partial GRF writes before that (for example, the result of a math
instruction written to color).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that needed fixing was skipping the newly-possible
uncompressed/sechalf partial GRF constant writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the work of the scheduler is agnostic to wide dispatch. It
operates on our virtual GRF file, which means instructions are
generally referring to 8 or 16 wide naturally. For the MRF file
management we're trying to track the actual hardware MRF file, so we
need to watch if an instruction writes multiple MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is glued in in a bit of an ugly way -- we rely on the uniforms
having been set up by 8-wide dispatch, and we just reuse them without
the ability to add new uniforms for any reason, since the 8-wide
compile is already completed. Today, this all works out because our
optimization passes are effectively the same for both and even if they
weren't, we don't reduce the set of uniforms pushed after
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this, consumers often have to keep linked lists of the
entries, at additional malloc cost.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These reduce an emitted (not decoded) instruction per shader on
g4x/gen5, but may allow for additional register coalescing as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At this point it doesn't do uniforms, which have to be laid out the
same between 8 and 16. Other than that, it supports everything but
flow control, which was the thing that forced us to choose 8-wide for
general GLSL support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that the virtual grfs are in increments of the dispatch_width,
not hardware registers -- this makes the 16-wide emit and 8-wide emit
mostly the same.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I hit this when testing RV350, which lacks RGB10_A2 render target
support. It had been missed when implementing the format and probably
unused by anything else too.
Not applicable to 7.10.
Reviewed-by: Eric Anholt <eric@anholt.net>
"st/mesa: check image size before copy_image_data_to_texture()" caused
a regression in piglit fbo-generatemipmap-formats test on all gallium drivers.
Level 0 for NPOT textures will not match minified values, so don't do this
check for level 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the initial code if we had nothing in the vector slots r would
never get reset to 0, so we'd fail to compile shaders, after the previous
commit this would happen for the LIT tests. When I fixed that we did a lot
of unnecessary loops through all the vector states when we had no vector
slots filled. So this patch optimises thing for the scalar only state.
This fixes the 3 LIT piglit tests on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the R600 ISA document:
Section 4.7.5 Cycle restrictions for the ALU.trans states that
PV/PS have cycle restrictions wrt constants.
This is part of a fix for the LIT tests
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a bug in Trine where fragment.color would write
FRAG_RESULT_COLOR (which is interpreted by drivers as being the "write
this to all color buffers" option) instead of FRAG_RESULT_DATA0 (just
the first target).
Fixes piglit ATI_draw_buffers/arbfp-no-index.
This extension support consists of replacing
"gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->".
One instance of referencing the texture's base sampler remains in the
initial miptree allocation, where I'm not sure we have a clear
association with any texture unit.
Tested with piglit ARB_sampler_objects/sampler-objects.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we lack hardware support for it, this is a simple matter of
checking _mesa_check_conditional_render at the entrypoints, and
suppressing it for the metaops where it doesn't apply.
Reviewed-by: Brian Paul <brianp@vmware.com>
The NV_conditional_render spec calls out specific operations that
conditional rendering applies to, which doesn't include these.
Fixes NV_conditional_render/generatemipmap on swrast.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested with rgtc-teximage-0[12].
EXT_texture_compression_rgtc/fbo-generatemipmap-formats fails in NPOT
just like S3TC does.
Reviewed-by: Brian Paul <brianp@vmware.com>
This assertion doesn't make any sense to me -- the convertFormat is
already something valid (tested above), and the BaseFormat dictated by
convertFormat doesn't matter to the function about to be called (it's
the datatype/comps that were pulled out of convertFormat).
Fixes assertion failure in
GL_EXT_texture_compression_rgtc/fbo-generatemipmap-formats
(still has a rendering failure in NPOT like S3TC does).
Reviewed-by: Brian Paul <brianp@vmware.com>
We were falling through to the default R8 and RG88 formats instead of
compressing when possible. Noticed by swrast fbo-blending-formats
actually doing rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were totally broken for several releases.
scons now builds everything the project files built and more, and can be
kept up-to-date with little effort.
Need to reset the point/line/tri functions to point to the "first"
versions whenever we flush vertices. Fixes unfilled polygon rendering
errors seen in demos/samples/logo.c. See comments for more info.
NOTE: This is a candidate for the 7.10 branch.
Broken with e5c6a92a12. (ARB_color_buffer_float)
Clamping should occur if type != float, otherwise the MSBs of the resulting
pixels are killed off. For example, reading back LUMINANCE = R+G+B can be
greater than 0xff, but the result is naturally masked by 0xff
for UNSIGNED_BYTE, leading to bogus results.
The following bug report seems to want clamping to occur if type == half_float
too. Not sure what's correct.
Bug: [bisected pineview] oglc case pxconv-read failed
https://bugs.freedesktop.org/show_bug.cgi?id=35852
Tested by: Fang Xun <xunx.fang@intel.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
None of this ever gets used. Fog is always calculated by a fragment
program. Even though the fixed-function fog unit is never used, state
updates are still sent to the hardware. Removing those spurious state
updates can't hurt performance.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Fragment programs are generated by core Mesa for fixed-function.
Because of this, there's no reason to handle cases where there is no
fragment program for fog.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
All drivers expect this to always be GL_NONE. Don't let there be any
opportunity for a bad value to leak out and infect some unsuspecting
driver. If any driver for hardware that had fixed-function
per-fragment fog (i915 and perhaps some r300-ish) was ever going to
add support, it would have done it by now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
This patch fixes two bugs related to fog in the fixed-function
fragment shader generation code.
Fog was only lowered to instructions if MRTs were used. The fragment
shader assembler always lowers "fog option" code to instructions, and
many drivers (e.g., r300) expect this.
When fog lowering did happen, it was after the instruction count was
checked against implementation limits. Since fog lowering may add up
to 5 instructions, a program that was below the limits before lowering
may exceed the limits after lowering.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
We should only copy images into the dest texture if the size is correct.
This fixes a failed assertion when finalizing a texture with mis-defined
mipmap levels such as:
level 0: 32x32
level 1: 8x8
Also, fix incorrect mipmap level used in assertion at the top of
copy_image_data_to_texture().
NOTE: This is a candidate for the 7.10 branch.
Lots of code (deleted by this patch) tried to make type == result->type,
but not all cases did. Don't pretend; just use result->type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Things definitely remaining todo: switch statements, clip distances.
On 965, we also need real integers in the VS, and implementations of
some things like isinf/isnan.
Reviewed-by: Brian Paul <brianp@vmware.com>
For 1 and 2-channel formats the hardware only supports rendering to R
and RG. To do I and L render targets we just call them R and
everything works out. For A, we would need to rewrite the CC to do
the alpha channel's blending on color instead, and send the fragment
alpha down the red channel. For LA, there doesn't seem to be any
hope, because we can't do independent color/alpha blending while
treating the LA surface as RG.
Reviewed-by: Brian Paul <brianp@vmware.com>
The blitter only does up 32bpp at a time, so we handle it by mangling
coordinates and calling the surface 32bpp.
Fixes ARB_texture_rg/fbo-generatemipmap-formats-float with ARB_texture_float.
Reviewed-by: Brian Paul <brianp@vmware.com>
Of these, intel will be using I and L initially, and A once we rewrite
fragment shaders and the CC for rendering to it as R.
Reviewed-by: Brian Paul <brianp@vmware.com>
Keep track of when the caches are dirty, and only flush them when
the framebuffer state is set and when the context is flushed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes piglit's draw-instanced-divisor test for softpipe on both
the generic and SSE paths. This is temporary until we have the
correct per-array max_index information.
this needs revisiting, we really don't want to be flushing all 32 of these,
but currently we don't flush any of them, and it seems to have caused a regression
as reported on irc with doom3 on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Writes within ELSE blocks were being ignored which prevented us from
discovering all possible writers for some register values.
Fixes piglit glsl-fs-raytrace-bug27060
This gets me from 2200 to 1978 dwords for a gears frame.
This is due to us having some 32-dwords blocks in the SPI, that we only
modify the first dwords off.
v2: fix dirty reg count from Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a first step to decreasing the CPU usage, by decreasing how much
stuff we pass to the GPU and hence to the kernel CS checker.
This adds a check to see if the values we need to write are actually dirty,
and avoids writing if they are. However certain register need to always
be written so we add a new flag to say which ones should be always written
if used. (Note this could probably be done cleaner with a larger refactoring,
since I think the CONST_BUFFER_SIZE_PS/VS and CONST_CACHE_PS/VS might
be better off as a special state).
It also moves the need_bo to be a flags on the register now.
With this, a frame of gears goes from emitting 3k dwords to emitting 2k dwords,
and I'm sure it could get a lot smaller.
v2: fix some evergreen dirty bits.
Original patch from: Bas Nieuwenhuizen, I NIHed nearly the same thing
before seeing his patch on the list, oops.
Reviewed-by: Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of the newer portions of the code use OUT_BATCH style. I prefer
this style because it offers a clear distinction between a) hardware
messages/structures with a mandatory format, and b) data structures for
our own internal use that we can format however we want.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we never enable the GS on Sandybridge, there's no need to allocate
it any URB space.
Furthermore, the previous calculation was incorrect: it neglected to
multiply by nr_vs_entries, instead comparing whether twice the size of
a single VS URB entry was bigger than the entire URB space. It also
neglected to take into account that vs_size is in units of 128 byte
blocks, while urb_size is in bytes.
Despite the above problems, the calculations resulted in an acceptable
programming of the URB in most cases, at least on GT2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The expression
x = y, 5, 3;
will generate
0:7(9): warning: left-hand operand of comma expression has no effect
The warning is only emitted for the left-hand operands, becuase the
right-most operand is the result of the expression. This could be
used in an assignment, etc.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts what remains of commit
28bab24e16. It was garbage, trying to
use a MESA_FORMAT enum as a preprocessor token, and I don't know how I
thought it was even tested.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RED and GL_RG were tricking this code into executing, but it's
totally unprepared for a 16-bit channel and just rescaled the values
down to 0. We don't have anything with <8bit channels alongside >8bit
channels, so disabling it should be safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the current (broken by trying to use an enum in the
preprocessor) spantmp2.h support I wrote for the intel driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we're using GTT mappings now (no manual detiling), there's
really nothing special to accessing these buffers, other than needing
the new RowStride field of gl_renderbuffer to accomodate padding.
Reduces the driver size by 2.7kb, and improves glean depthStencil
performance 3-10x (!)
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow some drivers to reuse the core renderbuffer.c get/put
row functions in place of using the spantmp.h macros. Note that
unlike textures, we use a signed integer here to allow for handling
FBO orientation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Everything appears to already be in place for this. Fixes aborts in:
ARB_texture_rg/fbo-alphatest-formats-float
ARB_texture_rg/fbo-blending-formats-float.
Reviewed-by: Brian Paul <brianp@vmware.com>
The _mesa_base_fbo_format variant doesn't handle some texture
internalformats, such as "3".
Fixes:
fbo-blending-formats.
fbo-alphatest-formats
EXT_texture_sRGB/fbo-alphatest-formats
Reviewed-by: Brian Paul <brianp@vmware.com>
The very presence of this extension breaks things.
This should bring us closer to being able to run Unigine Heaven.
The extension will be re-enabled once gl_InstanceID is implemented.
This was copy-and-paste from originally trying to get DP read/write
working reliably, and notably for other common messages (URB, sampler)
we weren't doing this.
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
It was accepting only GL_DUDV_ATI and not the specific sized format
GL_DU8DV8_ATI. Fixes assertion failure at startup in Shadowgrounds.
Reviewed-by: Brian Paul <brianp@vmware.com>
The 095-recursive-define test case was triggering infinite recursion
with the following test case:
#define A(a, b) B(a, b)
#define C A(0, C)
C
Here's what was happening:
1. "C" was pushed onto the active list to expand the C node
2. While expanding the "0" argument, the active list would be
emptied by the code at the end of _glcpp_parser_expand_token_list
3. When expanding the "C" argument, the active list was now empty,
so lather, rinse, repeat.
We fix this by adjusting the final popping at the end of
_glcpp_parser_expand_token_list to never pop more nodes then this
particular invocation had pushed itself. This is as simple as saving
the original state of the active list, and then interrupting the
popping when we reach this same state.
With this fix, all of the glcpp-test tests now pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32835
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Without it gcc complains:
nv50_screen.c: In function ‘nv50_screen_is_format_supported’:
nv50_screen.c:48: warning: implicit declaration of function ‘util_format_is_supported’
and handles it wrongly - util_format_is_supported returns boolean, which is typedef'ed
to uchar, but function without prototype is assumed to return int.
For me nv50_screen_is_format_supported was returning true for float formats without
--enable-texture-float...
Sounds very unlikely, but I don't have a better explanation at the
moment.
The GPU throws page faults at the first page after the code buffer
quite frequently on startup, and traces don't show us overflowing.
This pass coverts CMP T0, T1 T2 T0 -> MOV T0, T2 when the CMP
instruction is the first instruction to write to register T0.
This pass is useful for hardware that requires a lot of lowering passes
that generate many CMP instructions.
So --enable-texture-float it is.
Hardware drivers (including the Gallium ones) should
use #ifdef TEXTURE_FLOAT_ENABLED to hide any code that may
expose floating-point renderbuffers via any interface,
public or private.
v2: Print a warning when using --enable-texture-float.
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: handle floating-point formats in _mesa_base_fbo_format
mesa: add ARB/ATI_texture_float, remove MESAX_texture_float
commit 123bb110852739dffadcc81ad80b005b1c4f586d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 01:35:42 2010 +0200
mesa: compute floatMode for FBOs and return it on RGBA_FLOAT_MODE
It's clear enough that the current segmentation fault isn't what we
want. And it's also very easy to know what we do want here, (just
check with any functional C preprocessor such as "gcc -E").
Add the desired output as an expected file so that the test suite
gives useful output, (showing the omitted output and the segfault),
rather than just reporting "No such file" for the expected file.
These were all written as generic list functions, (accepting and returning
a list to act upon). But they were only ever used with parser->active as
the list. By simply accepting the parser itself, these functions can update
parser->active and now return nothing at all. This makes the code a bit
more compact.
And hopefully the code is no less readable since the functions are also
now renamed to have "_parser_active" in the name for better correlation
with nearby tests of the parser->active field.
The common case for this test suite is to quickly test that everything
returns the correct results. In this case, the second run of the test
suite under valgrind was just annoying, (and the user would often
interrupt it).
Now, do what is wanted in the common case by default (just run the
test suite), and require a run with "glcpp-test --valgrind" in order
to test with valgrind.
The expected file here captures the current behavior of glcpp (which
is to generate an obscure "syntax error, unexpected $end" diagnostic
for this case).
It would certainly be better for glcpp to generate a nicer diagnostic,
(such as "missing closing parenthesis in function-like macro
definition" or so), but the current behavior is at least correct, and
expected. So we can make the test suite more useful by marking the
current behavior as expected.
The expected file here captures the current behavior of glcpp (which
is to generate a division-by-zero error) for this case.
It's easy to argue that it should be short-circuiting the evaluation
and not generating the diagnostic (which happens to be what gcc does).
But it doesn't seem like we should force this behavior on our
pre-processor, (and, as always, the GLSL specification of the
pre-processor is too vague on this point).
This test is behaving just fine already---it's generating an informative
diagnostic, ("error: division by 0 in preprocessor directive"), so adding
this in the expected file makes things pass.
We could actually try to do an early return both for gallium textures and
malloc memory textures, but I'm not sure exactly which situations
stImage->pt is NULL, and whether texImage->Data == NULL would be acceptible
or not.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the same as ARB_draw_buffers (which derived from it), except
for s/ARB/ATI/. The glapi bits were already in place, and what was
missing was just the ARB_fp part. The new Humble Bundle game "trine"
tries to use this extension without checking that it's exposed, which
this works around.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36182
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like what we do for add/mul, but we have to invert the
predicate to choose the other source instead.
This removes 5 extra moves of constants in nexuiz shaders. No
statistically significant performance difference on my Sandybridge
laptop (n=5).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were letting any old operand through, which generally resulted in
assertion failures later.
Fixes array-logical-xor.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant false case.
Fixes glslparsertest/glsl2/logic-02.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant true case.
Fixes glslparsertest/glsl2/logic-01.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By always using a boolean, we should generally avoid further
complaints. The failure case I see is logic_not, where the user might
understandably make the mistake of using `!' on a boolean vector (like
a piglit case did recently!), and then get a further complaint that
the new boolean type doesn't match the bvec it gets assigned to.
Fixes invalid-logic-not-06.vert (assertion failure when the bad type
ends up in an expression and ir_constant_expression gets angry).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33314
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A few GLES2 tests tripped over this when using array dereferences to
hit channels on the LHS (see piglit test
glsl-copy-propagation-vector-indexing). We wouldn't find the
ir_dereference_variable, and assume that that meant that it wasn't an
assignment to a scalar/vector, and thus not notice that the variable
had been changed.
Release the old depth region and reference the new one *only* if it has
changed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@intel.com>
Prior to Gen6, we use the GS for breaking down quads, quad-strips,
and line loops. On Gen6, earlier stages already take care of this,
so we never need the GS.
Since this code is likely completely untested, remove it for now.
We can write new code when enabling real geometry shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit b4cbd2b312.
It looked like a safe sanity check. It missed the issue of the start of
the buffer not being at 0, but even that was not enough to explain why
setting the max vertex index caused glyphs to be dropped from the game
'Achron'.
Instead, the issue appears to be related to the use of the vertex bias
and so we would need to re-emit the max-index every time we adjusted the
bias, so re-emitting the relocations and defeating the original
optimisation.
Reported-and-tested-by: Thomas Jones <thomas.jones@utoronto.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35163
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When the window is minimized GetClientRect will return zeros.
Instead of creating a 1x1 framebuffer, simply preserve the current window
size, until the window is restored or maximized again.
The earlier change to ensure rendertargets and textures are always
rebound at every command buffer start had the downside of making
successive flushes no longer no-ops, as a command buffer with merely
the rebinding commands were being unnecessarily sent to the vGPU.
This change only re-emits the bindings when necessary, by keeping track
of the need to rebind outside of the dirty state update mechanism.
According to https://bugs.freedesktop.org/show_bug.cgi?id=34280
commit 5d1387b2da causes the font corruption
problems people have been seeing under various apps and gnome-shell on r200
cards.
This commit changed (loosened) the check for using the memcpy path in the
former al88 / al1616 texstore functions, which are now also used to
store rg texures. This patch restores the old strict check in case of
al textures. I've no idea why this fixes things, since I don't know the
code in question at all. But after seeing the bisect in bfdo34280 point
to this commit, I gave this fix a try and it fixes the font issues seen on
r200 cards.
[airlied:
r200 has no native working A8, so it does an internal storage format of AL88
however srcFormat == internalFormat == ALPHA when we get to this point,
so it copies, but it wants to store into an AL88 not ALPHA so fails,
I'll also push a piglit test for this on r200].
Many thanks to Nicolas Kaiser who did all the hard work of tracking this down!
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously the macro would (ALIGN(value - alignment - 1, alignment)).
At the very least, this was missing parenthesis around "alignment -
1". As a result, if value was already aligned, it would be reduced by
alignment. Condisder:
x = ROUND_DOWN_TO(256, 128);
This becomes:
x = ALIGN(256 - 128 - 1, 128);
Or:
x = ALIGN(127, 128);
Which becomes:
x = 128;
This macro is currently only used in brw_state_batch
(brw_state_batch.c). It looks like the original version of this macro
would just use too much space in the batch buffer. It's possible, but
not at all clear to me from the code, that the original behavior is
actually desired.
In any case, this patch does not cause any piglit regressions on my
Ironlake system.
I also think that ALIGN_FLOOR would be a better name for this macro,
but ROUND_DOWN_TO matches rounddown in the Linux kernel.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Whitwell <keithw@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also make the GL_ARB_draw_instanced block follow the same pattern as
the other blocks.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a 49.6% +/- 2.0% (n=9, IPS outlier removed) performance
improvement for the hacked-up-for-cache-misses scissor-many, and no
statistically significant performance difference for the
hacked-up-for-cache-hits version (n=9, IPS outlier removed). No
statistically significant performance difference from ETQW (n=5) from
these last two commits.
This is a 28.1% +/- 1.4% (n=10) performance improvement for the
hacked-up-for-cache-misses scissor-many (n=10), and no statistically
significant wall-time performance difference for the
hacked-up-for-cache-hits version (n=9, first outlier in each removed
since IPS was warming up. User time increased by about 4.7%, but
kernel time decreased equivalently).
Build the new sources, plug the new functions into the dispatch table,
implement display list support. And enable extension in the gallium
state tracker.
gl_texture_object contains an instance of this type for the regular
texture object sampling state. glGenSamplers() generates new instances
of gl_sampler_object which can override that state with glBindSampler().
Fixes issues in e.g. nexuiz (desertfactory) or supertuxkart that
look like lighting bugs.
They're not visible with the software rasterizers because their
notion of linear interpolation seems to be different from that
of nv50/nvc0.
This reverts commit 437c748bf5.
The commit is wrong for several reasons. One of them is when we grab
a new buffer, we should update all the states it is bound in,
including all parallel contexts. I don't think this is even doable.
The correct solution would be upload data via a temporary buffer and
do resource_copy_region to the original one.
https://bugs.freedesktop.org/show_bug.cgi?id=36088
Add Cygwin platform-specific settings and drivers to build for dri driver:
- by default, disable direct rendering.
- if direct rendering is enabled, the swrast dridriver is the only one it's
sensible to try to build (this doesn't work at the moment as additional patches
are required to build a libGL which can load just swrast without the DRM headers,
even though there's no actual functional dependency)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Fix build when configured --with-driver=dri --disable-driglx-direct on targets
without drm e.g. GNU/Hurd and Cygwin
Based on the Debian patch file '05_hurd-ftbfs.diff' by Samuel Thibault.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Reviewed-By: Jakob Bornecrantz <wallbraker@gmail.com>
The theory here was to detect a temporary variable used within a loop,
and avoid considering it live across the entire loop. However, it was
overeager and failed when the first definition of the variable
appeared within the loop but was only conditionally defined.
Fixes glsl-fs-loop-redundant-condition.
Otherwise min_lod can potentially be larger than the clamped max_lod. The
code that follows will swap min_lod and max_lod in that case, resulting in a
max_lod larger than MAX_LEVEL.
Signed-off-by: Brian Paul <brianp@vmware.com>
Base level and min LOD aren't equivalent. In particular, min LOD has no
effect on image array selection for magnification and non-mipmapped
minification.
Signed-off-by: Brian Paul <brianp@vmware.com>
Although for GL a zero stride means tightly packed elements, Mesa
internally uses zero strides for constant arrays.
Therefore user buffers need to be defined from
buffer_offset + src_offset + min_index*stride
to
buffer_offset + src_offset + max_index*stride + elem_size
Simplifying the later with (max_index + 1)*stride will give zero
sized buffers.
This change also aggregates the st_context's info about user buffers
into a single array.
We adjust 'end' to fit into _MaxElement, but that may result into a 'start'
value bigger than 'end' being passed downstream, causing havoc.
This could be seen with arb_robustness_draw-vbo-bounds, due to an
application bug.
Because:
- bindings are not fully automatic, and they are broken most of the time
- unit tests/samples can be written in C on top of graw
- tracing/retracing is more useful at API levels with stable ABIs such as
GL, producing traces that cover more layers of the driver stack and and
can be used for regression testing
There's really no need for a prefix on member functions, and overloading
takes care of the _op1/_op2 distinction quite nicely. Eric already made
a similar change in the i965 FS backend.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rather than ir_to_mesa_dst_reg_from_src and ir_to_mesa_src_reg_from_dst.
The new constructors are marked 'explicit' so that the compiler can
catch cases where source and destination registers were accidentally
interchanged.
This also necessitated using constructors to initialize the undef and
address registers, as well as adding a default constructor.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both classes are completely private to ir_to_mesa.cpp, so there won't be
any name conflicts with other parts of Mesa. The prefix simply makes it
harder to read.
Also, use a class rather than typedef structs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is in preparation from removing the "ir_to_mesa_" prefix on the
src_reg and dst_reg types, which would cause a naming conflict.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code would previously handle the projection, then swizzle the
shadow comparitor into place. However, when the projection is done
"by hand," as in the TXB case, the unprojected shadow comparitor would
over-write the projected shadow comparitor.
Shadow comparison with projection and LOD is an extremely rare case in
real application code, so it shouldn't matter that we don't handle
that case with the greatest efficiency.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=32395
This matches the behaviour below when numSamples is compared.
At least with the gallium state tracker this can actually occur if st_render_texture fails.
Signed-off-by: Brian Paul <brianp@vmware.com>
This is always the way with real hardware and desktop OpenGL. Some
hardware can't do some formats natively. The alpha-only, luminance,
and intensity formats are usually the most problematic. Some sized
formats can also be problematic. This patch provides fall-back
formats for those that are not natively supported.
At some point it would be interesting to try providing
device-independent conversions using EXT_texture_swizzle. The drivers
that support EXT_texture_swizzle could, for example, see
GL_LUMINANCE16_SNORM as MESA_FORMAT_SIGNED_R16 with a { r, r, r, 1 }
swizzle. Care would need to be taken to prevent issues with using
those textures for FBO rendering.
This is the rest of the fix for glean's pixelFormats test on i965.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This class of hardware can natively sample all of the snorm surface
formats that DX10 requires, but it can't do some of the legacy GL
formats. In particular, all of the alpha, luminance, and intensity
formats are unsupported.
This partially fixes the breakage in glean's pixelFormats test since
GL_EXT_texture_snorm support was added to Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We use this format to represent the accum buffer. No snorm texture
sampling or rendering takes place.
Fixes failed assertion with swrast and any app using the accum buffer
(and glxinfo).
Piglit tests:
- glsl-fs-shadow2d-01
- glsl-fs-shadow2d-02
- glsl-fs-shadow2d-03
- fs-shadow2d-red-01
- fs-shadow2d-red-02
- fs-shadow2d-red-03
NOTE: This is a candidate for the stable branches.
Various documentation mentions that "W" is handed to the WM stage,
but further digging seems to indicate that they really mean 1/W.
The code here is still unclear, but changing this fixes piglit
test "fragcoord_w" on Sandybridge as well as a Khronos ES2 conformance
test. I also tested 3DMarkMobile ES2.0's taiji and hoverjet demos, as
well as Nexuiz, just to be safe.
NOTE: This is a candidate for the 7.10 branch.
Fixes regressions caused by commit 9a21bc6401, namely GPU hangs when
running gnome-shell or compiz (Mesa bugs #35820 and #35853).
I incorrectly refactored the case that dealt with ARF_NULL; even in that
case, the source register needs to be changed to the MRF.
NOTE: This is a candidate for the 7.10 branch (if 9a21bc6401 is
cherry-picked, take this one too).
Branch emulation and loop unrolling are done in the GLSL frontend.
Transforming loops is no longer needed for fragment shaders, but it is still
necessary for vertex shaders.
Registers that are used inside of loops need to be considered live
starting with the first instruction of the outermost loop.
https://bugs.freedesktop.org/show_bug.cgi?id=34370
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Oops, the mask was being used in the loop to determine whether to use
include the stencil || depth values. This began to fail when mask was
cleared at the beginning of the loop. So reorder the tests and do the
work up-front along with determining the depth_stencil value to use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35822
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(some little changes by Marek Olšák)
Squashed commit of the following:
commit 737c0c6b7d591ac0fc969a7590e1691eeef0ce5e
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Fri Aug 27 02:13:57 2010 +0200
draw: disable SSE and PPC paths (use LLVM instead)
These paths don't support vertex clamping, and are anyway
obsoleted by LLVM.
If you want to re-enable them, add vertex clamping and test that it
works with the ARB_color_buffer_float piglit tests.
commit fed3486a7ca0683b403913604a26ee49a3ef48c7
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:27:38 2010 +0200
draw_llvm: respect vertex color clamp
commit ef0efe9f3d1d0f9b40ebab78940491d2154277a9
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:26:43 2010 +0200
draw: respect vertex clamping in interpreter path
Macro can lead to hard to debug list bugs. For instance consider
the following :
LIST_ADD(item, list->prev)
3 instruction of the macro became :
(list->prev)->next->prev = item
which is equivalent to :
list->prev = item
Thus list prev field changes and next instruction in the macro
(list->prev)->next = item
became :
item->next = item
And you endup with list corruption, other case lead to similar
list corruption. Inline function are not affected by this short
coming
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When the condition
min_index == 0 && sizeof(ib[0]) == sizeof(draw_elts[0])
was true, we were wrongly ignoring istart and processing indices 0.
Reorder some statements to make the code easier to understand.
Now that we purposefully generate delta that point outside of the target
buffer, the assertion has outlived its usefulness.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once more! This time without the unwarranted conversion from
drm_intel_bo_alloc_tiled.
Signed-off-by: [a very embarrassed] Chris Wilson <chris@chris-wilson.co.uk>
v2: Allocate the fences from a single shared buffer object.
v3: Allocate the r600_fence structs in blocks of 16.
Spin a few times before calling sched_yield in r600_fence_finish().
This should be the last bit of infrastructure changes before
generating GLSL IR for assembly shaders.
This commit leaves some odd code formatting in ir_to_mesa and brw_fs.
This was done to minimize whitespace changes / reindentation in some
loops. The following commit will restore formatting sanity.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This array is going to be used in the main compiler soon. Leaving
them uniforms.c caused problems for building the stand-alone compiler.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The hardware should be set according to this table:
FORMAT -> R300 COLORFORMAT
-------------------------
X16 -> UV88
X16Y16 -> ARGB8888
X32 -> ARGB8888
X32Y32 -> ARGB16161616
US_OUT_FMT must contain the real format.
I wasn't able to make B3G3R2 and L4A4 work, but those aren't important.
The vertex color clamp control is a property of an API,
a lot like gl_rasterization_rules.
The state should be set according to the API being implemented, for example:
OpenGL Compatibility: enabled by default
OpenGL Core: disabled by default
D3D11: always disabled
This patch also changes the way ARB_color_buffer_float is advertised.
If no SNORM or FLOAT render target is supported, fragment color clamping
is not required.
BTW this changes the gallium interface.
Some rather cosmetic changes by Marek.
Squashed commit of the following:
commit 513b37d484f0318311e84bb86ed4c93cdff71f13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:54 2010 +0200
mesa/st: respect fragment clamping in st_DrawPixels
commit 546a31e42cad459d7a7a10ebf77fc5ffcf89e9b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:28 2010 +0200
mesa/st: support fragment and vertex color clamping
commit c406514a1fbee6891da4cf9ac3eebe4e4407ec13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:56:37 2010 +0200
mesa/st: expose ARB_color_buffer_float if unclamping is supported
commit d0c5ea11b6f75f3da2f4ca989115f150ebc7cf8d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:53:41 2010 +0200
mesa/st: use unclamped colors
This assumes that Gallium is to be interpreted as given drivers the
responsibility to clamp these colors if necessary.
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it. (see the removed XXX comment. -Marek)
TODO: did I get the set of operations mandating it right?
commit 76bdfcfe3ff4145a1818e6cb6e227b730a5f12d8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:18:25 2010 +0200
gallium: add color clamping to the interface
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: fix getteximage so that it doesn't clamp values
mesa: update the compute_version function
mesa: add display list support for ARB_color_buffer_float
mesa: fix glGet query with GL_ALPHA_TEST_REF and ARB_color_buffer_float
commit b2f6ddf907935b2594d2831ddab38cf57a1729ce
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 31 16:50:57 2010 +0200
mesa: document known possible deviations from ARB_color_buffer_float
commit 5458935be800c1b19d1c9d1569dc4fa30a97e8b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:54:56 2010 +0200
mesa: expose GL_ARB_color_buffer_float
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
(I'll squash the st/mesa part to a separate commit. -Marek)
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it.
TODO: did I get the set of operations mandating it right?
commit 3a9cb5e59b676b6148c50907ce6eef5441677e36
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:09:41 2010 +0200
mesa: respect color clamping in texenv programs (v2)
Changes in v2:
- Fix attributes other than vertex color sometimes getting clamped
commit de26f9e47e886e176aab6e5a2c3d4481efb64362
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:05:53 2010 +0200
mesa: restore color clamps on glPopAttrib
commit a55ac3c300c189616627c05d924c40a8b55bfafa
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:04:26 2010 +0200
mesa: clamp color queries if and only if fragment clamping is enabled
commit 9940a3e31c2fb76cc3d28b15ea78dde369825107
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 00:00:16 2010 +0200
mesa: introduce derived _ClampXxxColor state resolving FIXED_ONLY
To do this, we make ClampColor call FLUSH_VERTICES with the appropriate
_NEW flag.
We introduce _NEW_FRAG_CLAMP since fragment clamping has wide-ranging
effects, despite being in the Color attrib group.
This may be easily changed by s/_NEW_FRAG_CLAMP/_NEW_COLOR/g
commit 6244c446e3beed5473b4e811d10787e4019f59d6
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:58:24 2010 +0200
mesa: add unclamped color parameters
Fixes piglit test glsl-vs-arrays-3 on Sandybridge, as well as garbage
rendering in 3DMarkMobileES 2.0's Taiji demo and GLBenchmark 2.0's
Egypt and PRO demos.
NOTE: This a candidate for stable release branches. It depends on
commit 9a21bc6401.
This was open-coded in three different places, and more are necessary.
Extract this into a function so it can be reused.
Unfortunately, not all variations were the same: in particular, one set
compression control and checked that the source register was not
ARF_NULL. This seemed like a good idea, so all cases now do so.
Eventually the miptree refcounting interface should be cleaned up.
The assymmetry dramatically increases the probability of bugs like
this. It should be made to like like libdrm refcounting or the
refcounting style used in other parts of Mesa.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=33046
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifically, this ensures things like the front buffer actually exist. This
fixes piglt fbo/fbo-sys-blit and fd.o bug 35483.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
GLSL 1.30 states clearly that only float and int are allowed, while the
GLSL ES specification's issues section states that sampler types may
take precision qualifiers.
Fixes compilation failures in 3DMarkMobileES 2.0 and GLBenchmark 2.0.
NOTE: This is a candidate for stable release branches.
Civilization 4's shaders make heavy use of gl_Color and don't use
perspective interpolation. This resulted in rivers, units, trees, and
so on being rendered almost entirely white. This is a regression
compared to the old fragment shader backend.
Found by inspection (comparing the old and new FS backend code).
References: https://bugs.freedesktop.org/show_bug.cgi?id=32949
NOTE: This is a candidate for the 7.10 branch.
Since GLSL IR allows multiple ir_variables to share the same name, we
need to generate unique names when printing the IR. Previously, we
always used %s@%p, appending the ir_variable's memory address.
While this worked, it had two drawbacks:
- When there aren't duplicates, the extra "@0x669a3e88" is useless
and makes the code harder to read.
- Real duplicates were hard to tell apart:
channel_expressions@0x6699e3c8 vs. channel_expressions@0x6699ddd8
We now append @2, @3, @4, and so on, but only where necessary to
distinguish duplicates. Since we only do this at print time, any
performance impact is irrelevant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
While making some other changes in this area I was finding it annoying
each of these functions took mostly the same set of parameters in
differing orders.
'i' is already used for the outer loop. This caused some problems
while doing other work in this area. No bug exists here... until you
want to use the outer loop counter in the inner loop.
We were using typecasts because the functions pointers are polymorphic
in the second argument type, which.
Surprisingly the wrong calling convention didn't cause crashes on Windows,
but it was causing certain registers to be trashed in MSVC optimized
builds, when processing callists in the ClearView RC Flight Simulator.
I think the code ends up a lot more legible this way, though we've
still got the overloads in the fs_inst as well (even though there's
only one caller left currently).
If set to year X, only report extensions up to that year. This is a
work-around for games that try to copy the extensions string to a fixed
size buffer and overflow. If a game was released in year X, setting
MESA_EXTENSION_MAX_YEAR to that year will likely fix the problem.
We now use a 4-bit writemask for all instruction types, which makes it
easier to write generic helper functions to manipulte writemasks.
NOTE: This is a candidate for the 7.10 branch.
The code no longer supports otherwise -- it relies on buffers being
uploaded via u_upload_mgr -- so make this clear.
Also, there's no need to flush after draws from user buffers, given all
user content should have been copied by then.
Unsigned long is 32bit on several platforms (e.g., Windows), yielding
1UL << 32 to be zero.
Note that BITFIELD64_BIT result is often assigned to variables of type
GLbitfield, instead of GLbitfield64. That's probably wrong and should be
addressed in a later change.
It should have been a tip when the spec says "However, implicitly
sized arrays cannot be assigned to. Note, this is a rare case that
*initializers and assignments appear to have different semantics*."
(empahsis mine)
Fixes bugzilla #34367.
NOTE: This is a candidate for stable release branches.
Early Z support is set in the DST_VARS command. Hence split up static
state emission to avoid reissuing to much on fragment shader changes,
especially the costly dst buffer relocations.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The optimization loop won't reinsert noise instructions or quadop
vectors, so we were traversing the tree for nothing. Lowering vector
indexing was in the loop after do_common_optimization() to avoid the
work if it ended up that the index was actually constant, but that has
been called already in the core.
Most of the time we don't have a non-uniform struct variable in the
shader, so this cuts the time spent in do_structure_splitting during
glean texCombine by about 2/3.
This fixes an issue where the .obj files wound up in the src/
directory rather than the build/ directory. That prevented
combined 32-bit and 64-bit builds from working.
Signed-off-by: Brian Paul <brianp@vmware.com>
Additionally, to discarding the whole buffer, use
PIPE_TRANSFER_DISCARD_RANGE in pipe_buffer_write when the
write covers only part of the buffer.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In memory mapping buffer objects make use of
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE and PIPE_TRANSFER_DISCARD_RANGE
when appropriate.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
It doesn't support them. Also, we shouldn't be
emitting CB_BLENDx_CONTROL on R600 as the regs don't
exist there, but I'm not sure of the best way to deal
with this in the current r600 winsys.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This packet is required when updating the DB, CB,
or STRMOUT base addresses on rv6xx for the surface
sync logic to work correctly.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This sort of worked because blend state setup cleared MULTIWRITE_ENABLE again,
but that's not something we want to depend on.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Disable Z_EXPORT / STENCIL_EXPORT / KILL_ENABLE again if a shader doesn't
use those. This is similar to 0a6f09a76a.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
The idea behind this is that anything touching registers should be in
r600_state.c or evergreen_state.c. This is also consistent with
evergreen_pipe_shader_vs().
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
We need more and more of these, and it is difficult and prone to version
incompatability issues trying to single out every one of them.
This mimicks what was done in SCons.
The assignment on line 368, `tex_swizzles[i] = SWIZZLE_NOOP`, is rendered
dead by the reassignment on line 392.
Signed-off-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is necessary for GLSL 1.30+ shadow sampling functions, which return
a single float rather than splatting the value to a vec4 based on
GL_DEPTH_TEXTURE_MODE.
This was probably missed when implementing luminance and luminance alpha
render targets.
_mesa_get_format_bits checks for both GL_*_BITS and GL_TEXTURE_*_SIZE.
This fixes:
main/framebuffer.c:892: _mesa_source_buffer_exists: Assertion `....' failed.
The r300 compiler can eliminate unused uniforms and remap uniform locations
if their number surpasses hardware limits, so the limit is actually
NumParameters + NumUnusedParameters. This is important for some apps
under Wine to run.
Wine sometimes declares a uniform array of 256 vec4's and some Wine-specific
constants on top of that, so in total there is more uniforms than r300 can
handle. This was the main motivation for implementing the elimination
of unused constants.
We should allow drivers to implement fail & recovery paths where it makes
sense, so giving up too early especially when comes to uniforms is not
so good idea, though I agree there should be some hard limit for all drivers.
This patch fixes:
- glsl-fs-uniform-array-5
- glsl-vs-large-uniform-array
on drivers which can eliminate unused uniforms.
Avoid setting the same gpu register several times in a r600_pipe_state.
Compute the final value of the register and set that one time. This avoids
some overhead in r600_context_pipe_state_set().
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Well, not sure what exactly it is, but it certainly doesn't contain
the control flow stack, but vertex data.
Not sure about size, I've only seen the first few KiB written, but
the binary driver seems to allocate more.
Depth values are also written before the shader is executed, so if
early tests are enabled, fragments that failed the alpha test were
modifying the depth buffer, but they shouldn't.
The kernel drm takes care of all coherency as long as we don't forget
to submit all outstanding commands in the batchbuffer ...
Also move batchbuffer initialization up because otherwise transfers
for some helper textures fail with a segmentation fault.
And kill the dead code, flushes should now be correct everywhere.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The docs say it can be set for direct texture lookups, but even that
causes problems.
This fixes the wireframe bug:
https://bugs.freedesktop.org/show_bug.cgi?id=32688
NOTE: This is a candidate for the 7.9 and 7.10 branches.
(1, -_, ...) was converted to (-1, ...) because of the negation
in the second component.
Masking out the unused bits fixes this.
Piglit:
- glsl-fs-texture2d-branching
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This gets one more piece of the pipeline onto the new codegen backend.
Once ARB_fragment_program can generate GLSL programs, we can nuke the
old backend.
This is like how we track FragmentProgram._Current for the computed
ARB fragment program for fixed function texenv, but this gives direct
access to the gl_shader_program for drivers to codegen from, skipping
ARB_fp.
This is a step towards providing a direct route for drivers accepting
GLSL IR for codegen. Perhaps more importantly, it runs the fixed
function fragment program through the GLSL IR optimization. Having
seen how easy it is to make ugly fixed function texenv code that can
do unnecessary work, this may improve real applicatinos.
It would be nice if we handled optimized uniform math like this in
some generic way, since people often end up doing uniform expressions
in shaders, but for now keep this hard-coded like it was in the
texenvprogram code.
For fixed function fragment processing in GLSL IR, we want to be able
to reference this state value. gl_* not explicitly permitted is
reserved, so using this variable name internally shouldn't be any
issue.
This is used in the upcoming fixed function shader_program generation,
and shader_program and ARB programs are together in this code until
both fragment and vertex ff get converted.
The drivers have been changed so that they behave as if all of the flags
were set. This is already implicit in most hardware drivers and required
for multiple contexts.
Some state trackers were also abusing the PIPE_FLUSH_RENDER_CACHE flag
to decide whether flush_frontbuffer should be called.
New flag ST_FLUSH_FRONT has been added to st_api.h as a replacement.
Since we're compiling/linking GLSL shaders we should check against
the shader uniform limits, not the legacy vertex/fragment program
parameter limits which are usually lower.
The gl_program_constants struct is for limits that are applicable to
any/all shader stages. Move the geometry shader-only fields into the
gl_constants struct.
Remove redundant MaxGeometryUniformComponents field too.
Need to flush rendering (or at least indicate that the rug might be getting
pulled out from underneath us) when a shader, buffer object or query object
is about to be deleted.
Also, this helps to tell the VBO module to unmap its current vertex buffer.
Benefits:
- spares us a relocation.
- needed for zone rendering (if that ever happens).
- just awesome.
v2: Rename the debug option. Completely disabling the blitter is
required for Y tiling to work, so this option will cover other
code paths in the future.
v3: Implement suggestions by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Flushing the batch/hw backend doesn't invalidate the derived state.
So kill the unnecessary function calls and add an assert in
emit_hardware_state for paranoia.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the new clear code this is possible (if the app call glClear
before drawing the first primitive).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The polygon stipple fallback does not have to be implemented in the
draw module (it doesn't need window coords, etc). Drivers can use
this utility and avoid sw vertex fallbacks if pstipple is enabled.
Note: this is WIP and not used by any driver yet.
The gc var would be NULL if called from line 238. Instead, get
the opcode from __glXSetupForCommand(dpy) as done in other places.
And remove the unused gc parameter.
Fixes a bug reported by "John Doe" on 3/9/2011.
NOTE: This is a candidate for the 7.10 branch.
This fd gets passed in from outside, closing it causes the X.org server
to crap out when the driver doesn't identify the chipset.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Improves performance of a hacked-up scissor-many (to reuse a small set
of scissors instead of blowing out the cache, and then to run 100x
more iterations so it actually took some time) by 3.6% +/- 1.2% (n=10)
Add linear probing on collisions.
Expand entry array by a fixed scale (currently 2) to help avoid
collisions.
Use a LRU approach to ensure that the number of entries stored in the
cache doesn't exceed the requested size.
Do that later on when we set up the hwtnl state instead.
This addresses a problem when we drop the uploaded copy due to a vb
size change, it will remain referenced in svga->curr.vb[], and the
new contents of the vb will never be uploaded.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The encoding/decoding algorithms are shared with RGTC.
Thanks to some magic with the base format, the RGTC texstore functions work
for LATC too.
swrast passes the related piglit tests besides two things:
- The alpha channel is wrong (it's always 1), however the incorrect alpha
channel makes some other tests fail too, so I guess it's unrelated to LATC.
- Signed LATC fetches aren't correct yet (signed values are clamped to [0,1]),
however RGTC has the same problem.
Further testing (with other of my patches) shows that hardware drivers
and softpipe work.
BTW, ETQW uses this extension.
st->user_vb[attr] was always pointing to the same user vb, regardless
of the value of attr. Together with reverting the temporary workaround
for bug 34378, and a fix in the svga driver, this fixes googleearth on svga.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The signature list in a function must contain only ir_function_signature nodes.
The target of an ir_call must be an ir_function_signature.
These were added while trying to debug Mesa bugzilla #34203.
Instead of using the current gl_fragment_program. These aren't necessarily
the same, for example when translate_program() is called by
i915ValidateFragmentProgram().
this doesn't bind to drivers yet, just enough to in theory make indirect
work against other servers.
I'm really not sure what the rules for adding extensions to the known_gl_extensions list as it looks to be missing a few. are these GL extensions that have GLX
protocol??
Signed-off-by: Dave Airlie <airlied@redhat.com>
Comments about the deleted stuff:
- openaren hang: likely caused by the vertex corruptions, fixed by Jakob.
- tiling: Y-tiling works with my hw-clear branch. X-tiling works as
merged to master a while ago (execbuf2 version).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ARB_instanced_arrays is a subset of D3D9.
ARB_draw_instanced is a subset of D3D10.
The point of this change is to allow D3D9-level drivers to enable
ARB_instanced_arrays without ARB_draw_instanced.
If an array redeclaration includes an initializer, the initializer
would previously be dropped on the floor. Instead, directly apply the
initializer to the correct ir_variable instance and append the
generated instructions.
Fixes bugzilla #34374 and piglit tests glsl-{vs,fs}-array-redeclaration.
NOTE: This is a candidate for stable release branches. 0292ffb8 and
8e6cb9fe are also necessary.
Removed sampler view support for USCALED/SSCALED, the texture unit
refuses to convert to non-normalized float. The enums are treated
like UNORM.
Removed duplicate format related headers.
The hw doesn't like it - demos/shadowtex is broken. The emitted shader
isn't totally empty though, the depth write fixup gets emitted instead.
Maybe that one is somewhat fishy, too?
Idea for this patch from Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is an awful hack and will hurt performance on Ironlake, but we're
at a loss as to what's going wrong otherwise. This is the only common
variable we've found that avoids the problem on 4 applications
(CelShading, gnome-shell, Pill Popper, and my GLSL demo), while other
variables we've tried appear to only be confounding. Neither the
specifications nor the hardware team have been able to provide any
enlightenment, despite much searching.
https://bugs.freedesktop.org/show_bug.cgi?id=29172
Tested by: Chris Lord <chris@linux.intel.com> (Pill Popper)
Tested by: Ryan Lortie <desrt@desrt.ca> (gnome-shell)
Because the format can be changed to UNORM in a surface.
This fixes:
state_tracker/st_atom_framebuffer.c:163:update_framebuffer_state:
Assertion `framebuffer->cbufs[i]->texture->bind & (1 << 1)' failed.
Computation of the delta of this array from the last had a silly little
bug and ignored any initial delta==0 causing grief in Nexuiz and
friends.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
There is a silicon bug which causes unpredictable behaviour if the
URB_FENCE command should cross a cache-line boundary. Pad before the
command to avoid such occurrences. As this command only applies to
gen4/5, do the fixup unconditionally as the specs do not actually state
for which chip it was fixed (and the cost is negligible)...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, the rule deleted by this commit was matched every single
time (being the longest match). If not skipping, it used REJECT to
continue on to the actual correct rule.
The flex manual advises against using REJECT where possible, as it is
one of the most expensive lexer features. So using it on every match
seems undesirable. Perhaps more importantly, it made it necessary for
the #if directive rules to contain a look-ahead pattern to make them
as long as the (now deleted) "skip the whole line" rule.
This patch introduces an exclusive start state, SKIP, to avoid REJECTs.
Each time the lexer is called, the code at the top of the rules section
will run, implicitly switching the state to the correct one.
Fixes piglit tests 16384-consecutive-chars.frag and
16385-consecutive-chars.frag.
The expected result has been out of sync with what glcpp produces for
some time; glcpp's actual result seems to be correct and is very close to
GCC's cpp. Updating this will make it easier to catch regressions in
upcoming commits.
PIPE_BIND_CONSTANT_BUFFER alone was OK for nv50/nvc0, but nv30 will need
to be able to set others on certain chipsets.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This commit is basically a copy-over of the fix
Chia-I Wu's commited to wayland:
http://cgit.freedesktop.org/wayland/wayland-demos/commit/?id=1b6c0ed95
"Workaround an xcb-dri2 bug.
xcb_dri2_connect_device_name generated by xcb-proto 1.6 is broken.
It only works when the length of the driver name is a multiple of 4."
This reverts commit 1f9a0a4e6e.
This caused trouble with Lightsmark w/ i965 driver and fbo/fbo-blit-d24s8
(see bug 34894). It's probably something simple but no time to debug now.
SNB has 64k urb space, we only use piece of them.
The more urb space we alloc,
the more concurrent vs threads we can run.
push the urb space usage to the limit.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
this fixes fbo-generatemipmap-formats rgtc and s3tc in NPOT mode
with softpipe.
r600g fails to even get level 0 correct so have to look into that
a bit further.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was always converting to 8-bit per channel unsigned formats,
which isn't suitable for RGTC signed formats, this special cases
those two formats and converts to floats for those.
Signed-off-by: Dave Airlie <airlied@redhat.com>
SNORM needs a bit of work in the state tracker in order for mipmap
generation to work I believe.
I'm also not sure that having unorm fetches for an snorm format is
sane.
With signed types we weren't hitting this test however the comment
stating this doesn't happen often doesn't apply when using signed
types since an all 0 block is quite common which isn't abs min or max.
this fixes the limits correctly again also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
if the values are all in the last dword, the high bits can be 0,
This fixes a valgrind warning I saw when playing with mipmaps.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Drivers can call this function as needed. It tells the VBO module to
always unmap the current glBegin/glEnd VBO when we flush. Otherwise
it's possible to be in a flushed state but still have the VBO mapped.
Among other benefits, parallel makes now work. Since many people have
parallel builds by default (via MAKEFLAGS environment variable), this
sames some irritation at release time...when there's usually not any
other irritation already.
In my last commit I introduced a build dependency upon a new libdrm.
Add the associated autoconf checks. As the headers are part of the core
libdrm, we need to bump that version and so may as well bump the chipset
specific versions simultaneously.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With relaxed relocation checking in the kernel, we can specify a
negative delta (i.e. pointing outside of the target bo) in order to fake
a range in a large buffer. We only then need to upload the elements used
and adjust the buffer offset such that they correspond with the indices
used in the DrawArrays.
(Depends on libdrm 0209428b3918c4336018da9293cdcbf7f8fedfb6)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... and take advantage of start_vertex_bias to trim to [min_index,
max_index] where possible (i.e. when we need to upload all arrays).
Fixes half_float_vertex(misc.fillmode.wireframe)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34595
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When doing copy swapbuffers using drm, throttle on outstanding copy operations.
Introduces a new environment variable, EGL_THROTTLE_FENCES that the
user can use to indicate the desired number of outstanding swapbuffers, or
disable throttling using EGL_THROTTLE_FENCES=0.
This can and perhaps should be extended to the pageflip case as well, since
with some hardware pageflips can be pipelined. In case the pageflip syncs, the
throttle operation will be a no-op anyway.
Update copyright notices.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Use the pageflip ioctl when available.
Otherwise, or when the backbuffer contents need to be preserved,
fall back to a copy operation.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The copy swap can be used when we need to preserve the contents of
the back buffer or when there is no way to do native page-flipping.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This makes it usable also for native helpers.
Also add inline functions to access the context and to
uninit the native display structure.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This is reliant on a drm patch that I posted on the list + a version bump.
These will appear in drm-next today.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the drm minor version is > 9 (i.e. whats in drm-next),
we enable s3tc + texture tiling by default now.
this changes R600_FORCE_TILING to R600_TILING which can
be set to false to disable tiling on working drm.
Signed-off-by: Dave Airlie <airlied@redhat.com>
nv50_resource is being called nv04_resource now temporarily, to avoid
a naming conflict with nouveau_resource from libdrm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Modified from original to remove chipset-specific code, and to be decoupled
from the mm present in said drivers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
By default the hardware rounds texcoords. However,
for point sampled textures, the expected behavior is
to truncate. When we have point sampled textures,
set the truncate bit in the sampler.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=25871
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
No idea why I didn't do it like this the first time, but share
the code like other portions of mesa do using _tmp.h suffix
and some #defines for the types.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The other draw stages like aaline and pstipple were already doing this.
If the driver used the aapoint stage but not the others it would crash
because of a null pipe->draw pointer.
It was only implemented in the swrast driver and probably not used by
any applications. A modern app would use a dependent/chained texture
lookup in the fragment shader.
when doing glCopyTex[Sub]Image() and checking the source buffer's
completeness.
We only need to determine FBO completeness when the status is indeterminate.
I removed the HiZ memory management, because the HiZ RAM is too small
and I also did it in hope that HiZ will be enabled more often.
This also sets aligned strides to HIZ_PITCH and ZMASK_PITCH.
This adds support for the RGTC unsigned and signed
texture storage and fetch methods.
the code is a port of the DXT5 alpha compression code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With an extremely dumb strategy. But it's the same i915c employs.
Also improve the hw_atom code slightly by statically specifying the
required batch space. For extremely variably stuff (shaders, constants)
it would probably be better to add a new parameter to the hw_atom->validate
function.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also contains the first few bits for hw state atoms.
v2: Implement suggestion by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Add the batch bo to the libdrm validation lost, for otherwise
libdrm won't take previously used buffers into account.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move it to i915_state_static.c This way i915_emit_state.c only emits
state and doesn't (re)calculate it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This came out of discussion at the office today, and we agreed that
solving this for indirect wasn't really interesting, though the
server-side change would be of a similar level of difficulty.
If another thread bound a context to the drawable then unbound it, the
driContextPriv would end up NULL.
With the previous two fixes, this fixes glx-multithread-makecurrent-2,
despite the issue not being about the multithreaded makecurrent.
The driver only has one reasonable place to look for its context to
flush anything, which is the current context. Don't bother it with
having to check.
This extension allows a client to bind one context in multiple threads
simultaneously. It is then up to the client to manage synchronization of
access to the GL, just as normal multithreaded GL from multiple contexts
requires synchronization management to shared objects.
The old value, BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE makes it sound like we're
doing a non-bias texture lookup. It has the same value as the new constant
BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE, so there should be no
functional changes.
There is an issue with gcc 4.6.0 that leads to segfault/assert with mesa
due to ureg_src size, reshuffling the structure member to better better
alignment work around the issue.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47893
7.9 + 7.10 candidate
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
so far only hw mipmap generation is testing on softpipe,
passes test added to piglit.
this requires another patch to mesa to let array textures mipmaps
even start to happen.
platform.system in SCons on Cygwin includes the OS version number.
Windows XP - CYGWIN_NT-5.1
Windows Vista - CYGWIN_NT-6.0
Windows 7 - CYGWIN_NT-6.1
Reduce all Cygwin platform variants to just 'cygwin' so anything
downstream can simply use 'cygwin' instead of the different full
platform names.
255.875 matches the hardware documentation. Presumably this was a typo.
NOTE: This is a candidate for the 7.10 branch, along with
commit 2bfc23fb86.
Reviewed-by: Eric Anholt <eric@anholt.net>
In the case where glBlitFramebuffer is being used to copy to a texture
without scaling it is faster if we can use the hardware to do a blit
rather than having to do a texture render. In most of the drivers
glCopyTexSubImage2D will use a blit so this patch makes it check for
when glBlitFramebuffer is doing a simple copy and then divert to
glCopyTexSubImage2D.
This was originally proposed as an extension to the common meta-ops.
However, it was rejected as using the BLT is only advantageous for Intel
hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33934
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Might be necessary if a block sneaks in somewhere, like a common
block for moves of phi sources after a loop break.
This is harmless and normally will be removed before emission.
In linear scan we can't allocate multiple values with different
live ranges at the same time to assign them consecutive regs.
Maybe we should just switch to graph coloring for all values ...
The svga_update_state() mechanism is inadequate as it will always end up
flushing the primitives before processing the SVGA_NEW_COMMAND_BUFFER
dirty state flag.
Fixes piglit/fbo-depth-sample-compare:
==14722== Invalid free() / delete / delete[]
==14722== at 0x4C240FD: free (vg_replace_malloc.c:366)
==14722== by 0x84FBBFD: intel_upload_unmap (intel_buffer_objects.c:695)
==14722== by 0x85205BC: brw_prepare_vertices (brw_draw_upload.c:457)
==14722== by 0x852F975: brw_validate_state (brw_state_upload.c:394)
==14722== by 0x851FA24: brw_draw_prims (brw_draw.c:365)
==14722== by 0x85F2221: vbo_exec_vtx_flush (vbo_exec_draw.c:389)
==14722== by 0x85EF443: vbo_exec_FlushVertices_internal (vbo_exec_api.c:543)
==14722== by 0x85EF49B: vbo_exec_FlushVertices (vbo_exec_api.c:973)
==14722== by 0x86D6A16: _mesa_set_enable (enable.c:351)
==14722== by 0x42CAD1: render_to_fbo (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42CEE3: piglit_display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42F508: display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== Address 0xc606310 is 0 bytes after a block of size 18,720 alloc'd
==14722== at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==14722== by 0x85202AB: copy_array_to_vbo_array (brw_draw_upload.c:256)
==14722== by 0x85205BC: brw_prepare_vertices (brw_draw_upload.c:457)
==14722== by 0x852F975: brw_validate_state (brw_state_upload.c:394)
==14722== by 0x851FA24: brw_draw_prims (brw_draw.c:365)
==14722== by 0x85F2221: vbo_exec_vtx_flush (vbo_exec_draw.c:389)
==14722== by 0x85EF443: vbo_exec_FlushVertices_internal (vbo_exec_api.c:543)
==14722== by 0x85EF49B: vbo_exec_FlushVertices (vbo_exec_api.c:973)
==14722== by 0x86D6A16: _mesa_set_enable (enable.c:351)
==14722== by 0x42CAD1: render_to_fbo (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42CEE3: piglit_display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42F508: display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34604
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This adds EXT_texture_array support to r600g, it passes the piglit
array-texture test but I suspect may not be complete.
It currently requires a kernel patch to fix the CS checker to allow
these, so you need to use R600_ARRAY_TEXTURE=true for now
to enable them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is because the HW doesn't always store a 1D array like a
2D texture, it more likely stores it like 2D texture (i.e.
alignments etc).
This means we upload each slice separately and let the driver
work out where to put it.
this might break nvc0 as I can't test it, I have only nv50 here.
Signed-off-by: Dave Airlie <airlied@redhat.com>
... and prefers a small batch whereas gen4+ prefer a large batch to
carry more state.
Tuning using openarena/padman indicate that a batch size of just 4096 is
best for those cases.
Bugzilla: https://bugs.freedesktop.org/process_bug.cgi
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The spec says that the offsets in the vertex-fetch instructions need to be byte-aligned and makes no specification with regard to the required alignment of the offset and stride in the vertex resource constant register.
However, testing indicates that all three values need to be DWORD aligned.
Ptr can be very well NULL, so when there are two arrays, with one having
offset 0 (and thus NULL Ptr), and the other having a non-zero offset,
the non-zero value is taken as minimum (because of !low_addr ? start ...).
On 32-bit systems, this somehow works. On 64-bit systems, it leads to crashes.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
255.875 matches the hardware documentation. Presumably this was a typo.
Found by inspection. Not known to fix any issues.
Reviewed-by: Eric Anholt <eric@anholt.net>
pixel_w is the final result; wpos_w is used on gen4 to compute it.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't safely use fixed size arrays since Gen6+ supports unlimited
nesting of control flow.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
The code that generates MATH instructions attempts to work around
the hardware ignoring source modifiers (abs and negate) by emitting
moves into temporaries. Unfortunately, this pass coalesced those
registers, restoring the original problem. Avoid doing that.
Fixes several OpenGL ES2 conformance failures on Sandybridge.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Single-operand math already had these workarounds, but POW (the only two
operand function) did not. It needs them too - otherwise we can hit
assertion failures in brw_eu_emit.c when code is actually generated.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_PointSize (VERT_RESULT_PSIZ) doesn't take up a message register,
as it's part of the header. Without this fix, writing to gl_PointSize
would cause the SF to read and use the wrong attributes, leading to all
kinds of random looking failure.
Reviewed-by: Eric Anholt <eric@anholt.net>
If two buffers had the same stride where one buffer is a user one and
the other is a vbo, it was considered to be one interleaved buffer,
resulting in incorrect rendering and crashes.
This patch makes sure that the interleaved buffer is either user or vbo,
not both.
Needs to track this ourself since because we get into a race condition with
the dri_util.c code on make current when rendering to the front buffer.
This is what happens:
Old context is rendering to the front buffer.
App calls MakeCurrent with a new context. dri_util.c sets
drawable->driContextPriv to the new context and then calls the driver make
current. st/dri make current flushes the old context, which calls back into
st/dri via the flush frontbuffer hook. st/dri calls dri loader flush
frontbuffer, which calls invalidate buffer on the drawable into st/dri.
This is where things gets wrong. st/dri grabs the context from the dri
drawable (which now points to the new context) and calls invalidate
framebuffer to the new context which has not yet set the new drawable as its
framebuffers since we have not called make current yet, it asserts.
When clearing a GL_LUMINANCE_ALPHA buffer, for example, we need to convert
the clear color (R,G,B,A) to (R,R,R,A). We were doing this for texture border
colors but not renderbuffers. Move the translation function to st_format.c
and share it.
This fixes the piglit fbo-clear-formats test.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
If finalizing a non-POW mipmapped texture with an odd-sized base texture
image we were allocating the wrong size of gallium texture (off by one).
Need to be more careful about computing the base texture image size.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=34463
Reducing the number of relocations has lots of nice knock-on effects,
not least including reducing batch buffer size, auxilliary array sizes
(vmalloced and copied into the kernel), processing of uncached
relocations etc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than waiting on the first batch after the last swapbuffers to be
retired, call into the kernel to wait upon the retirement of any request
less than 20ms old. This has the twofold advantage of (a) not blocking
any other clients from utilizing the device whilst we wait and (b) we
attain higher throughput without overloading the system.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the next vertex arrays are a (discontiguous) continuation of the
current arrays, such that the new vertices are simply offset from the
start of the current vertex buffer definitions we can reuse those
defintions and avoid the overhead of relocations and invalidations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using a temporary buffer for large discontiguous uploads into the common
buffer and a single buffered upload is faster than performing the
discontiguous copies through a mapping into the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Upload the non-vbo arrays into a single interleaved buffer object, and
so need to just emit a single vertex buffer relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the user passed in several arrays interleaved in the same vbo, only
emit a single vertex buffer and relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Track reuse of the vertex buffer objects and so minimise the number of
vertex buffers used by the hardware (and their relocations).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we now pack the indices into a common upload buffer, we can reuse a
single CMD_INDEX_BUFFER packet and translate each invocation with a
start vertex offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we use state relocations and we know that all the state belongs to
the same bo, we can drop the multiple references to the same bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was to avoid read back through the GTT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was to avoid read back through the GTT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In preparation for a greater change, use the color_calc_state_bo already
provisioned for this purpose.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than performing lots of little writes to update the common bo
upon each update, write those into a static buffer and flush that when
full (or at the end of the batch). Doing so gives a dramatic performance
improvement over and above using mmaped access.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than performing a blit to completely overwrite a busy bo, simply
discard it and create a new one with the fresh data.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dynamic arrays have the tendency to be small and so allocating a bo for
each one is overkill and we can exploit many efficiency gains by packing
them together.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dynamic draw buffers are used by clients for temporary arrays and for
uploading normal vertex arrays. By keeping the data in memory, we can
avoid reusing active buffer objects and reallocate them as they are
changed. This is important for Sandybridge which can not issue blits
within a batch and so ends up flushing the batch upon every update, that
is each batch only contains a single draw operation (if using dynamic
arrays or regular arrays from system memory).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fix a typo which meant that --enable-shared-glapi didn't actually cause a shared glapi to be built
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This behavior was last when moving the transfers to the contexts.
This fixes several piglit failures, which were reading the color renderbuffer
before the draw operations were emitted.
This is a temporary workaround. It fixes sauerbrauten with shaders enabled.
I guess we might be changing vertex attribs somewhere and not updating
the appropriate dirty flags, therefore we can't rely on them for now.
Or maybe we need to make this state dependent on some other flags too.
More info:
https://bugs.freedesktop.org/show_bug.cgi?id=34378
When we drop the in_swtnl_draw flag, we must force a rerun of
update_need_swtnl to reset the need_swtnl flag to its correct value outside
of a swtnl vbo draw.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
If we hit the pipe_get/put_tile() path for setting up the glCopyPixels
texture we were passing the wrong x/y position to pipe_get_tile().
The x/y position was already accounted for in the pipe_get_transfer()
call so we were effectively reading from 2*readX, 2*readY.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Addresses excessive TEMP allocation in vertex shaders where all CONSTs are
stored into TEMPS at the start, but copy propagation was failing due to
the presence of IFs.
We could do something about loops, but ifs are easy enough.
This enables the egl_dri2 driver to load swrast driver
for software rendering. It could be used when hardware
dri2 drivers are not available, such as in VM.
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
Some cards don't appear to work correctly with the UNORM type,
so switch to the integer type, however since gallium has no
integer types yet from what I can see we need to do a hack to
workaround it for the blitter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a workaround for a bug in libtxc_dxtn.
Fixes:
- piglit/GL_EXT_texture_compression_s3tc/fbo-generatemipmap-formats
Signed-off-by: Dave Airlie <airlied@redhat.com>
Arrays are zero based. If the highest element accessed is 6, the
array needs to have 7 elements.
Fixes piglit test glsl-fs-implicit-array-size-03 and bugzilla #34198.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Fixes regression: https://bugs.freedesktop.org/show_bug.cgi?id=34160
Commit e7c1f058d1 disabled constant-folding
when division-by-zero occured. This was a mistake, because the spec does
allow division by zero. (From section 5.9 of the GLSL 1.20 spec: Dividing
by zero does not cause an exception but does result in an unspecified
value.)
For floating-point division, the original pre-e7c1f05 behavior is
reinstated.
For integer division, constant-fold 1/0 to 0.
This reverts commit b3cf92aa91.
The reverted commit prevented constant-folding of reciprocal expressions
when the reciprocated expression was 0. However, since the spec allows
division by zero, constant-folding *is* permissible in this case.
From Section 5.9 of the GLSL 1.20 spec:
Dividing by zero does not cause an exception but does result in an
unspecified value.
Before populating the vertex buffer attribute pointer (VB->AttribPtr[]),
convert vertex data in GL_FIXED format to GL_FLOAT.
Fixes bug: http://bugs.freedesktop.org/show_bug.cgi?id=34047
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This is a multi-threading optimization which hides the kernel overhead
behind a thread. It improves performance in CPU-limited apps by 2-15%.
Of course you must have at least 2 cores for it to make any difference.
It can be disabled with:
export RADEON_THREAD=0
On r600, s3tc formats require a 1D tiled texture format,
so we have to do uploads using a blit, via the 64-bit and 128-bit formats
Based on the r600c code we use a 64 and 128-bit type to do the
blits.
Still requires R600_ENABLE_S3TC until the kernel fixes are in,
this has only been tested on evergreen where the kernel doesn't
yet get in the way.
the miptree setup and pitch storing didn't work so well for block
based things like compressed textures. The CB takes blocks, where
the texture sampler takes pixels, and transfers need bytes,
So now we store blocks/bytes and translate to pixels in the sampler.
This is necessary for s3tc to work properly.
If the underlying transfer had a stride wider for hw alignment reasons,
the mipmap generation would generate badly strided images.
this fixes a few problems I found while testing r600g with s3tc
Signed-off-by: Dave Airlie <airlied@redhat.com>
Because an app may do something like this:
while (!(ptr = bo_map(..., DONT_BLOCK))) {
/* Do some other work. */
}
And it would be looping endlessly if we didn't flush.
If EXT_draw_buffers2 is supported but ARB_draw_buffers_blend isn't
_mesa_BlendFuncSeparateEXT only sets up the blend equation and function for the
first render target. This patch makes sure that update_blend doesn't try to use
the data from other rendertargets in such cases.
Signed-off-by: Brian Paul <brianp@vmware.com>
The vertex arrays state should be set only when (_NEW_ARRAY | _NEW_PROGRAM)
is dirty. This assumes user buffer content is mutable, which will be
sorted out in the next commit. The following usage case should be much faster
now:
for (i = 0; i < 1000; i++) {
glDrawElements(...);
}
Or even:
for (i = 0; i < 1000; i++) {
glSomeStateChangeOtherThanArraysOrProgram(...);
glDrawElements(...);
}
The performance increase from this may be significant in some apps and
negligible in others. It is especially noticable in the Torcs game (r300g):
Before: 15.4 fps
After: 20 fps
Also less looping over attribs in st_draw_vbo yields slight speed-up
in apps with lots of glDraw* calls.
but really wtf? all these PCI IDs need to be ripped out of here, its totally
unscalable and the drivers already have this info so could export it some better way.
tested by Darxus on #wayland.
This an adds --enable-shared-dricore option to configure. When enabled,
DRI modules will link against a shared copy of the common mesa routines
rather than statically linking these.
This saves about 30MB on disc with a full complement of classic DRI
drivers.
v2: Only enable with a gcc-compatible compiler that handles rpath
Handle DRI_CFLAGS without filter-out magic
Build shared libraries with the full mklib voodoo
Fix typos
v3: Resolve conflicts with talloc removal patches
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Track variables, functions, and types during parsing. Use this
information in the lexer to return the currect "type" for identifiers.
Change the handling of structure constructors. They will now show up
in the AST as constructors (instead of plain function calls).
Fixes piglit tests constructor-18.vert, constructor-19.vert, and
constructor-20.vert. Also fixes bugzilla #29926.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This requires lexical disambiguation between variable and type
identifiers (as most C compilers do).
Signed-off-by: Keith Packard <keithp@keithp.com>
NOTE: This is a candidate for the 7.9 and 7.10 branches.
add support for the 32-bit types, also fixup the
export setting to handle types with channels > 11 bits properly
Signed-off-by: Dave Airlie <airlied@redhat.com>
Based on Dave's branch.
The majority of this commit is a cleanup, mainly renaming things.
There wasn't much code to import, just ioctl calls.
Also done:
- implemented unsynchronized bo_map (important optimization!)
- radeon_bo_is_referenced_by_cs is no longer a refcount hack
- dropped the libdrm_radeon dependency
I'm surprised that this has resulted in less code in the end.
Previously the SNE and SEQ instructions would calculate the partial
result to the destination register. This would cause problems if the
destination register was also one of the source registers.
Fixes piglit tests glsl-fs-any, glsl-fs-struct-equal,
glsl-fs-struct-notequal, glsl-fs-vec4-operator-equal,
glsl-fs-vec4-operator-notequal.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
If the formats don't match we need to update the surface with the new
format.
if we can render to SRGB formats, enable the extension
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a regression from commit 5cbff0932e.
The problem is *some* glDrawPixels fragment programs need to be deleted,
but not all. Use an explicit flag to indicate whether or not the program
needs to be deleted.
This should fix http://bugs.freedesktop.org/show_bug.cgi?id=34049
From section 5.9 of the GLSL 1.20 spec:
The operator modulus (%) is reserved for future use.
From section 5.8 of the GLSL 1.20 spec:
The assignments modulus into (%=), left shift by (<<=), right shift by
(>>=), inclusive or into ( |=), and exclusive or into ( ^=). These
operators are reserved for future use.
The GLSL ES 1.00 spec and GLSL 1.10 spec have similiar language.
Fixes bug:
https://bugs.freedesktop.org//show_bug.cgi?id=33916
Fixes Piglit tests:
spec/glsl-1.00/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.00/compiler/assignment-operators/modulus-assign-00.frag
spec/glsl-1.10/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.10/compiler/assignment-operators/modulus-assign-00.frag
spec/glsl-1.20/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.20/compiler/assignment-operators/modulus-assign-00.frag
Fixes a minor memory leak with the "engine" mesa demo.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Make sure we unreference the vertex buffer pointers in a local array.
This fixes huge vertex buffer / memory leaks in mesa demos "fire" and "engine".
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Relative addressing of constant buffers can't work properly through the
kcache, since you can only address within the currently locked kcache window.
Instead, this patch binds the constant buffer as a shader resource, and then
explicitly fetches the constant using a vertex fetch with fetch type
VTX_FETCH_NO_INDEX_OFFSET from the shader. There's probably still some room
for improvement, doing the fetch right before the instruction that needs the
value may not be quite optimal for example.
The r600_bc_alu_src structure is used in two different ways, as a vector and
for the individual channels of that same vector. This is somewhat fragile,
and probably confusing.
If the client hasn't done the initial wl_display_iterate() at the time
we initialize the display, we have to do that in platform_wayland.c.
Make sure we detect that correctly instead of dup()ing fd=0, and use
the sync callback to make sure we don't wait forever for authorization that
won't happen.
This code has originally matured in r300g and was ported to r600g several
times. It was obvious it's a code duplication.
See also comments in the header file.
The scheduler and the register allocator are not good enough yet to deal
with the effects of the register rename pass. This was causing a 50%
performance drop in Lightsmark. The pass can be re-enabled once the
scheduler and the register allocator are more mature. r300 and r400
still need this pass, because it prevents a lot of shaders from using
too many texture indirections.
NOTE: This is a candidate for the 7.10 branch.
This adds i965 support for GL_EXT_framebuffer_sRGB, it introduces a new
constant to say that the driver can support sRGB enabled FBOs since enabling
the extension doesn't mean the driver can actually support sRGB.
Also adds the suggested state flush in the core code suggested by Brian.
fix the ARB_fbo color encoding.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Querying index zero is not an error in OpenGL ES 2.0.
Querying an index larger than the value returned by
GL_MAX_VERTEX_ATTRIBS is an error in all APIs.
Fixes bugzilla #32375.
This patch cleans up many of the extra copies in GLSL IR introduced by
i965's scalarizing passes. It doesn't result in a statistically
significant performance difference on nexuiz high settings (n=3) or my
demo (n=10), due to brw_fs.cpp's register coalescing covering most of
those extra moves anyway. However, it does make the debug of wine's
GLSL shaders much more tractable, and reduces instruction count of
glsl-fs-convolution-2 from 376 to 288.
The type of u_get_transfer_vtbl of the usage argument in u_transfer.h is
unsigned and not enum pipe_transfer_usage. This patch changes the type
of usage to unsigned to match the prototype in the header file.
Standard library functions in C++ are in the std namespace. When using
C++-style header files for the standard library, some compilers, such as
Sun Studio, provide symbols only for the std namespace and not for the
global namespace.
This patch adds using statements for standard library functions. Another
option could have been to prepend standard library function calls with
'std::'.
This patch fixes several compilation errors with Sun Studio.
This just adds a flag to create the texture without doing any
flushing to it. Flushing occurs in the draw function. This avoids
unnecessary flushes when we end up rebinding a CB/DB/texture due
to the blitter just restoring state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before, the set_sampler_views() and restore_sampler_views() functions
used MAX2(old,new) to tell the driver how many samplers or sampler
views to set. This could result in cases such as:
pipe->set_fragment_sampler_views(pipe, 4, views={foo, bar, NULL, NULL})
Many/most gallium drivers would take this as-is and set
ctx->num_sampler_views=4 and ctx->sampler_views={foo, bar, NULL, NULL, ...}.
Later, loops over ctx->num_sampler_views would have to check for null
pointers. Worse, the number of sampler views and number of sampler CSOs
could get out of sync:
ctx->num_samplers = 2
ctx->samplers = {foo, bar, ...}
ctx->num_sampler_views = 4
ctx->sampler_views={Foo, Bar, NULL, NULL, ...}
So loops over the num_samplers could run into null sampler_views pointers
or vice versa.
This fixes a failed assertion in the SVGA driver when running the Mesa
engine demo in AA line mode (and possibly other cases).
It looks like all gallium drivers are careful to unreference views
and null-out sampler CSO pointers for the units beyond what's set
with the pipe::bind_x_sampler_states() and pipe::set_x_sampler_views()
functions.
I'll update the gallium docs to explain this as well.
This new interface could set up context for OpenGL,
OpenGL ES1 and OpenGL ES2. It will be used by egl_dri2
driver.
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
Leak was introduced when fixing strict aliasing violation in this code:
the reference counting was preserved, but the destructor call on zero
reference count was not.
Exactly one half would be the ideal, but this is a soft limit, and one
more byte over brings us to synchronous behavior.
Flushing when the referred GMR exceeds one third of the aperture gives us
statistically better performance.
Certain mingw32 cross compilers (e.g. RedHat's) defaults to use DLL gcc
runtime.
Given the main deliverable from this project are self-contained drivers,
which are loaded by any application, this dependency can cause havoc.
If we get a sw accessible buffer like the S8 texture we end up
doing depth tracking on it when there is no need since we won't
ever bind it to the hardware. This leads to a sw fallback in the
transfer destruction which leads to and endless recusion loop
of fail in transfer destroy.
Signed-off-by: Dave Airlie <airlied@redhat.com>
this adds a flag to keep track of whether the depth texture structure
is the flushed texture or not, so we can avoid doing flushes when
we do a hw rendering from one to the other.
it also renames flushed to dirty_db which tracks if the DB copy
has been dirtied by being bound to the hw.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This consolidates the code duplicated between the fragment sampler
and vertex sampler functions. Plus, it'll make adding support for
geometry shader samplers trivial.
Before we were looping to nr_samplers, which is the number of fragment
samplers, not vertex samplers.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
For example, this now raises an error:
#define XXX 1 / 0
Fixes bug: https://bugs.freedesktop.org//show_bug.cgi?id=33507
Fixes Piglit test: spec/glsl-1.10/preprocessor/modulus-by-zero.vert
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Avoid division-by-zero when constant-folding the following expression
types:
ir_unop_rsq
ir_binop_div
ir_binop_mod
Fixes bugs:
https://bugs.freedesktop.org//show_bug.cgi?id=33306https://bugs.freedesktop.org//show_bug.cgi?id=33508
Fixes Piglit tests:
glslparsertest/glsl2/div-by-zero-01.frag
glslparsertest/glsl2/div-by-zero-02.frag
glslparsertest/glsl2/div-by-zero-03.frag
glslparsertest/glsl2/modulus-zero-01.frag
glslparsertest/glsl2/modulus-zero-02.frag
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Do not constant-fold a reciprocal if any component of the reciprocated
expression is 0. For example, do not constant-fold `1 / vec4(0, 1, 2, 3)`.
Incorrect, previous behavior
----------------------------
Reciprocals were constant-folded even when some component of the
reciprocated expression was 0. The incorrectly applied arithmetic was:
1 / 0 := 0
For example,
1 / vec4(0, 1, 2, 3) = vec4(0, 1, 1/2, 1/3)
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This was my mistake when converting from talloc to ralloc. I was
confused because the other calls in the function are to asprintf_append
and the original code used str as the context rather than NULL.
Fixes bug #33823.
Previously a register would be marked as available if any component
was written. This caused shaders such as this:
0: TEX TEMP[0].xyz, INPUT[14].xyyy, texture[0], 2D;
1: MUL TEMP[1], UNIFORM[0], TEMP[0].xxxx;
2: MAD TEMP[2], UNIFORM[1], TEMP[0].yyyy, TEMP[1];
3: MAD TEMP[1], UNIFORM[2], TEMP[0].zzzz, TEMP[2];
4: ADD TEMP[0].xyz, TEMP[1].xyzx, UNIFORM[3].xyzx;
5: TEX TEMP[1].w, INPUT[14].xyyy, texture[0], 2D;
6: MOV TEMP[0].w, TEMP[1].wwww;
7: MOV OUTPUT[2], TEMP[0];
8: END
to produce incorrect code such as this:
BEGIN
DCL S[0]
DCL T_TEX0
R[0] = MOV T_TEX0.xyyy
U[0] = TEXLD S[0],R[0]
R[0].xyz = MOV U[0]
R[1] = MUL CONST[0], R[0].xxxx
R[2] = MAD CONST[1], R[0].yyyy, R[1]
R[1] = MAD CONST[2], R[0].zzzz, R[2]
R[0].xyz = ADD R[1].xyzx, CONST[3].xyzx
R[0] = MOV T_TEX0.xyyy
U[0] = TEXLD S[0],R[0]
R[1].w = MOV U[0]
R[0].w = MOV R[1].wwww
oC = MOV R[0]
END
Note that T_TEX0 is copied to R[0], but the xyz components of R[0] are
still expected to hold a calculated value.
Fixes piglit tests draw-elements-vs-inputs, fp-kill, and
glsl-fs-color-matrix. It also fixes Meego bugzilla #13005.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Also return it as the correct type. Previously the whole array would
be returned and each element would be expanded to a vec4.
Fixes piglit test getuniform-01 and bugzilla #29823.
Passing ralloc_vasprintf_append a 0-byte allocation doesn't work. If
passed a non-NULL argument, ralloc calls strlen to find the end of the
string. Since there's no terminating '\0', it runs off the end.
Fixes a crash introduced in 14880a510a.
If we see a MACRO bit on r600g its 2D tiled,
if don't see a MACRO bit and we do see a MICRO bit then its 1D tiled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the tile type for the buffer is 1 then its been bound to the
DB at some point, we need to decompress it, otherwise its only
been bound as texture/cb so don't do anything.
This fixes 5 piglit tests here on r600g.
this just adds the ioctl interface and sets the tile type
and array mode in the correct place.
This seems to bring eg 1D tiling to the same level, and issues
as on r600. No idea how to address 2D yet.
Previously we'd happily compile GLSL 1.30 shaders on any driver. We'd
also happily compile GLSL 1.10 and 1.20 shaders in an ES2 context.
This has been a long standing FINISHME in the compiler.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Rather than passing "True", pass a bitfield describing the particular
variant's features - either projection or offset.
This should make the code a bit more readable ("Proj" instead of "True")
and make it easier to support offsets in the future.
This annotation is for an "in" function parameter for which it is only legal
to pass constant expressions. The only known example of this, currently,
is the textureOffset functions.
This should never be used for globals.
Having these as actual integer values makes it difficult to implement
the texture*Offset built-in functions, since the offset is actually a
function parameter (which doesn't have a constant value).
The original rationale was that some hardware needs these offset baked
into the instruction opcode. However, at least i965 should be able to
support non-constant offsets. Others should be able to rely on inlining
and constant propagation.
Since the introduction of ir_var_system_value, system variables would be
printed as "temporary" and temporaries would result in out-of-bounds
array access, showing up as garbage in printed IR.
SVGA3D only supports SGN for vertex shaders, and this requires two additional
temporary registers for intermediate results.
For fragment shaders, lower to two CMPs and one ADD.
The extra length is the size of the request *minus* the size of the
VendorPrivate header, not the addition.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
xGLXChangeDrawableAttributesSGIXReq follows the GLXVendorPrivate header
with a drawable, number of attributes, and list of (type, value)
attribute pairs. Don't forget to put the number of attributes in there.
I don't think this can ever have worked.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Like on some r5xx, there are multiple DB backends on the r600,
we need to add up the query results from each of these to get the
final correct value.
So far I'm not 100% sure how to calculate the num_db, value
setting it to 4 should be harmless enough until we do.
This fixes occulsion_query piglit test on my rv740.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Although CUBE is a reduction inst, it writes to more than just PV.X
so we need to keep the dst channel.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This only works on r600/r700 so far, evergreen doesn't appear
to have the multiwrite enable bit in the color control, so we
may have to actually do a shader rewrite on EG hardware.
remove some duplicate code reg defines also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I know Jerome will probably rewrite the way depth textures work sometime
soon. For the time being this should at least make common depth texture usage
for shadowing work properly though.
When the paint is opaque (currently, solid color with alpha 1.0f), no
blending is needed for VG_BLEND_SRC_OVER. This eliminates the serious
performance hit introduced by 859106f196
for a common scenario.
Swizzles are now defined everywhere as a field with 12 bits that contains
4 channels worth of meaningful information. Any channel that is unused is
set to RC_SWIZZLE_UNUSED. This change is necessary because rgb instructions
and alpha instructions were initializing channels that would never be used
(channel 3 for rgb and channels 1-3 for alpha) with 0 (aka RC_SWIZZLE_X).
This made it impossible to use generic helper functions for swizzles,
because sometimes a channel value of 0 meant unused and other times it
meant RC_SWIZZLE_X.
All hacks that tried to guess how many channels were relevant have
also been removed.
1) Only translate the [min_index, max_index] range.
2) Upload translated vertices via the uploader.
3) Rename valid_vertex_buffer[] to real_vertex_buffer[]
Only upload the [min_index, max_index] range instead of [0, userbuf_size].
This an important optimization.
Framerate in Lightsmark:
Before: 22 fps
After: 75 fps
The same optimization is already in r300g.
I can't see a performance difference with this code, which means all
the driver-specific code removed in this commit was unnecessary.
Now we use u_upload_mgr in a slightly different way than we did before it got
dropped. I am not restoring the original code "as is" due to latest
u_upload_mgr changes that r300g performance benefits from.
This also fixes:
- piglit/fp-kil
When an app loads libEGL.so dynamically with RTLD_LOCAL, loading DRI
drivers would fail because of missing glapi symbols. This commit makes
egl_dri2 load libglapi.so with RTLD_GLOBAL to export glapi symbols for
future symbol resolutions.
The same trick can be found in GLX. However, egl_dri2 can only do so
when --enable-shared-glapi is given. Because, otherwise, both libGL.so
and libglapi.so define glapi symbols and egl_dri2 cannot tell which
library to load.
When the user sets EGL_DRIVER to egl_dri2 (or egl_glx), make sure the
built-in driver is used. The user might leave the outdated egl_dri2.so
(or egl_glx.so) on the filesystem and we do not want to load it.
makedepend would crash when a source includes a header indirectly, such
as
#define HEADER "some-header.h"
#include HEADER
Do not define HEADER (makedepend would detects this as an incomplete
include) and add the dependency manually in the Makefile.
This should hopefully fix bug #33374.
For 1D/2D texture arrays use the pipe_resource::array_size field.
In OpenGL 1D arrays texture use the height dimension as the array
size and 2D array textures use the depth dimension as the array size.
Gallium uses a special array_size field instead. When setting up
gallium textures or comparing Mesa textures to gallium textures we
need to be extra careful that we're comparing the right fields.
The new st_gl_texture_dims_to_pipe_dims() function maps OpenGL
texture dimensions to gallium texture dimensions and simplifies
this quite a bit.
This reverts commit d3df641f0a.
The original commit had sat unpushed on my machine for months. By the
time I found it again, I had forgotten that we had decided not to use
this change after all, (the relevant test was removed long ago).
The GLSL specification is vague here, (just says "as is standard for
C++"), though the C specifications seem quite clear that this should
be an error.
However, an existing piglit test (CorrectPreprocess11.frag) expects
this to be a warning, not an error, so we change this, and document in
README the deviation from the specification.
So that 'foo' can be found in: OPTION=prefixfoosuffix,foo
Also allow that debug options can be separated by a non-alphanumeric characters
instead of just commas.
This drops the memblock manager for ZMASK. Instead, only one zbuffer can be
compressed at a time. Note that this does not necessarily have to be slower.
When there is a large number of zbuffers, compression might be used more often
than it was before. It's also easier to debug.
How it works:
1) 'clear' turns the compression on.
2) If some other zbuffer is set or the currently-bound zbuffer is used
for texturing, the driver decompresses it and then turns the compression off.
Notes:
- The ZMASK clear has been refactored, so that only one packet3 is used to clear
ZMASK.
- The 8x8 compression mode is disabled. I couldn't make it work without issues.
- Also removed driver-specific stuff from u_blitter.
Driver status:
- RV530 and R580 appear to just work (finally).
- RV570 should work, but there may be an issue that we don't correctly
calculate the number of dwords to clear, resulting in a partially
uninitialized zbuffer.
- RS690 misrenders as if no ZMASK clear happened. No idea what's going on.
- RV350 may even hardlock. This issue was already present and this patch doesn't
fix it.
I think we are still missing some hardware info we need to make the zbuffer
compression work fully.
Note that there is also an issue with HiZ, resulting in a sort of blocky
zigzagged corruption around some objects.
From the AMD_conservative_depth spec:
If gl_FragDepth is redeclared in any fragment shader in a program, it
must be redeclared in all fragment shaders in that program that have
static assignments to gl_FragDepth. All redeclarations of gl_FragDepth in
all fragment shaders in a single program must have the same set of
qualifiers.
When AMD_conservative_depth is enabled:
* Let 'layout' be a token.
* Extend the production rule of layout_qualifier_id to process the tokens:
depth_any
depth_greater
depth_less
depth_unchanged
Let's assume there are two options with names such that one is a substring
of another. Previously, if we only specified the longer one as a debug option,
the shorter one would be considered specified as well (because of strstr).
This commit fixes it by checking that each option is surrounded by commas.
(a regexp would be nicer, but this is not a performance critical code)
Update the max_array_access of a global as functions that use that
global are pulled into the linked shader.
Fixes piglit test glsl-fs-implicit-array-size-01 and bugzilla #33219.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Previously only global arrays with implicit sizes would be patched.
This causes all arrays that are actually accessed to be sized.
Fixes piglit test glsl-fs-implicit-array-size-02.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
emit_adjusted_wpos() needs separate x,y translation values. If we
invert Y, we don't want to effect X.
Part of the fix for http://bugs.freedesktop.org/show_bug.cgi?id=26795
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The set of internalFormat parameters accepted by glRenderBufferStorage
depends on the EXT vs. ARB version of framebuffer_object. The later
added support for GL_ALPHA, GL_LUMINANCE, etc. formats. Note that
these formats might be legal but might not be supported. That should
be checked with glCheckFramebufferStatus().
This reverts commit 65c41d55a0.
There really are quite a few differences in the set of internal
formats allowed by glTexImage and glRenderbufferStorage.
Per the spec, all OpenVG handles are 32-bit. We can't just cast them
to/from integers on 64-bit systems.
Start fixing that mess by introducing a set of handle/pointer conversion
functions in handle.h. The next step is to implement a handle/pointer
hash table...
We were sending too long requests for GLXChangeDrawableAttributes,
GLXGetDrawableAttributes, GLXDestroyPixmap and GLXDestroyWindow.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
FLT_TO_INT is a vector instruction, despite what the (current) documentation
says. FLT_TO_INT_FLOOR and FLT_TO_INT_RPI aren't explicitly mentioned in the
documentation, but those are vector instructions too.
largely a merge of the previously discussed origin/gallium-resource-sampling
but updated.
the idea is to allow arbitrary binding of resources, the way opencl, new gl
versions and dx10+ require, i.e.
DCL RES[0], 2D, FLOAT
LOAD DST[0], SRC[0], RES[0]
SAMPLE DST[0], SRC[0], RES[0], SAMP[0]
Contrary what the name may suggest, LLVM's opaque types are used for
recursive types -- types whose definition refers itself -- so opaque
types correspond to pre-declaring a structure in C. E.g.:
struct node;
struct link {
....
struct node *next;
};
struct node {
struct link link;
}
Void pointers are also disallowed by LLVM. So the suggested way of creating
what's commonly referred as "opaque pointers" is using byte pointer (i.e.,
uint8_t * ).
Fixes the build when selecting driver=osmesa and building static libraries.
Otherwise, mklib tries to add the ‘-ltalloc’ object to the archive, which
obviously fails.
Clients which statically link to osmesa will need to link to libtalloc also,
as specified in the Libs.private of osmesa.pc.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=33360
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
I couldn't make it work.
GB_TILE_CONFIG.Z_EXTENDED, which enables per-pixel Z clamping, and
VAP_CLIP_CNTL.CLIP_DISABLE, which disables clipping, do help, but they
also add regressions like random graphics corruptions in some games.
But we have to cheat and peek at the GENERIC semantic indices the
state tracker uses for TEXn.
Only outputs from 0x300 to 0x37c can be replaced, and so we have to
know on shader compilation which ones to put there in order to keep
doing separate shader objects properly.
At some point I'll probably create a patch that makes gallium not
force us to discard the information about what is a TexCoord.
The _NEW_ACCUM flag was only set when changing the accumulation
buffer clear color and never used anywhere. Reclaim that dirty bit.
Clean up the definitions of the other dirty bit flags.
Only a few texture object parameters can effect texture completeness:
min level, max level, minification filter. Don't mark the texture
incomplete for other texture object state changes.
We are not required to do the linear->sRGB conversion if ARB_framebuffer_sRGB
is unsupported. However I think the conversion should work in hw except
for blending, which matches the D3D9 behavior.
The rvalue of the returned value can be NULL if the shader says
'return foo();' and foo() is a function that returns void.
Existing GLSL specs do *NOT* say that this is an error. The type of
the return value is void. If the return type of the function is also
void, then this should compile without error. I expect that future
versions of the GLSL spec will fix this (wink, wink, nudge, nudge).
Fixes piglit test glsl-1.10/compiler/expressions/return-01.vert and
bugzilla #33308.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Before, we were converting between linear/sRGB in glReadPixels,
glDrawPixels, glAccum, etc if the renderbuffer was an sRGB texture.
Those all need to operate on pixel values as-is without conversion.
Also, when setting up render-to-texture, if the texture is sRGB the
pipe_surface view must be linear RGB. This will change when we
support GL_ARB_framebuffer_sRGB.
This fixes http://bugs.freedesktop.org/show_bug.cgi?id=33353
When we read/write image tiles we need to use the format specified
in the pipe_surface, not the pipe_transfer format (which comes from
the underlying texture/resource format).
This comes up when rendering to sRGB surfaces (via OpenGL render to
texture). Ignoring the new GL_ARB/EXT_framebuffer_sRGB extension
for now, when we render to a sRGB surface we need to treat it like
a regular, linear colorspace RGB surface. Before, when we read/wrote
tiles to sRGB surfaces we were inadvertantly doing the color space
conversion.
The new function, pipe_get_tile_rgba_format(), no longer takes a
swizzle (we weren't actually using it anywhere). Rename it to
indicate that the format is passed explicitly.
GLES can be enabled by running scons with
$ scons gles=yes
When gles=yes is given, the build is changed in three ways. First,
libmesa.a will be built with FEATURE_ES1 and FEATURE_ES2. This makes
DRI drivers and libEGL support and advertise GLES support. Second, GLES
libraries will be created. They are libGLESv1_CM, libGLESv2, and
libglapi. Last, libGL or opengl32 will link to libglapi. This change
is required as _glapi_* will be declared as __declspec(dllimport) in
libmesa.a on windows. libmesa.a expects those symbols to be defined in
another DLL. Due to this change to GL, GLES support is marked
experimental.
Note that GLES requires libxml2-python to generate some of its sources.
The original allocations use regs->regs as the context, so talloc will
happily ignore the context given here. Change it to match to clarify
that it isn't changing.
Improves the cases when:
* an explicit assignment references the read-only variable
* an 'out' or 'inout' function parameter references the read-only variable
This adds the get/enable enums and internal gl_config storage
for this extension.
In theory this is all that is needed to enable this extension
from what I can see, since its not mandatory to implement the
features if you don't advertise the visuals or the fb configs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The loop to choose a pixel format for the window was incrementing
'i' after we succeeded in creating the window so if we chose format[0]
for graw_create_window_and_screen() we were putting format[1] in
the pipe_resource template for creating the render target.
This only worked because of the order of the elements in the formats[]
array.
The graw_xlib.c code now properly compares the requested gallium pixel
format against the visual's color layout.
Update all the graw demos to fix the off-by-one-i error.
ideally this should be set once in the beginning of CS but there's
no way to change values there while in the middle of rendering.
For now reemitting SQ setup seems to work probably due to
r700WaitForIdleClean after each render
currently does not to try to decrease values once increased
fixes hangs in glsl-vs-vec4-indexing-temp-src-in-nested-loop-combined
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined for my rv740
maybe more for other chips
When --enable-shared-glapi is specified, libGL will share libglapi with
OpenGL ES instead of defining its own copy of glapi. This makes sure an
app will get only one copy of glapi in its address space.
The new option is disabled by default. When enabled, libGL and libglapi
must be built from the same source tree and distributed together. This
requirement comes from the fact that the dispatch offsets used by these
libraries are re-assigned whenever GLAPI XMLs are changed.
For GLX, indirect rendering for has_different_protocol() functions is
tricky. A has_different_protocol() function is assigned only one
dispatch offset, yet each entry point needs a different protocol opcode.
It cannot be supported by the shared glapi. The fix to this is to make
glXGetProcAddress handle such functions specially before calling
_glapi_get_proc_address.
Note that these files are automatically generated/re-generated
src/glx/indirect.c
src/glx/indirect.h
src/mapi/glapi/glapi_mapi_tmp.h
Move _glapi_* symbols from libGLESv1_CM.so and libGLESv2.so to
libglapi.so. This makes sure an app will get only one copy of glapi in
its address space.
Note that with this change, libGLES* and libglapi must be built from the
same source tree and distributed together. This requirement comes from
the fact that the dispatch offsets used by these libraries are
re-assigned whenever GLAPI XMLs are changed.
In bridge mode, mapi no longer implements glapi.h. It becomes a user of
glapi.h. Imagine an app that uses both libGL.so and libGLESv2.so.
There will be two copies of glapi in the app's memory. It is possible
that _glapi_get_dispatch does not return what _glapi_set_dispatch set,
if they access different copies of the global variables. The solution
to this situation to build either one of the libraries as a bridge to
the other. Or build both libraries as bridges to another shared
glapi library.
When MAPI_MODE_GLAPI is defined, u_current_table is renamed to
_glapi_Dispatch or _glapi_tls_Dispatch. The ASM dispatchers should not
use hardcoded name.
The new implementation is based on mapi. No new script is needed. As
noted in sources.mk, the way to use it is to compile MAPI_GLAPI_SOURCES
with MAPI_MODE_GLAPI defined.
Fix GLAPIPrinter, ES1APIPrinter, and ES2APIPrinter to output files that
are ready for compilation. Since gl_and_es_API.xml is based on
gl_API.xml, the hidden and handcode attributes of entries have to be
overridden for ES1APIPrinter and ES2APIPrinter.
gl_and_es_API.xml defines OpenGL ES 1.1 and 2.0 API as well as OpenGL
API. It consists of gl_API.xml and the newly added es_EXT.xml,
ARB_get_program_binary.xml, OES_single_precision.xml, and
OES_fixed_point.xml.
This fixes a bunch of unnecessary barriers due to the scheduler not
knowing what that arbitrary register description refers to when trying
to reason about its dependencies.
The result is rescheduling in the convolution kernel shader in
Lightsmark, which results in avoiding register spilling and increasing
the performance of the first scene from 6-7 fps midway through the
panning to 11fps. The register spilling was a regression from Mesa
7.9 to Mesa 7.10.
Improves performance of my GLSL demo by 5.1% (+/- 1.4%, n=7). It also
reschedules the giant multiply tree at the end of
glsl-fs-convolution-1 so that we end up not spilling registers,
producing the expected level of performance.
Can happen during swrast fallbacks if a buffer is somehow bound as
a render target and a texture.
Fixes gnome-shell on nv20, and gets it mostly working on nv10.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
sw clears were being used and not getting the correct offsets in the span
code.
also not emitting correct offsets for CB draws to texture levels.
(I've no idea why I'm playing with r100).
This is a candidate for 7.9 and 7.10
According to R700 ISA we have only two channels for cfile constants.
This patch makes piglit tests "glsl1-constant array with constant
indexing" happy on RV710.
Fixes the following Piglit tests:
glslparsertest/shaders/array2.frag
glslparsertest/shaders/dataType6.frag
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This fixes a potential failure when a begin/end_query is the first
thing to happen after flushing the scene.
NOTE: This is a candidate for the 7.10 and 7.9 branches.
This revealed a bug in ra_get_spill_benefit where we only considered
the benefit of the first adjacency we were to remove, explaining some
of the ugly spilling I've seen in shaders. Because of the reduced
spilling, it reduces the runtime of glsl-fs-convolution-1 36.9% +/-
0.9% (n=5).
This was recommended in the original paper, but I figued "make it run"
before "make it fast". Now we make it fast. Reduces the runtime of
glsl-fs-convolution-1 by 12.7% +/- 0.6% (n=5).
Our use of the register allocator in i965 is somewhat unusual.
Whereas most architectures would have a smaller set of registers with
fewer register classes and reuse that across compilation, we have 1,
2, and 4-register classes (usually) and a variable number up to 128
registers per compile depending on how many setup parameters and push
constants are present. As a result, when compiling large numbers of
programs (as with glean texCombine going through ff_fragment_shader),
we spent much of our CPU time in computing the q[] array. By keeping
a separate list of what the conflicts are for a particular reg, we
reduce glean texCombine time 17.0% +/- 2.3% (n=5).
We don't expect this optimization to be useful for 915, which will
have a constant register set, but it would be useful if we were switch
to this register allocator for Mesa IR.
Fixes texrect-many regression with ff_fragment_shader -- as we added
refs to the subsequent texcoord scaling paramters, the array got
realloced to a new address while our params[] still pointed at the old
location.
The driver was saying that independend blend functions was not supported,
but it really was. The driver was using the per-target independend blend
factors but the state tracker was only setting the 0th one (per the
Gallium spec).
Fixes a piglit fbo-drawbuffers2-blend regression.
See https://bugs.freedesktop.org/show_bug.cgi?id=33215
The removed semantic check also exists in ast_type_specifier::hir(), which
is a more natural location for it.
The check verified that precision statements are applied only to types
float and int.
* Check that precision qualifiers only appear in language versions 1.00,
1.30, and later.
* Check that precision qualifiers do not apply to bools and structs.
Fixes the following Piglit tests:
* spec/glsl-1.30/precision-qualifiers/precision-bool-01.frag
* spec/glsl-1.30/precision-qualifiers/precision-struct-01.frag
* spec/glsl-1.30/precision-qualifiers/precision-struct-02.frag
Change default value to ast_precision_none, which denotes the absence of
a precision of a qualifier.
Previously, the default value was ast_precision_high. This made it
impossible to detect if a precision qualifier was present or not.
The check is performed only in GLSL versions >= 1.30.
From section 4.3.4 of the GLSL 1.30 spec:
"It is an error to use centroid in in a vertex shader."
Fixes Piglit test
spec/glsl-1.30/compiler/storage-qualifiers/vs-centroid-in-01.vert
The check is performed only in GLSL versions >= 1.30.
Fixes the following Piglit tests:
* spec/glsl-1.30/compiler/interpolation-qualifiers/fs-smooth-02.frag
* spec/glsl-1.30/compiler/interpolation-qualifiers/vs-smooth-01.vert
... and 'centroid varying'. The check is performed only in GLSL
versions >= 1.30.
From page 29 (page 35 of the PDF) of the GLSL 1.30 spec:
"interpolation qualifiers may only precede the qualifiers in, centroid
in, out, or centroid out in a declaration. They do not apply to the
deprecated storage qualifiers varying or centroid varying."
Fixes Piglit test
spec/glsl-1.30/compiler/interpolation-qualifiers/smooth-varying-01.frag.
If an interpolation qualifier is present, then the method returns that
qualifier's string representation. For example, if the noperspective bit
is set, then it returns "noperspective".
This implements the extension by choosing a different set of texture
fetch functions when the texture parameter changes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a literal slot isn't used it should be set
to 0 instead of an uninitialized value. Also the
channels for pre R700 trig functions were incorrect.
And most important literals were not counted against ndw,
resulting in an invalid force_add_cf detection.
We don't have all of the features of this extension hooked up yet, but
the consensus yesterday was that since those features are things that
we should also be supporting in our ES2 implementation, claiming ES2
here too doesn't make anything worse and will make incremental
improvement through piglit easier.
This code should never have been triggered, but I often did anyway
when I disabled optimization passes during debugging, then spent my
time debugging that this code doesn't work.
The comment of "this is just like teximages except for..." is a pretty
good clue that we're handling this wrong. By just using the teximage
code, we catch a bunch of cases we'd missed, like GL_RED and GL_RG.
This catches more opportunities than the prog_optimize.c code on
openarena's fixed function shaders turned to GLSL, mostly due to
looking at multiple source instructions for copy propagation
opportunities. It should also be much more CPU efficient than
prog_optimize.c's code.
The usage of macro V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_FLT_TO_INT_FLOOR was
introduced by commit 323ef3a1f0 but the
macro is undefined. Disable this case to fix the build for now.
Add a bit in struct gl_extensions for OES_standard_derivatives, and enable
the bit by default. Advertise the extension only if the bit is enabled.
Previously, OES_standard_derivatives was advertised in GLES2 contexts
if ARB_framebuffer_object was enabled.
The specs that add 'layout' require the use of 'in' or 'out'.
However, a number of implementations, including Mesa, shipped several
of these extensions allowing the use of 'varying' and 'attribute'.
For these extensions only a warning is emitted.
This differs from the behavior of Mesa 7.10. Mesa 7.10 would only
accept 'attribute' with 'layout(location)'. This behavior was clearly
wrong. Rather than carrying the broken behavior forward, we're just
doing the correct thing.
This is related to (piglit) bugzilla #31804.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
All of the extensions that add the 'layout' keyword also enable (and
required) the use of 'in' and 'out' with shader globals.
This is related to (piglit) bugzilla #31804.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
When use_spoken is true, istart (the first vertex of this segment) is
replaced by i0 (the spoken vertex of the fan). There are still icount
vertices.
Thanks to Brian Paul for spotting this.
Without this, X doesn't start with UMS on r300g.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Paulo Zanoni <pzanoni@mandriva.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The idea is to be able to match a driver using the following order
try egl_gallium with hw renderer
try egl_dri2
try egl_gallium with sw renderer
try egl_glx
given the module list
egl_gallium
egl_dri2
egl_glx
For that, UseFallback initialization option is added. The module list
is matched twice: with the option unset and with the option set. In the
first pass, egl_gallium skips its sw renderer and egl_glx rejects to
initialize since UseFallback is not set. In the second pass,
egl_gallium skips its hw renderer and egl_dri2 rejects to initialize
since UseFallback is set. The process stops at the first driver that
initializes the display.
Reorder/rename and document the fields that should be set by the driver during
initialization. Drop the major/minor arguments from drv->API.Initialize.
This makes it unnecessary to pass _mesa_glsl_parse_state around
everywhere, making at least the prototypes a lot easier to read.
It's also more C++-ish than a pile of static C functions.
All of these functions used to take s_list pointers so they wouldn't all
need SX_AS_LIST conversions and error checking. However, the new
pattern matcher conveniently does this for us in one centralized place.
So there's no need to insist on s_list. Switching to s_expression saves
a bit of code and is somewhat cleaner.
Previously, the IR reader was riddled with code that:
1. Checked for the right number of list elements (via a linked list walk)
2. Retrieved references to each component (via ->next->next pointers)
3. Downcasted as necessary to make sure that each sub-component was the
right type (i.e. symbol, int, list).
4. Checking that the tag (i.e. "declare") was correct.
This was all very ad-hoc and a bit ugly. Error checking had to be done
at both steps 1, 3, and 4. Most code didn't even check the tag, relying
on the caller to do so. Not all callers did.
The new pattern matching module performs the whole process in a single
straightforward function call, resulting in shorter, more readable code.
Unfortunately, MSVC does not support C99-style anonymous arrays, so the
pattern must be declared outside of the match call.
stObj->pt is null when a TFP texture is passed to st_finalize_texture,
and with the changes introduced in the above commit this resulted in a
new texture being created and the existing image being copied into it.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Fixes a PT_NOT_PRESENT error cause by:
- allocating in VRAM
- emitting GART relocs to 0x17bc/0x17c0, moving the buffer
- telling the bufmgr that the buffer should be in VRAM when we use it,
but not correcting the value sent to 0x17bc/0x17c0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes a failed assertion when a renderbuffer ID that was gen'd but not
previously bound was passed to glFramebufferRenderbuffer(). Generate
the same error that NVIDIA does.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This fixes a problem when glDrawBuffers(GL_NONE). The fragment program
was writing to color output[0] but OutputsWritten was 0. That led to a
failed assertion in the Mesa->TGSI translation code.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The extension string in GLES1 contexts always advertised
GL_OES_point_sprite. Now advertisement depends on ARB_point_sprite being
enabled.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Change all OES extension strings that depend on ARB_framebuffer_object to
instead depend on EXT_framebuffer_object.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Add GL_OES_stencil8 to ES2.
Remove the following:
GL_OES_compressed_paletted_texture : ES1
GL_OES_depth32 : ES1, ES2
GL_OES_stencil1 : ES1, ES2
GL_OES_stencil4 : ES1, ES2
Mesa advertised these extensions, but did not actually support them.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Place GL, GLES1, and GLES2 extensions in a unified extension table. This
allows one to enable, disable, and query the status of GLES1 and GLES2
extensions by name.
When tested on Intel Ironlake, this patch did not alter the extension
string [as given by glGetString(GL_EXTENSIONS)] for any API.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In particular, variables cannot be redeclared invariant after being
used.
Fixes piglit test invariant-05.vert and bugzilla #29164.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
With the change to not reset baselevel, this GL_LINEAR filtering was
resulting in generating mipmaps off of the base level instead of the
next higher detail level. Fixes fbo-generatemipmap-filtering.
Reported by: Neil Roberts <neil@linux.intel.com>
The ad-hoc placement of recalculation somewhere between when they got
invalidated and when they were next needed was confusing. This should
clarify what's going on here.
Update SConscripts to re-enable or add support for EGL on windows and
x11 platforms respectively. targets/egl-gdi is replaced by
targets/egl-static, where "-static" means pipe drivers and state
trackers are linked to statically by egl_gallium, and egl_gallium is a
built-in driver of libEGL. There is no more egl_gallium.dll on Windows.
This greatly improves codegen for programs with flow control by
allowing coalescing for all instructions at the top level, not just
ones that follow the last flow control in the program.
tgsi_helper_copy is used on several occasions to copy a temporary result
into the real destination register to emulate writemasks for OP3 and
reduction operations. According to R600 ISA that's unnecessary.
This patch fixes this use for MAD, CMP and DP4.
Fixes piglit tests glsl-1.20/compiler/qualifiers/in-01.vert and
glsl-1.20/compiler/qualifiers/out-01.vert and bugzilla #32910.
NOTE: This is a candidate for the 7.9 and 7.10 branches. This patch
also depends on the previous two commits.
When GCC encounters a division by zero in a preprocessor directive, it
generates an error. Since the GLSL spec says that the GLSL
preprocessor behaves like the C preprocessor, we should generate that
same error.
It's worth noting that I cannot find any text in the C99 spec that
says this should be an error. The only text that I can find is line 5
on page 82 (section 6.5.5 Multiplicative Opertors), which says,
"The result of the / operator is the quotient from the division of
the first operand by the second; the result of the % operator is
the remainder. In both operations, if the value of the second
operand is zero, the behavior is undefined."
Fixes 093-divide-by-zero.c test and bugzilla #32831.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
In _token_list_equal_ignoring_space(token_list_t*, token_list_t*), add
a guard that prevents dereferncing a null token list.
This fixes test src/glsl/glcpp/tests/092-redefine-macro-error-2.c and
Bugzilla #32695.
When attaching a small mipmap level to an FBO, the original gen4
didn't have the bits to support rendering to it. Instead of falling
back, just blit it to a new little miptree just for it, and let it get
revalidated into the stack later just like any other new teximage.
Bug #30365.
If we hit this path, we're level 1+ and the base level got allocated
as a single level instead of a full tree (so we don't match
intelObj->mt). This tries to recover from that so that we end up with
2 allocations and 1 validation blit (old -> new) instead of
allocations equal to number of levels and levels - 1 blits.
We don't need to worry about levels other than MaxLevel because we're
minifying -- the lower levels (higher detail) won't contribute to the
result. By changing BaseLevel, we forced hardware that doesn't
support BaseLevel != 0 to relayout the texture object.
This reverts commit 7ce6517f3a.
This reverts commit d60145d06d.
I was wrong about which generations supported baselevel adjustment --
it's just gen4, nothing earlier. This meant that i915 would have
never used the mag filter when baselevel != 0. Not a severe bug, but
not an intentional regression. I think we can fix the performance
issue another way.
Most _3DSTATE defines contain the command type, sub-type, opcode, and
sub-opcode (i.e. 0x7905). These, however, contain only the sub-opcode
(i.e. 0x05). Since they are inconsistent with the rest of the code and
nothing uses them, simply delete them.
The _3DOP and _3DCONTROL defines seemed similar, and were also unused.
From reading EXT_texture_sRGB and EXT_framebuffer_sRGB and interactions
with FBO I've found that swrast is converting the sRGB values to linear for
blending when an sRGB texture is bound as an FBO. According to the spec
and further explained in the framebuffer_sRGB spec this behaviour is not
required unless the GL_FRAMEBUFFER_SRGB is enabled and the Visual/config
exposes GL_FRAMEBUFFER_SRGB_CAPABLE_EXT.
This patch fixes swrast to use a separate Fetch call for FBOs bound to
SRGB and avoid the conversions.
v2: export _mesa_get_texture_dimensions as per Brian's comments.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With core mesa doing runtime API checks, GLES overlay is no longer
needed. Make --enable-gles-overlay equivalent to --enable-gles[12].
There may still be places where compile-time checks are done. They
could be fixed case by case.
This is a step forward for compatibility with really old GLX. But the
real reason for making this change now is so that we can make egl_glx a
built-in driver without having to link to libGL.
In preparation for making egl_dri2 built-in. It also handles
symbol lookup error: /usr/local/lib/egl/egl_dri2.so: undefined symbol:
_glapi_get_proc_address
more gracefully.
If you want to enable noop set GALLIUM_NOOP=1 as an env variable.
You need first to enable noop wrapping for your driver see change
to src/gallium/targets/dri-r600/ in this commit as an example.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Add release function for texture_from_pixmap extension.
Some platform need to release texture image for texture_from_pixmap
extension, add this interface for those platforms.
https://bugs.freedesktop.org/show_bug.cgi?id=32912
The fix is to call update_derived_state before user buffer uploads.
I've also moved some code around.
Unfortunately, there are still some ZMASK-related bugs which cause
misrendering, i.e. flushing doesn't always work and glean/fbo fails.
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implemention.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but it
accounts any newly-referenced domains.
cs_validate is then called, which just checks:
used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes CS and tries do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
There really shouldn't be any difference between the two for us.
Fixes a bug where Z16 renderbuffers would be untiled on gen6, likely
leading to hangs.
By relying on just intel_span_supports_format, some formats that
aren't supported pre-gen4 were not reporting FBO incomplete. And we
also complained in stderr when it happened on i915 because draw_region
gets called before framebuffer completeness validation.
ARB_fbo no longer disallows mismatched width/height on attachments
(shouldn't be any problem), mixed format color attachments (we only
support 1), and L/A/LA/I color attachments (we already reject them on
965 too). It requires Gen'ed names (driver doesn't care), and adds
FramebufferTextureLayer (we don't do texture arrays). So it looks
like we're already in the position we need to be for this extension.
Bug #27468, #32381.
In general, we have to negate in immediate values we pass in because
the src1 negate field in the register description is in the bits3 slot
that the 32-bit value is loaded into, so it's ignored by the hardware.
However, the src0 negate field is in bits1, so after we'd negated the
immediate value loaded in, it would also get negated through the
register description. This broke this VP instruction in the position
calculation in civ4:
MAD TEMP[1], TEMP[1], CONST[256].zzzz, CONST[256].-y-y-y-y;
Bug #30156
This only uploads the [min_index, max_index] range instead of [0, userbuf size],
which greatly speeds up user buffer uploads.
This is also a prerequisite for atomizing vertex arrays in st/mesa.
Fix the error in uniform row calculating, it may alloc one line
more which may cause out of range on memory usage, sometimes program
aborted when free the memory.
NOTE: This is a candidate for 7.9 and 7.10 branches.
Signed-off-by: Brian Paul <brianp@vmware.com>
This provides an upload facility for the constant buffers since Marek's
constants in user buffers changes.
gears at least work on my evergreen now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should make it easier to cross-reference the code and hardware
documentation, as well as clear up any confusion on whether constants
like CMD_3D_WM_STATE mean WM_STATE (pre-gen6) or 3DSTATE_WM (gen6+).
This does not rename any pre-gen6 defines.
The draw module set new state that didn't require swtnl which caused need_swtnl to
be unset. This caused the call from to svga_update_state(svga, SVGA_STATE_SWTNL_DRAW)
from the vbuf backend to overwrite the vdecls we setup there to be overwritten with
the real buffers vdecls.
Previously the 'STDGL invariant(all)' pragma added in GLSL 1.20 was
simply ignored by the compiler. This adds support for setting all
variable invariant.
In GLSL 1.10 and GLSL ES 1.00 the pragma is ignored, per the specs,
but a warning is generated.
Fixes piglit test glsl-invariant-pragma and bugzilla #31925.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
GLSL 1.10 and 1.20 allow any sort of sampler array indexing.
Restrictions were added in GLSL 1.30. Commit f0f2ec4d added support
for the 1.30 restrictions, but it broke some valid 1.10/1.20 shaders.
This changes the error to a warning in GLSL 1.10, GLSL 1.20, and GLSL
ES 1.00.
There are some spurious whitespace changes in this commit. I changed
the layout (and wording) of the error message so that all three cases
would be similar. The 1.10/1.20 and 1.30 text is the same. The only
difference is that one is an error, and the other is a warning. The
GLSL ES 1.00 wording is similar but not quite the same.
Fixes piglit test
spec/glsl-1.10/compiler/constant-expressions/sampler-array-index-02.frag
and bugzilla #32374.
It's no longer needed because the upload buffer remains mapped while the CS
is being filled (openarena, ut2004 and others that this code was for do not
use VBOs by default).
The overhead of resource_create, transfer_inline_write, and resource_destroy
to upload constant data is very visible with some apps in sysprof, and
as such should be eliminated.
My approach uses a user buffer to pass a pointer to a driver. This gives
the driver the freedom it needs to take the fast path, which may differ
for each driver.
This commit addresses the same issue as Jakob's one that suballocates out
of a big constant buffer, but it also eliminates the copy to the buffer.
- Added a parameter to specify a minimum offset that should be returned.
r300g needs this to better implement user buffer uploads. This weird
requirement comes from the fact that the Radeon DRM doesn't support negative
offsets.
- Added a parameter to notify a driver that the upload flush occured.
A driver may skip buffer validation if there was no flush, resulting
in a better performance.
- Added a new upload function that returns a pointer to the upload buffer
directly, so that the buffer can be filled e.g. by the translate module.
The map/unmap overhead can be significant even though there is no waiting on busy
buffers. There is simply a huge number of uploads.
This is a performance optimization for Torcs, a car racing game.
Directly include mtypes.h if a file uses a gl_context struct. This
allows future removal of headers that are not strictly necessary but
indirectly include mtypes.h for a file.
BaseLevel/MaxLevel are mostly used for two things: clamping texture
access for FBO rendering, and limiting the used mipmap levels when
incrementally loading textures. By restricting our mipmap trees to
just the current BaseLevel/MaxLevel, we caused reallocation thrashing
in the common case, for a theoretical win if someone really did want
just levels 2..4 or whatever of their texture object.
Bug #30366
This has always been ugly about our texture code -- object base/max
level vs intel object first/last level vs image level vs miptree
first/last level. We now get rid of intelObj->first_level which is
just tObj->BaseLevel, and make intelObj->_MaxLevel clearly based off
of tObj->_MaxLevel instead of duplicating its code (incorrectly, as
image->MaxLog2 only considers width/height and not depth!)
It was quite a mess by trying to do NULL renderbuffers and real
renderbuffers in the same function. This clarifies the common case of
real renderbuffers.
This makes
fbo-generatemipmap-formats GL_EXT_texture_sRGB-s3tc
match
fbo-generatemipmap-formats GL_EXT_texture_compression_s3tc
and swrast in bad DXT1_RGBA alpha=0 handling, but it means we won't
unpack and repack someone's textures into uncompressed SARGB8 format.
It's just LUMINANCE, not LUMINANCE_ALPHA. Fixes
fbo-generatemipmap-formats GL_EXT_texture_sRGB-s3tc assertion failure
when it tries to pack the L8 channels into LUMINANCE_ALPHA and wonders
why it's trying to do that.
From the r600 ISA:
Each ALU clause can lock up to four sets of constants
into the constant cache. Each set (one cache line) is
16 128-bit constants. These are split into two groups.
Each group can be from a different constant buffer
(out of 16 buffers). Each group of two constants consists
of either [Line] and [Line+1] or [line + loop_ctr]
and [line + loop_ctr +1].
For supporting more than 64 constants, we need to
break the code into multiple ALU clauses based
on what sets of constants are needed in that clause.
Note: This is a candidate for the 7.10 branch.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
rc_inst_can_use_presub() wasn't checking for too many RGB sources in
Alpha instructions or too many Alpha sources in RGB instructions.
Note: This is a candidate for the 7.10 branch.
Perform this check in ast_declarator_list::hir().
From section 4.3.6 of the GLSL 1.30 spec:
"If a vertex output is a signed or unsigned integer or integer
vector, then it must be qualified with the interpolation
qualifier
flat."
Allow redeclaration of the following built-in variables with an
interpolation qualifier in language versions >= 1.30:
* gl_FrontColor
* gl_BackColor
* gl_FrontSecondaryColor
* gl_BackSecondaryColor
* gl_Color
* gl_SecondaryColor
See section 4.3.7 of the GLSL 1.30 spec.
We were looking at the current draw buffer instead to see whether the
depth/stencil combination matched. So you'd get told your framebuffer
was complete, until you bound it and went to draw and we decided that
it was incomplete.
The _ColorDrawBuffers is a piece of computed state that gets for the
current draw/read buffers at _mesa_update_state time. However, this
function actually gets used for non-current draw/read buffers when
checking if an FBO is complete from the driver's perspective. So,
instead of trying to just look at the attachment points that are
currently referenced by glDrawBuffers, look at all attachment points
to see if they're driver-supported formats. This appears to actually
be more in line with the intent of the spec, too.
Fixes a segfault in my upcoming fbo-clear-formats piglit test, and
hopefully bug #30278
Until we know how hw converts quads to polygon in beginning of
3D pipeline, for now unconditionally use last vertex convention.
Fix glean/clipFlat case.
When figuring out whether a renderbuffer should be used to set the
visual bits of an FBO, we were missing important baseformats like
GL_RED, GL_RG, and GL_LUMINANCE.
st/egl should be enabled with --enable-openvg even the driver is xlib or
osmesa. Also, GLX_DIRECT_RENDERING should not be defined because libdrm
is not checked.
I think was used long ago, when we actually read the builtins into the
shader's instruction stream directly, rather than creating a separate
shader and linking the two. It doesn't seem to serve any purpose now.
The initial values in the grctx are 0x0000 anyway, and re-binding them
all to 0x0000 destroys some init done by the nouveau drm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This reverts commit de6fd527a5.
Revert this workaround as it seems the real trouble is caused by
lineloop, which doesn't require GS convert on sandybridge actually.
Gen4 and Gen5 hardware can have a maximum supported nesting depth of 16.
Previously, shaders with control flow nested 17 levels deep would
cause a driver assertion or segmentation fault.
Gen6 (Sandybridge) hardware no longer has this restriction.
Fixes fd.o bug #31967.
This adds a new optional max_depth parameter (defaulting to 0) to
lower_if_to_cond_assign, and makes the pass only flatten if-statements
nested deeper than that.
By default, all if-statements will be flattened, just like before.
This patch also renames do_if_to_cond_assign to lower_if_to_cond_assign,
to match the new naming conventions.
This gets my vote for most pointless extension of all time, I'm guessing
some driver could possibly optimise for this instead of counting it might
just get a true/false, but I'm not really sure.
need this to eventually advertise 3.3 despite its total uselessness.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rename MAPI_GLAPI_SOURCES to MAPI_UTIL_SOURCES. Rename macro
MAPI_GLAPI_CURRENT to MAPI_MODE_UTIL. Update the comments to make it
clear that mapi may be used in two ways and how.
These mistakenly computed 't' instead of t * t * (3.0 - 2.0 * t).
Also, properly vectorize the smoothstep(float, float, vec) variants.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
No need to register function prototypes in the module now that we call
the C function pointer directly -- less LLVM objects lying around.
Limited testing with lp_test_format.
Even though a bound texture stays bound when calling set_fragment_sampler_views,
it must be assigned a new cache region depending on the occupancy of other
texture units.
This fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=28800
Thanks to Álmos <aaalmosss@gmail.com> for finding the bug in the code.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
We need to keep using the pipe_get_tile_swizzle() even though there's
no swizzling because we need to explicitly pass in the surface format.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=32459
The only mismatch between the two is that we have to clear the
destination's alpha to 1.0. Fixes WOW performance on my Ironlake,
from a few frames a second to almost playable.
Before, we were going off of a couple of known (hopeful) matches
between internalFormats and the cpp of the read buffer. Instead, we
can now just look at the gl_format of the two to see if they match.
We should avoid bad blits that might have been possible before, but
also allow different internalFormats to work without having to
enumerate each one.
The blit that follows appears in the command stream so it's serialized
with previous rendering. Any queued vertices in the tnl layer were
already flushed up in mesa/main/.
Create a constant int pointer to the C function, then cast it to the
function's type. This avoids using trampoline code which seem to be
inadvertantly freed by LLVM in some situations (which leads to segfaults).
The root issue and work-around were found by José.
NOTE: This is a candidate for the 7.10 branch
Michel Hermier reported libdrm segfault (and kernel crash) on nv40 using
gallium :
http://www.mail-archive.com/nouveau@lists.freedesktop.org/msg06563.html
It turns out these were caused by some missing WAIT_RING (or wrong
computation of the WAIT_RING sizes). Unlike all other libdrm_nouveau users,
nvfx gallium tried to use a mininum calls of WAIT_RING, one WAIT_RING could
apply to many methods for different code paths and spread across several
functions. This made it too tricky to find out what the missing or wrong
WAIT_RING was.
By restoring BEGIN_RING, we force one WAIT_RING per method, and it's much
easier to check if the free size required in the pushbuffer is correct. As
curro said, "let's keep it simple for the maintainers until the big
bottlenecks are gone"
Benchmarked on nv35 with openarena, nexuiz and ut2004 and no performance
regression.
The core of this patch was made with Coccinelle, with minor manual fixes
made on top.
Tested-by: Michel Hermier <hermier@frugalware.org>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
This is the hack for input interactivity of frontbuffer rendering
(like we do for backbuffer at intelDRI2Flush()) by waiting for the n-2
frame to complete before starting a new one. However, for an
application doing multiple contexts or regular rebinding of a single
context, this would end up lockstepping the CPU to the GPU because
every unbind was considered the end of a frame.
Improves WOW performance on my Ironlake by 48.8% (+/- 2.3%, n=5)
Given a dispatch slot, entry_get_public returns the address of the
corresponding public entry point. There may be more than one of them.
But since they are all equivalent, it is fine to return any one of them.
With entry_get_public, the address of any public entry point can be
calculated at runtime when an assembly dispatcher is used. There is no
need to have a mapping table in such case. This omits the unnecessary
relocations from the binary.
Hidden entries are just like normal entries except that they are not
exported. Since it is not always possible to hide them, and two hidden
aliases can share the same entry, the name of hidden aliases are mangled
to '_dispatch_stub_<slot>'.
Split out function name generation from _c_decl to _c_function, and use
it everywhere. Add an optional 'export' argument to _cdecl. It is
prepended to the returned string.
Really no idea why I didn't see this before, but these values were opposite
the register spec.
this seems to fix rv530 HiZ on my laptop, will reenable in next commit.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For GL fragColor semantics we need to tell the pipe drivers that the fragment
shader color result is to be replicated to all bound color buffers, this
adds the basic TGSI + documentation.
v2: fix missing comma pointed out by Tilman on mesa-dev.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If 'start' is odd, render the first triangle with indices embedded
in the command stream, which adds 3 to 'start' and makes it even.
Then continue with the fast path.
This is a bandaid on the problem that if some formats were not renderable
(like luminance_alpha), st/mesa fell back to some RGBA format, so basically
some non-renderable formats were actually not used at all. This is only
a problem with hardware drivers, softpipe can render to anything.
Instead, require only RGB8/RGBA8 to be renderable.
Radeon GPUs can do this. R600 can even do render-to-texture.
Packing and extracting aren't implemented, but we shouldn't hit them (I think).
Tested with swrast, softpipe, and r300g.
Just like everywhere else, I never trust my constant uploads to
correctly put constants in the right places, even though that's so
rarely where the issue is.
It's mostly like gen4 message descriptor setup, except that the sizes
of type/control changed to be like gen5. Fixes 21 piglit cases on
gm45, including the regressions in bug #32311 from increased VS
constant buffer usage.
Fixes this GCC warning.
lp_bld_const.h: In function 'lp_build_const_int_pointer':
lp_bld_const.h:137: warning: cast from pointer to integer of different size
Determine header present for fb write by msg length is not right
for SIMD16 dispatch, and if there're more output attributes, header
present is not easy to tell from msg length. This explicitly adds
new param for fb write to say header present or not.
Fixes many cases' hang and failure in GL conformance test.
Set window_bit only when the visual id is greater than zero. Correct
visual types. Skip slow configs as they are not relevant. Finally, do
not return duplicated configs.
All single-buffered configs were ignored before to make sure
EGL_RENDER_BUFFER is settable for window surfaces. It is better to
allow single-buffered configs and set EGL_WINDOW_BIT only for
double-buffered ones. This way there can be single-buffered pixmaps.
The SNB alt-mode math does the denorm and inf reduction even for a
"raw MOV" like we do for g0 message header setup, where we are moving
values that aren't actually floats. Just use UD type, where raw MOVs
really are raw MOVs.
Fixes glxgears since c52adfc2e1, but no
piglit tests had regressed(!)
Can't get away from referencing upload buffer as after flush a vertex buffer
using the upload buffer might still be active. Likely need to simplify the
pipe_refence a bit so we don't waste so much cpu time in it.
candidates for 7.10 branch
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Instead of when we read texture tiles. Now swizzling happens after
the shadow depth compare step. This fixes the piglit glsl-fs-shadow2d*
tests (except for proj+bias because of a GLSL bug).
We need to swizzle after the shadow comparison so that the GL_DEPTH_MODE
functionality is handled properly.
This fixes all the piglit glsl-fs-shadow2d*.shader_test cases, except
for glsl-fs-shadow2dproj-bias.shader_test which fails because of a
bug in the GLSL compiler (fd.o 32395).
Note the support for non float vertex draw likely regressed need to
find what we want to do there.
candidates for 7.10 branches
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is still awful, but my ability to care about reworking the old
backend so we can just get a temporary value into a POW is awfully low
since the new backend does this all sensibly.
Fixes:
fp1-LIT test 1
fp1-LIT test 3 (case x < 0)
fp1-POW test (exponentiation)
fp-lit-mask
commit 4f106f44a32eaddb6cf3fea6ba5ee9787bff609a
Author: Brian Paul <brianp@vmware.com>
Date: Mon Dec 13 14:06:08 2010 -0700
st/mesa: reorganize vertex program translation code
Now it looks like the fragment and geometry program code.
Also remove the serial number fields from programs. It was used to
determine when new translations were needed. Now the variant key is
used for that. And the st_program_string_notify() callback removes all
variants when the program's code is changed.
commit e12d6791c5e4bff60bb2e6c04414b1b4d1325f3e
Author: Brian Paul <brianp@vmware.com>
Date: Mon Dec 13 13:38:12 2010 -0700
st/mesa: implement geometry shader varients
Only needed in order to support per-context gallium shaders.
commit c5751c673644808ab069259a852f24c4c0e92b9d
Author: Brian Paul <brianp@vmware.com>
Date: Sun Dec 12 15:28:57 2010 -0700
st/mesa: restore glDraw/CopyPixels using new fragment program variants
Clean up the logic for fragment programs for glDraw/CopyPixels. We now
generate fragment program variants for glDraw/CopyPixels as needed which
do texture sampling, pixel scale/bias, pixelmap lookups, etc.
commit 7b0bb99bab6547f503a0176b5c0aef1482b02c97
Author: Brian Paul <brianp@vmware.com>
Date: Fri Dec 10 17:03:23 2010 -0700
st/mesa: checkpoint: implement fragment program variants
The fragment programs variants are per-context, as the vertex programs.
NOTE: glDrawPixels is totally broken at this point.
commit 2cc926183f957f8abac18d71276dd5bbd1f27be2
Author: Brian Paul <brianp@vmware.com>
Date: Fri Dec 10 14:59:32 2010 -0700
st/mesa: make vertex shader variants per-context
Gallium shaders are per-context but OpenGL shaders aren't. So we need
to make a different variant for each context.
During context tear-down we need to walk over all shaders/programs and
free all variants for the context being destroyed.
The new GLSL compiler doesn't support geom shaders yet so disable the
GL_ARB_geometry_shader4 extension. Undo this when geom shaders work again.
NOTE: This is a candidate for the 7.10 branch.
This is the same as what the array dereference handler does.
Fixes piglit test glsl-link-struct-array (bugzilla #31648).
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The hardware supports zero stride just fine. This is a port
of 2af8a19831 from r300g.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The RS690 memory controller prefers things to be on a different
boundary than the discrete GPUs, we had an attempt to fix this,
but it still failed, this consolidates the stride calculation
into one place and removes the really special case check.
This fixes gnome-shell and 16 piglit tests on my rs690 system.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When the driver is the last reference to libEGL.so, unloading it will
cause libEGL.so to be unmapped and give problems. Disable the unloading
for now. Still have to figure out the right timing to unload drivers.
The hardware apparently does support a zero stride, so let's use it.
This fixes missing objects in ETQW, but might also fix a ton of other
similar-looking bugs.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
If a source operand has a non-native swizzle (e.g. the KIL instruction
cannot have a swizzle other than .xyzw), the lowering pass uses one or more
MOV instructions to move the operand to an intermediate temporary with
native swizzles.
This commit fixes that the presubtract information was lost during
the lowering.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
This fixes broken rendering of trees in ETQW. The trees still disappear
for an unknown reason when they are close.
Broken since:
2ff9d4474b
r300/compiler: make lowering passes possibly use up to two less temps
NOTE: This is a candidate for the 7.10 branch.
do_assignment may apply implicit conversions to coerce the base type
of initializer to the base type of the variable being declared. Fixes
piglit test glsl-implicit-conversion-02 (bugzilla #32287). This
probably also fixes bugzilla #32273.
NOTE: This is a candidate for the 7.9 branch and the 7.10 branch.
There are exceptions to the table for depth texturing or rendering to
not-quite-supported formats thanks to the non-orthogonal component
selection for surface formats, but it's still a lot simpler.
Fixes piglit valgrind glsl-array-bounds-04 failure (FDO bug 29946).
NOTE:
This is a candidate for the 7.10 branch.
This is a candidate for the 7.9 branch.
The current state is allowed to be undefined past DrawElements et al.
Consequently omit that copying at least in the display list code.
This pays us some percents performance.
Signed-off-by: Brian Paul <brianp@vmware.com>
assert(current_save_state < MAX_META_OPS_DEPTH) did not compile.
Rename current_save_state to SaveStackDepth to be more consistent with
the style of the other fields.
VS places color attributes together so that SF unit can fetch the right
attribute according to object orientation. This fixes light issue in
mesa demo geartrain, projtex.
_mesa_meta_CopyPixels results in nested meta operations on Sandybridge.
Previoulsy the second meta operation overrides all states saved by the
first meta function.
When the application is not linked to any libGL*.so, loading st_GL.so
would give
/usr/local/lib/egl/st_GL.so: undefined symbol: _glapi_tls_Context
In that case, load libGL.so and try again. This works because
util_dl_open loads with RTLD_GLOBAL.
Fix "clear" OpenGL ES 1.1 demo.
Fixes piglit glx-shader-sharing crash.
When shaders are shared by multiple contexts, the shader's draw context
pointer may point to a previously destroyed context. Dereferencing the
context pointer will lead to a crash.
In this case, simply removing the flushing code avoids the crash (the
exec and sse shader paths don't flush here either).
There's a deeper issue here, however, that needs examination. Shaders
should not keep pointers to contexts since contexts might get destroyed
at any time.
NOTE: This is a candidate for the 7.10 branch (after this has been
tested for a while).
Currently we only unroll loops with conditional breaks at the end, which is
the form that lower_jumps generates.
However, if breaks are not lowered, they tend to appear at the beginning, so
add support for a conditional break anywhere.
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Found this bug by code inspection. Based off the comments just before
this code, the intent is to find whether the break exists in the "then"
branch or the "else" branch. However, the code actually looked at the
last instruction in the "then" branch twice.
If you used a constant array index to access the matrix, we'd flag a
bunch of wrong inputs/outputs as being used because the index was
multiplied by matrix columns and the actual used index was left out.
Fixes glsl-mat-attribute.
Fixes this GCC warning.
brw_fs.cpp: In function 'brw_reg brw_reg_from_fs_reg(fs_reg*)':
brw_fs.cpp:3255: warning: 'brw_reg' may be used uninitialized in this function
Allow important performance increase by doing hw specific implementation
of the upload manager helper. Drop the range flushing that is not hit with
this code (and wasn't with previous neither). Performance improvement are
mostly visible on slow CPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
r600g is up to a point where all small CPU cycle matter and pb* turn
high on profile. It's mostly because pb try to be generic and thus
trigger unecessary check for r600g driver. To avoid having too much
abstraction & too much depth in the call embedded everythings into
r600_bo. Make code simpler & faster. The performance win highly depend
on the CPU & application considered being more important on slower CPU
and marginal/unoticeable on faster one.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This eases the gen6 implementation, which can only handle up to 32
registers of constants, while likely not penalizing real apps using
reladdr since all of those I've seen also end up hitting the pull
constant buffer. On gen6, the constant map means that simple NV VPs
fit under the 32-reg limit and now succeed. Fixes around 10 testcases.
Raise error if a sampler array is indexed with a non-constant expression.
From section 4.1.7 of the GLSL 1.30 spec:
"Samplers aggregated into arrays within a shader (using square
brackets [ ]) can only be indexed with integral constant
expressions [...]."
System values are shader inputs which don't necessarily change from
vertex to vertex or fragment to fragment. gl_InstanceID and
gl_FrontFacing are examples.
Attribute 0 has no special meaning in GLES2. Add VertexAttrib4f_nopos
for that purpose and make _es_VertexAttrib* call the new function.
Rename _vbo_* to _es_* to avoid confusion. These functions are only
used by GLES, and now some of them (_es_VertexAttrib*) even behave
differently than vbo_VertexAttrib*.
CMP may now use two less temps, other non-native instructions may end up
using one less temp, except for SIN/COS/SCS, which I am leaving unchanged
for now.
This may reduce register pressure inside loops, because the register
allocator doesn't do a very good job there.
That's what I get for not running piglit before pushing.
Don't try to patch types of unsized arrays when linking fails.
Don't try to patch types of unsized arrays that are shared between
shader stages.
Fixes piglit test case glsl-vec-array (bugzilla #31908).
NOTE: This bug does not affect 7.9, but I think this patch is a
candiate for the 7.9 branch anyway.
Types of declared variables and their initializer must match excatly
except for unsized arrays. Previously the type inherritance for
unsized arrays happened implicitly in the emitted assignment.
However, this assignment is never emitted for uniforms. Now that type
is explicitly copied unconditionally.
Fixes piglit test array-compare-04.vert (bugzilla #32035) and
glsl-array-uniform-length (bugzilla #31985).
NOTE: This is a candidate for the 7.9 branch.
With the change of extended math from having the arguments moved into
mrfs and handed off through message passing to being directly hooked
up to the EU, it looks like the piece for doing source modifiers
(negate and abs) was left out.
Fixes:
fog-modes
glean/fp1-ARB_fog_exp test
glean/fp1-ARB_fog_exp2 test
glean/fp1-Computed fog exp test
glean/fp1-Computed fog exp2 test
ext_fog_coord-modes
Previously the code only handled scalars and vectors. This new code
is modeled somewhat after similar code in ir_to_mesa.
Reviewed-by: Eric Anholt <eric@anholt.net>
R6XX GPU doesn't like to have two partial flush writting
back to memory in row without a prior flush of the pipeline.
Add PS_PARTIAL_FLUSH to flush all work between the CP and
the ES, GS, VS, PS shaders.
Thanks a lot to Alban Browaeys (prahal on irc) for investigating
this issue.
Signed-off-by: Alban Browaeys <prahal@yahoo.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
gen6 builtin RSQ apparently clamps negative values to 0 instead of
returning the RSQ of the absolute value like ARB_fragment_program
desires and pre-gen6 apparently does.
Fixes:
glean/fp1-RSQ test 2 (reciprocal square root of negative value)
glean/vp1-RSQ test 2 (reciprocal square root of negative value)
Fixes this GCC warning.
r200_maos_arrays.c: In function 'r200EmitArrays':
r200_maos_arrays.c:113: warning: 'emitsize' may be used uninitialized in this function
It's not always possible to preprocess the content of 3D_LOAD_VBPNTR
in a command buffer, because the offset to all vertex buffers (which
the packet depends on) is derived from the "start" parameter of draw_arrays
and the "indexBias" parameter of draw_elements, but we can at least lazily
make a command buffer for the case when offset == 0, which should occur
most of the time.
The original lazy built-in importing patch did not add the newly created
function to the symbol table, nor actually emit it into the IR stream.
Adding it to the symbol table is non-trivial since importing occurs when
generating some ir_call in a nested scope. A new add_global_function
method, backed by new symbol_table code in the previous patch, handles
this.
Fixes bug #32030.
Avoid rebuilding constant shader state at each draw call,
factor out spi update that might change at each draw call.
Best would be to update spi only when revealent states
change (likely only flat shading & sprite point).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Vertex elements change are less frequent than draw call, those to
avoid rebuilding fetch shader to often build the fetch shader along
vertex elements. This also allow to move vertex buffer setup out
of draw path and make update to it less frequent.
Shader update can still be improved to only update SPI regs (based
on some rasterizer state like flat shading or point sprite ...).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
More than 1023 temporaries were being used for a Cinebench shader before
doing temporary optimization, causing the index value to wrap around to
-1024.
In st_finalize_texture() we were looking at the st_texture_object::
lastLevel field instead of the pipe_resource::last_level field to
determine which resource to store the mipmap in.
Then, in st_generate_mipmap() we need to call st_finalize_texture() to
make sure the destination resource is properly allocated.
These changes fix the broken piglit fbo-generatemipmap-formats test.
Use --cppflags instead of --cflags so that we get the -I and -D flags
we want, but not compiler options like -O3.
A similar change should probably be made for autoconf.
When X or Y or Z is close to W the outcome of the floating point clip
test comparision may be different between the C and x86 asm paths.
That's OK; don't report an error.
See fd.o bug 32093
It was only used for gen6 fragment programs (not GLSL shaders) at this
point, and it was clearly unsuited to the task -- missing opcodes,
corrupted texturing, and assertion failures hit various applications
of all sorts. It was easier to patch up the non-glsl for remaining
gen6 changes than to make brw_wm_glsl.c complete.
Bug #30530
Since the 8-wide first-quarter and 16-wide first-half have the same
bit encoding, we now need to track "do you want instruction
compression" in the compile state.
The FS backend is fine with register level granularity. But for the
brw_wm_emit.c backend, it expects pairs of regs to be used for the
constants, because the whole world is pairs of regs. If an odd number
got used, we went looking for interpolation in the wrong place.
In the SF and brw_fs.cpp fixes to set up interpolation sanely on gen6,
the setup for 16-wide interpolation was left behind. This brings
relative sanity to that path too.
Payload reg setup on gen6 depends more on the dispatch width as well
as the uses_depth, computes_depth, and other flags. That's something
we want to decide at compile time, not at cache lookup. As a bonus,
the fragment shader program cache lookup should be cheaper now that
there's less to compute for the hash key.
The preprocessor magic in mapi was nothing but obfuscation. Rewrite
mapi_abi.py to generate real C code.
This commit removes the hack added in
43121f2086.
Need to check the required primitive type for GS on Sandybridge,
and when GS is disabled, the new state has to be issued too, instead
of only updating URB state with no GS entry, that caused hang on Sandybridge.
This fixes hang issue during conformance suite testing.
This fixes endless vertex shader recompilations in find_translated_vp
if the shader contains an edge flag output.
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by Brian Paul <brianp@vmware.com>
Finished up by Marek Olšák.
We can set the constant space to use a different area per-call to the shader,
we can avoid flushing the PVS as often as we do by spreading out the constants
across the whole constant space.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
It appears to be a constant buffer index (in case there are more constant
buffers explicitly used by a shader), i.e. something that Gallium currently
does not use. We treated it incorrectly as the offset to a constant buffer.
Sometimes I'm on the train and want to just read what's generated
under INTEL_DEBUG=vs,wm for some code on another generation. Or, for
the next gen enablement we'll want to dump aub files before we have
the actual hardware. This will let us do that.
rgb_src_factor and rgb_dst_factor should be PIPE_BLENDFACTOR_ONE for
VG_BLEND_SRC_IN and VG_BLEND_DST_IN respectively. VG_BLEND_SRC_OVER can
be supported only when the fb has no alpha channel. VG_BLEND_DST_OVER
and VG_BLEND_ADDITIVE have to be supported with a shader.
Note that Porter-Duff blending rules assume premultiplied alpha.
Convert color values to and back from premultiplied form for blending.
Finally the rendering result of the blend demo looks much closer to that
of the reference implementation.
Drawing an image in VG_DRAW_IMAGE_STENCIL mode produces per-channel
alpha for use in blending. Add a new shader stage to produce and save
it in TEMP[1].
For other modes that do not need per-channel alpha, the stage does
MOV TEMP[1], TEMP[0].wwww
Add a helper function, blend_generic, that supports all blend modes and
per-channel alpha. Make other blend generators a wrapper to it.
Both the old and new code expects premultiplied colors, yet the input is
non-premultiplied. Per-channel alpha is also not used for stencil
image. They still need to be fixed.
In find_value() check if we've hit the 0th/invalid entry before checking
if the pname matches.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=31987
NOTE: This is a candidate for the 7.9 branch.
If querying the default/window-system FBO's attachment type, return
GL_FRAMEBUFFER_DEFAULT (per the GL_ARB_framebuffer_object spec).
See http://bugs.freedesktop.org/show_bug.cgi?id=31947
NOTE: This is a candidate for the 7.9 branch.
Return 0 instead of generating an error.
See http://bugs.freedesktop.org/show_bug.cgi?id=30993
Note that piglit fbo-getframebufferattachmentparameter-01 still does
not pass. But Mesa behaves the same as the NVIDIA driver in this case.
Perhaps the test is incorrect.
NOTE: This is a candidate for the 7.9 branch.
It was done in path-to-polygon conversion. That meant that the
results were invalidated when the transformation was modified, and CPU
had to recreate the vertex buffer with new vertices. It could be a
performance hit for apps that animate.
gl_FragCoord.y needs to be flipped upside down if a FBO is bound.
This fixes:
- piglit/fbo-fragcoord
- https://bugs.freedesktop.org/show_bug.cgi?id=29420
Here I add a new program state STATE_FB_WPOS_Y_TRANSFORM, which is set based
on whether a FBO is bound. The state contains a pair of transformations.
It can be either (XY=identity, ZW=transformY) if a FBO is bound,
or (XY=transformY, ZW=identity) otherwise, where identity = (1, 0),
transformY = (-1, height-1).
A classic driver (or st/mesa) may, based on some other state, choose whether
to use XY or ZW, thus negate the conditional "if (is a FBO bound) ...".
The reason for this is that a Gallium driver is allowed to only support WPOS
relative to either the lower left or the upper left corner, so we must flip
the Y axis accordingly again. (the "invert" parameter in emit_wpos_inversion)
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Fixes nasty bug where some parts of the code didn't define WIN32_THREADS
and were using the integer mutex implementation, causing even confusion
to the debuggers.
And there is little interest of other thread implemenation on Win32
besides Win32 threads.
This allows 16K x 16K 2D textures, for example, but we don't want to
allow that for 3D textures. The new gl_constants::MaxTextureMBytes
field is used to prevent allocating too large of texture image.
This allows a 16K x 32 x 32 3D texture, for example, but prevents 16K^3.
Drivers can override this limit. The default is currently 1GB.
Apps should use the proxy texture mechanism to determine the actual
max texture size.
resources have a array_size parameter now.
get_tex_surface and tex_surface_destroy have been renamed to create_surface
and surface_destroy and moved to context, similar to sampler views (and
create_surface now uses a template just like create_sampler_view). Surfaces
now really should only be used for rendering. In particular they shouldn't be
used as some kind of 2d abstraction for sharing a texture. offset/layout fields
don't make sense any longer and have been removed, width/height should go too.
surfaces and sampler views now specify a layer range (for texture resources),
layer is either array slice, depth slice or cube face.
pipe_subresource is gone array slices (or cube faces) are now treated the same
as depth slices in transfers etc. (that is, they use the z coord of the
respective functions).
Squashed commit of the following:
commit a45bd509014743d21a532194d7b658a1aeb00cb7
Merge: 1aeca28 32e1e59
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Dec 2 04:32:06 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/i915/i915_resource_texture.c
src/gallium/drivers/i915/i915_state_emit.c
src/gallium/drivers/i915/i915_surface.c
commit 1aeca287a827f29206078fa1204715a477072c08
Merge: 912f042 6f7c8c3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Dec 2 00:37:11 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/state_trackers/vega/api_filters.c
src/gallium/state_trackers/vega/api_images.c
src/gallium/state_trackers/vega/mask.c
src/gallium/state_trackers/vega/paint.c
src/gallium/state_trackers/vega/renderer.c
src/gallium/state_trackers/vega/st_inlines.h
src/gallium/state_trackers/vega/vg_context.c
src/gallium/state_trackers/vega/vg_manager.c
commit 912f042e1d439de17b36be9a740358c876fcd144
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Dec 1 03:01:55 2010 +0100
gallium: even more compile fixes after merge
commit 6fc95a58866d2a291def333608ba9c10c3f07e82
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Dec 1 00:22:26 2010 +0100
gallium: some fixes after merge
commit a8d5ffaeb5397ffaa12fb422e4e7efdf0494c3e2
Merge: f7a202f 2da02e7
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 30 23:41:26 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/i915/i915_state_emit.c
src/gallium/state_trackers/vega/api_images.c
src/gallium/state_trackers/vega/vg_context.c
commit f7a202fde2aea2ec78ef58830f945a5e214e56ab
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Nov 24 19:19:32 2010 +0100
gallium: even more fixes/cleanups after merge
commit 6895a7f969ed7f9fa8ceb788810df8dbcf04c4c9
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Nov 24 03:07:36 2010 +0100
gallium: more compile fixes after merge
commit af0501a5103b9756bc4d79167bd81051ad6e8670
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 23 19:24:45 2010 +0100
gallium: lots of compile fixes after merge
commit 0332003c2feb60f2a20e9a40368180c4ecd33e6b
Merge: 26c6346 b6b91fa
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 23 17:02:26 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/gallivm/lp_bld_sample.c
src/gallium/auxiliary/util/u_blit.c
src/gallium/auxiliary/util/u_blitter.c
src/gallium/auxiliary/util/u_inlines.h
src/gallium/auxiliary/util/u_surface.c
src/gallium/auxiliary/util/u_surfaces.c
src/gallium/docs/source/context.rst
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/nv50/nv50_state_validate.c
src/gallium/drivers/nvfx/nv04_surface_2d.c
src/gallium/drivers/nvfx/nv04_surface_2d.h
src/gallium/drivers/nvfx/nvfx_buffer.c
src/gallium/drivers/nvfx/nvfx_miptree.c
src/gallium/drivers/nvfx/nvfx_resource.c
src/gallium/drivers/nvfx/nvfx_resource.h
src/gallium/drivers/nvfx/nvfx_state_fb.c
src/gallium/drivers/nvfx/nvfx_surface.c
src/gallium/drivers/nvfx/nvfx_transfer.c
src/gallium/drivers/r300/r300_state_derived.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r600/r600_blit.c
src/gallium/drivers/r600/r600_buffer.c
src/gallium/drivers/r600/r600_context.h
src/gallium/drivers/r600/r600_screen.c
src/gallium/drivers/r600/r600_screen.h
src/gallium/drivers/r600/r600_state.c
src/gallium/drivers/r600/r600_texture.c
src/gallium/include/pipe/p_defines.h
src/gallium/state_trackers/egl/common/egl_g3d_api.c
src/gallium/state_trackers/glx/xlib/xm_st.c
src/gallium/targets/libgl-gdi/gdi_softpipe_winsys.c
src/gallium/targets/libgl-gdi/libgl_gdi.c
src/gallium/tests/graw/tri.c
src/mesa/state_tracker/st_cb_blit.c
src/mesa/state_tracker/st_cb_readpixels.c
commit 26c6346b385929fba94775f33838d0cceaaf1127
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Aug 2 19:37:21 2010 +0200
fix more merge breakage
commit b30d87c6025eefe7f6979ffa8e369bbe755d5c1d
Merge: 9461bf3 1f1928d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Aug 2 19:15:38 2010 +0200
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/llvmpipe/lp_rast_priv.h
src/gallium/drivers/r300/r300_blit.c
src/gallium/drivers/r300/r300_screen_buffer.c
src/gallium/drivers/r300/r300_state_derived.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_texture.h
src/gallium/drivers/r300/r300_transfer.c
src/gallium/drivers/r600/r600_screen.c
src/gallium/drivers/r600/r600_state.c
src/gallium/drivers/r600/r600_texture.c
src/gallium/drivers/r600/r600_texture.h
src/gallium/state_trackers/dri/common/dri1_helper.c
src/gallium/state_trackers/dri/sw/drisw.c
src/gallium/state_trackers/xorg/xorg_exa.c
commit 9461bf3cfb647d2301364ae29fc3084fff52862a
Merge: 17492d7 0eaccb3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 15 20:13:45 2010 +0200
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/util/u_blitter.c
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/llvmpipe/lp_surface.c
src/gallium/drivers/r300/r300_render.c
src/gallium/drivers/r300/r300_state.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_transfer.c
src/gallium/tests/trivial/quad-tex.c
commit 17492d705e7b7f607b71db045c3bf344cb6842b3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Jun 18 10:58:08 2010 +0100
gallium: rename element_offset/width fields in views to first/last_element
This is much more consistent with the other fields used there
(first/last level, first/last layer).
Actually thinking about removing the ugly union/structs again and
rename first/last_layer to something even more generic which could also
be used for buffers (like first/last_member) without inducing headaches.
commit 1b717a289299f942de834dcccafbab91361e20ab
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 14:46:09 2010 +0100
gallium: remove PIPE_SURFACE_LAYOUT_LINEAR definition
This was only used by the layout field of pipe_surface, but this
driver internal stuff is gone so there's no need for this driver independent
layout definition neither.
commit 10cb644b31b3ef47e6c7b55e514ad24bb891fac4
Merge: 5691db9 c85971d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 12:20:41 2010 +0100
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/docs/source/glossary.rst
src/gallium/tests/graw/fs-test.c
src/gallium/tests/graw/gs-test.c
commit 5691db960ca3d525ce7d6c32d9c7a28f5e907f3b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 11:29:03 2010 +0100
st/wgl: fix interface changes bugs
commit 2303ec32143d363b46e59e4b7c91b0ebd34a16b2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 19:42:32 2010 +0100
gallium: adapt code to interface changes...
commit dcae4f586f0d0885b72674a355e5d56d47afe77d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 19:42:05 2010 +0100
gallium: separate depth0 and array_size in the resource itself.
These fields are still mutually exclusive (since no 3d array textures exist)
but it ultimately seemed to error-prone to adapt all code accept the new
meaning of depth0 (drivers stick that into hardware regs, calculate mipmap
sizes etc.). And it isn't really cleaner anyway.
So, array textures will have depth0 of 1, but instead use array_size,
3D textures will continue to use depth0 (and have array_size of 1). Cube
maps also will use array_size to indicate their 6 faces, but since all drivers
should just be fine by inferring this themselves from the fact it's a cube map
as they always used to nothing should break.
commit 621737a638d187d208712250fc19a91978fdea6b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 17:47:38 2010 +0100
gallium: adapt code to interface changes
There are still usages of pipe_surface where pipe_resource should be used,
which should eventually be fixed.
commit 2d17f5efe166b2c3d51957c76294165ab30b8ae2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 17:46:14 2010 +0100
gallium: more interface changes
In particular to enable usage of buffers in views, and ability to use a
different pipe_format in pipe_surface.
Get rid of layout and offset parameter in pipe_surface - the former was
not used in any (public) code anyway, and the latter should either be computed
on-demand or driver can use subclass of pipe_surface.
Also make create_surface() use a template to be more consistent with
other functions.
commit 71f885ee16aa5cf2742c44bfaf0dc5b8734b9901
Merge: 3232d11 8ad410d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 14:19:51 2010 +0100
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/util/u_box.h
src/gallium/drivers/nv50/nv50_surface.c
src/gallium/drivers/nvfx/nvfx_surface.c
src/gallium/drivers/r300/r300_blit.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_transfer.c
src/gallium/drivers/r600/r600_blit.c
src/gallium/drivers/r600/r600_screen.h
src/gallium/include/pipe/p_state.h
commit 3232d11fe3ebf7686286013c357b404714853984
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 11:40:04 2010 +0100
mesa/st: adapt to interface changes
still need to fix pipe_surface sharing
(as that is now per-context).
Also broken is depth0 handling - half the code assumes
this is also used for array textures (and hence by extension
of that cube maps would have depth 6), half the code does not...
commit f433b7f7f552720e5eade0b4078db94590ee85e1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 11:35:52 2010 +0100
gallium: fix a couple of bugs in interface chnage fixes
commit 818366b28ea18f514dc791646248ce6f08d9bbcf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:42:11 2010 +0200
targets: adapt to interface changes
Yes even that needs adjustments...
commit 66c511ab1682c9918e0200902039247793acb41e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:41:13 2010 +0200
tests: adapt to interface changes
Everything needs to be fixed :-(.
commit 6b494635d9dbdaa7605bc87b1ebf682b138c5808
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:39:50 2010 +0200
st: adapt non-rendering state trackers to interface changes
might not be quite right in all places, but they really don't want
to use pipe_surface.
commit 00c4289a35d86e4fe85919ec32aa9f5ffe69d16d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:38:48 2010 +0200
winsys: adapt to interface changes
commit 39d858554dc9ed5dbc795626fec3ef9deae552a0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:26:54 2010 +0200
st/python: adapt to interface changes
don't think that will work, sorry.
commit 6e9336bc49b32139cec4e683857d0958000e15e3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:26:07 2010 +0200
st/vega: adapt to interface changes
commit e07f2ae9aaf8842757d5d50865f76f8276245e11
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:25:56 2010 +0200
st/xorg: adapt to interface changes
commit 05531c10a74a4358103e30d3b38a5eceb25c947f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:53 2010 +0200
nv50: adapt to interface changes
commit 97704f388d7042121c6d496ba8c003afa3ea2bf3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:45 2010 +0200
nvfx: adapt to interface changes
commit a8a9c93d703af6e8f5c12e1cea9ec665add1abe0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:01 2010 +0200
i965g: adapt to interface changes
commit 0dde209589872d20cc34ed0b237e3ed7ae0e2de3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:22:38 2010 +0200
i915g: adapt to interface changes
commit 5cac9beede69d12f5807ee1a247a4c864652799e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:20:58 2010 +0200
svga: adapt to interface changes
resource_copy_region still looking fishy.
Was not very suited to unified zslice/face approach...
commit 08b5a6af4b963a3e4c75fc336bf6c0772dce5150
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:20:01 2010 +0200
rbug: adapt to interface changes
Not sure if that won't need changes elsewhere?
commit c9fd24b1f586bcef2e0a6e76b68e40fca3408964
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:31 2010 +0200
trace: adapt to interface changes
commit ed84e010afc5635a1a47390b32247a266f65b8d1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:21 2010 +0200
failover: adapt to interface changes
commit a1d4b4a293da933276908e3393435ec4b43cf201
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:12 2010 +0200
identity: adapt to interface changes
commit a8dd73e2c56c7d95ffcf174408f38f4f35fd2f4c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:55 2010 +0200
softpipe: adapt to interface changes
commit a886085893e461e8473978e8206ec2312b7077ff
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:44 2010 +0200
llvmpipe: adapt to interface changes
commit 70523f6d567d8b7cfda682157556370fd3c43460
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:14 2010 +0200
r600g: adapt to interface changes
commit 3f4bc72bd80994865eb9f6b8dfd11e2b97060d19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:05 2010 +0200
r300g: adapt to interface changes
commit 5d353b55ee14db0ac0515b5a3cf9389430832c19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:17:37 2010 +0200
cell: adapt to interface changes
not even compile tested
commit cf5d03601322c2dcb12d7a9c2f1745e2b2a35eb4
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:14:59 2010 +0200
util: adapt to interface changes
amazing how much code changes just due to some subtle interface changes?
commit dc98d713c6937c0e177fc2caf23020402cc7ea7b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:12:40 2010 +0200
gallium: more interface fail, docs
this also changes flush_frontbuffer to use a pipe_resource instead of
a pipe_surface - pipe_surface is not meant to be (or at least no longer)
an abstraction for standalone 2d images which get passed around.
(This has also implications for the non-rendering state-trackers.)
commit 08436d27ddd59857c22827c609b692aa0c407b7b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 10 17:42:52 2010 +0200
gallium: fix array texture interface changes bugs, docs
commit 4a4d927609b62b4d7fb9dffa35158afe282f277b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 3 22:02:44 2010 +0200
gallium: interface changes for array textures and related cleanups
This patch introduces array textures to gallium (note they are not immediately
usable without the associated changes to the shader side).
Also, this abandons pipe_subresource in favor of using level and layer
parameters since the distinction between several faces (which was part of
pipe_subresource for cube textures) and several z slices (which were not part
of pipe_subresource but instead part of pipe_box where appropriate for 3d
textures) is gone at the resource level.
Textures, be it array, cube, or 3d, now use a "unified" set of parameters,
there is no distinction between array members, cube faces, or 3d zslices.
This is unlike d3d10, whose subresource index includes layer information for
array textures, but which considers all z slices of a 3d texture to be part
of the same subresource.
In contrast to d3d10, OpenGL though reuses old 2d and 3d function entry points
for 1d and 2d array textures, respectively, which also implies that for instance
it is possible to specify all layers of a 2d array texture at once (note that
this is not possible for cube maps, which use the 2d entry points, although
it is possible for cube map arrays, which aren't supported yet in gallium).
This should possibly make drivers a bit simpler, and also get rid of mutually
exclusive parameters in some functions (as z and face were exclusive), one
potential downside would be that 3d array textures could not easily be supported
without reverting this, but those are nowhere to be seen.
Also along with adjusting to new parameters, rename get_tex_surface /
tex_surface_destroy to create_surface / surface_destroy and move them from
screen to context, which reflects much better what those do (they are analogous
to create_sampler_view / sampler_view_destroy).
PIPE_CAP_ARRAY_TEXTURES is used to indicate if a driver supports all of this
functionality (that is, both sampling from array texture as well as use a range
of layers as a render target, with selecting the layer from the geometry shader).
Malloc likes to reuse old address as soon as possible this would cause the
new vbo buffer to get the same address as the old. So make sure we set it to
NULL when we allocate a new one. This fixes ipers which will fill up a couple
of VBO buffers per frame.
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Byte offsets simply don't work with tiled render targets when using
tiling bits. Luckily we can cox the hw into doing the right thing
with the DRAWING_RECT command by disabling the drawing rect offset
for the depth buffer.
Minor fixes by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Tiling is rather fragile in general and results in pure blackness when
unlucky. Hence add a new option to disable tiling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
libdrm-intel can refuse to tile buffers for various reasons. For
potentially tiled buffers the stride is therefore only known after
the iws->buffer_create_tiled call. Unconditionally rounding up to whatever
tiling requires wastes space, so rework the code to not use tex->stride
in the layout code.
Luckily only the mimap/face offset calculation uses it which can easily
be solved by storing an (x, y) coordinate pair. Furthermore this will
be usefull later for properly supporting rendering into the different
levels of tiled mipmap textures.
v2: switch to nblocks(x|y): More in line with gallium and better
suited for rendering into mipmap textures.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
This is needed to properly implement tiling flags. And the gem
implemention fo buffer_from_handle already calls get_tiling, so
it's for free.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
This way relaxed fencing is handled by libdrm. And buffers _can't_
ever change their tiling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Different kernels have different restrictions for tiled buffers.
Hence use the libdrm abstraction to calculate the necessary
stride and height alignment requirements.
Not yet used.
v2: Incorporate review comments from Jakob Bornecrantz
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
It's unnecessary. The kernel gem ignores it totally and we can't
run on the old userspace fake bo manager due to lack of dri2.
Also drop the redundant name string from the sw winsys as suggested
by Jakob Bornecrantz
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
By not doing so, the uniform contents of
glsl-uniform-non-uniform-array-compare.shader_test was getting thrown
out since nobody was recorded as dereferencing the array.
The new lower_discard and opt_discard_simplification passes should
handle all the necessary transformations, so lower_jumps doesn't need to
support it.
Also, lower_jumps incorrectly handled conditional discards - it would
unconditionally truncate all code after the discard. Rather than fixing
the bug, simply remove the code.
NOTE: This is a candidate for the 7.9 branch.
This should allow lower_if_to_cond_assign to work in the presence of
discards, fixing bug #31690 and likely #31983.
NOTE: This is a candidate for the 7.9 branch.
There were a few places where we were using the wrong opcodes
on evergreen. arl still needs to be fixed on evergreen; see
r600g for reference.
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
As the blend texture, a drawing surface mask is used when masking is
enabled. It should be created as needed.
s/alpha_mask/surface_mask/ to follow OpenVG 1.1 naming.
Depending on whether vgDrawPath(mode), vgDrawImage, or vgDrawGlyph[s] is
called, different paint-to-user and user-to-surface matrices should be
used to derive the sample points for the paint.
This fixes "paint" demo.
Divide bits of VegaShaderType into 6 groups: paint, image, mask, fill,
premultiply, and bw. Each group represents a stage. At most one shader
from each group will be selected when constructing the final fragment
shader.
Optional features such as auth-hinting are not implemented. There is no
anti-aliasing, and no effort is done to keep the glyph origin integral.
So the text quality is poor.
With this commit, the pipe states are entirely managed by the renderer.
The rest of the code interfaces with the renderer instead of
manipulating the states directly.
vg_manager_validate_framebuffer should mark the fb dirty and have
vg_validate_state call cso_set_framebuffer. Rename VIEWPORT_DIRTY to
FRAMEBUFFER_DIRTY.
The states are designated for polygon filling. Polygon filling is a
two-pass process utilizing the stencil buffer. polygon_fill and
polygon_array_fill functions are updated to make use of the state.
This state provides the ability to clear rectangles of the framebuffer
to the specified color, honoring scissoring. vegaClear is updated to
make use of the state.
This state provides glDrawTex-like function. It can be used for
vgSetPixels.
Rather than modifying every user of the renderer, this commit instead
modifies renderer_copy_surface to use DRAWTEX or COPY state internally
depending on whether the destination is the framebuffer.
Renderer states are high-level states to perform specific tasks. The
renderer is initially in INIT state. In that state, the renderer is
used for OpenVG pipeline.
This commit adds a new COPY state to the renderer. The state is used
for copying between two pipe resources using textured drawing. It can
be used for vgCopyImage, for example.
Rather than modifying every user of the renderer, this commit instead
modifies renderer_copy_texture to use the COPY state internally.
This branch defines a gallivm_state structure which contains the
LLVMBuilderRef, LLVMContextRef, etc. All data structures built with
this object can be periodically freed during a "garbage collection"
operation.
The gallivm_state object has to be passed to most of the builder
functions where LLVMBuilderRef used to be used.
Conflicts:
src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
src/gallium/drivers/llvmpipe/lp_state_setup.c
I made the texwrap test be more thorough and realized that this driver code
had not been quite right. This commit fixes the border color for depth
textures, compressed textures, and 16-bits-per-channel textures
with up to 2 channels (R16, RG16).
NOTE: This is a candidate for the 7.9 branch.
Previously, IR for a linked shader was allocated directly out of the
gl_shader object - meaning all of it lived as long as the shader.
Now, IR is allocated out of a temporary context, and any -live- IR is
reparented/stolen to (effectively) the gl_shader. Any remaining IR can
be freed.
NOTE: This is a candidate for the 7.9 branch.
This makes a very simple 1.30 shader go from 196k of memory to 9k.
NOTE: This -may- be a candidate for the 7.9 branch, as the benefit is
substantial. However, it's not a simple change, so it may be wiser to
wait for 7.10.
We were trying to emit a single ir_expression to compare the whole
thing. The backends (ir_to_mesa.cpp and brw_fs.cpp so far) expected
ir_binop_any_nequal or ir_binop_all_equal to apply to at most a vector
(with matrices broken down by the lowering pass). Break them down to
a bunch of ORed or ANDed any_nequals/all_equals.
Fixes:
glsl-array-compare
glsl-array-compare-02
glsl-fs-struct-equal
glsl-fs-struct-notequal
Bug #31909
This doesn't cover all expressions or all operand types, but it will
complain if you overreach and it allows for much greater slack on the
programmer's part.
This code was for the old GLcore build of the software rasteriser. The
X server switched to a DRI driver for software indirect GLX long ago.
Signed-off-by: Adam Jackson <ajax@redhat.com>
There are also some u_simple_shaders changes.
On r300, the TGSI_SEMANTIC_COLOR varying is a fixed-point number clamped
to the range [0,1] and limited to 12 bits of precision. Therefore we can't
use it for passing through a clear color in order to clear high precision
texture formats.
This also makes u_blitter use only one vertex shader instead of two.
Since the viewport is not updated on RandR12 mode switches anymore,
clipping to viewport may incorrectly clip away the video.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This may help paint the colorkey before overlay updates in some
situations where the app paints the color key (mainly xine).
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The compiler seriously needs a cleanup as far as the arrangement of functions
is concerned. It's hard to know whether some function was implemented or not
because there are so many places to search in and it can be anywhere and
named anyhow.
If nvfx_framebuffer prepare and validate were called successively with
fb->zsbuf not NULL and then NULL, nvfx->hw_zeta would contain garbage and
this would cause failures in nvfx_framebuffer_relocate/OUT_RELOC(hw_zeta).
This was triggered by piglit/texwrap 2D GL_DEPTH_COMPONENT24 and caused
first a 'write to user buffer!!' error in libdrm and then worse things.
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Currently, the standalone compiler tries to do function inlining before
linking shaders (including linking against the built-in functions).
This resulted in the built-in function _prototypes_ being inlined rather
than the actual function definition.
This is only known to fix a bug in the standalone compiler; most
programs should be unaffected. Still, it seems like a good idea.
NOTE: This is a candidate for the 7.9 branch.
Fixes this GCC warning with linux-x86 build.
radeon_pair_regalloc.c: In function ‘compute_live_intervals’:
radeon_pair_regalloc.c:222: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning with linux-x86 build.
radeon_pair_regalloc.c: In function ‘compute_live_intervals’:
radeon_pair_regalloc.c:221: warning: ISO C90 forbids mixed declarations and code
The alpha mask is addressed with unnormalized coordinates in the
fragment shader. It should have the same orientation as the surface
does.
This fixes "mask" OpenVG demo.
Previously, the get table listed all three as having custom locations,
yet find_custom_value did not have cases to handle them.
MAX_VARYING_VECTORS does not need a custom location since MaxVaryings is
already stored as float[4] (or vec4). MaxUniformComponents is stored as
the number of floats, however, so a custom implementation that divides
by 4 is necessary.
Fixes bugs.freedesktop.org #31495.
This fixes incorrect behaviour when the stencil clear value exceeds
the size of the stencil buffer, for example, when set with:
glClearStencil (~1); /* Set a bit pattern of 111...11111110 */
glClear (GL_STENCIL_BUFFER_BIT);
The clear value needs to be masked by the value 2^m - 1, where m is the
number of bits in the stencil buffer. Previously, we passed the value
masked with 0x7fffffff to _mesa_StencilFuncSeparate which then clamps,
NOT masks the value to the range 0 to 2^m - 1.
The result would be clearing the stencil buffer to 0xff, rather than 0xfe.
Signed-off-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Make sure regions are properly updated and that the colorkey painting is
flushed before we update the HW overlay.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This is needed to properly sync with host side rendering. For example,
make sure we flush colorkey painting before updating the overlay.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
When a context was made current to another surface, the old code did
this
dri2_dpy->core->bindContext(cctx, ddraw, rdraw);
dri2_dpy->core->unbindContext(old_cctx);
and there will be no current context due to the second line.
unbindContext should be called only when bindContext is not. This fixes
a regression since d19afc57. Thanks to Neil Roberts for noticing the
issue and creating a test case.
(Marek: added the initializion of "vec" in the default statement)
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This fixes piglit/glsl-vs-main-return and glsl-fs-main-return for the drivers
which don't support RET (i915g, r300g, r600g, svga).
ir_to_mesa does not currently generate subroutines, but it's a matter of time
till it's added. It would then break all the drivers which don't implement
them, so this CAP makes sense.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
When the result of the alpha instruction is being replicated to the RGB
destination register, we do not need to use alpha's destination register.
This fixes an invalid "Too many hardware temporaries used" error in
the case where a transcendent operation writes to a temporary register
greater than max_temp_regs.
NOTE: This is a candidate for the 7.9 branch.
This fixes an invalid "Too many hardware temporaries used" error in the
case where a source reads from a temporary register with an index greater
than max_temp_regs and then the source is marked as unused before the
register allocation pass.
NOTE: This is a candidate for the 7.9 branch.
Reads of registers that where not written to within the same block were
not being tracked. So in a situations like this:
0: IF
1: ADD t0, t1, t2
2: MOV t2, t1
Instruction 2 didn't know that instruction 1 read from t2, so
in some cases instruction 2 was being scheduled before instruction 1.
NOTE: This is a candidate for the 7.9 branch.
Gallium drivers pass all piglit tests for the two (there are 12 tests
for separate_shader_objects and 5 tests for explicit_attrib_location),
and I was told the extensions don't need any driver-specific code.
I made them dependent on PIPE_CAP_GLSL.
Signed-off-by: Brian Paul <brianp@vmware.com>
The drm winsys only ever handles one gem memory manager. Rip out
the unnecessary complication.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Not using the gtt is considered harmful for performance. And for
partial uploads there's always drm_intel_bo_subdata.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
It looks like this was meant to facilitate unfenced access to textures/
color/renderbuffers. It's totally incomplete and fundamentally broken
on a few levels:
- broken: The kernel needs to about every tiled bo to fix up bit17
swizzling on swap-in.
- unflexible: fenced/unfenced relocs from execbuffer2 do the same, much
simpler.
- unneeded: with relaxed fencing tiled gem bos are as memory-efficient
as this trick.
Hence kill it.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Fix a crash when the subrectangle is not inside the fb. Fix wrong
pipe transfer when sx > 0 or sy + height != fb->height.
This fixes "readpixels" demo.
These two samplers use non-normalized texture coordinates. wrap_r
cannot be PIPE_TEX_WRAP_REPEAT (the default).
This fixes
sp_tex_sample.c:1790:get_linear_unorm_wrap: Assertion `0' failed
assertion failure.
Fix OpenVG "filter" demo
Program received signal SIGSEGV, Segmentation fault.
0xb7153dc9 in str_match_no_case (pcur=0xbfffe564, str=0x0) at
tgsi/tgsi_text.c:86
86 while (*str != '\0' && *str == uprcase( *cur )) {
This is quite common for multitexture sampling, and not only cuts down
on the second and later set of MOVs, but typically also allows
compute-to-MRF on the first set.
No statistically siginficant performance difference in nexuiz (n=3),
but it reduces instruction count in one of its shaders and seems like
a good idea.
We were skipping it if the instruction producing the value we were
going to compute-to-mrf used its result reg as a source reg. This
meant that the typical "write interpolated color to fragment color" or
"texture from interpolated texcoord" shader didn't compute-to-MRF.
Just don't check for the interference cases until after we've checked
if this is the instruction we wanted to compute-to-MRF.
Improves nexuiz high-settings performance on my laptop 0.48% +- 0.08%
(n=3).
On pre-gen6, this turns 4 instructions into 1. We could still do
better by folding the saturate into the instruction generating the
value if nobody else uses it, but that should be a separate pass.
Hardware pretty commonly has saturate modifiers on instructions, and
this can be used in codegen to produce those, without everyone else
needing to understand clamping other than min and max.
This should save on the overhead of tree-walking and provide a
convenient place to add more instruction lowering in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The vector operator collects 2, 3, or 4 scalar components into a
vector. Doing this has several advantages. First, it will make
ud-chain tracking for components of vectors much easier. Second, a
later optimization pass could collect scalars into vectors to allow
generation of SWZ instructions (or similar as operands to other
instructions on R200 and i915). It also enables an easy way to
generate IR for SWZ instructions in the ARB_vertex_program assembler.
The operate just like ir_unop_sin and ir_unop_cos except that they
expect their inputs to be limited to the range [-pi, pi]. Several
GPUs require this limited range for their sine and cosine
instructions, so having these as operations (along with a to-be-written
lowering pass) helps this architectures.
These new operations also matche the semantics of the
GL_ARB_fragment_program SCS instruction. Having these as operations
helps in generating GLSL IR directly from assembly fragment programs.
Use fetch shader instead of having fetch instruction in the vertex
shader. Allow to restrict shader update to a smaller part when
vertex buffer input layout changes.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Occlusion query on evergreen need the event index field to be
set otherwise we endup locking up the GPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Condiation moves with a condition of (a < 0), (a > 0), (a <= 0), or (a
>= 0) can be generated with "a" directly as an operand of the CMP
instruction. This doesn't help much now, but it will help with
assembly shaders that use the CMP instruction.
This should prevent the field going unset in the future. See bug
http://bugs.freedesktop.org/show_bug.cgi?id=31544 for background.
Also remove unneeded calls to clear_teximage_fields().
Finally, call _mesa_set_fetch_functions() from the
_mesa_init_teximage_fields() function so callers have one less
thing to worry about.
Fix this GCC warning.
ir.cpp: In static member function
'static unsigned int ir_expression::get_num_operands(ir_expression_operation)':
ir.cpp:199: warning: control reaches end of non-void function
If an instruction writes reg but nothing later uses it, then we don't
need to bother doing it. Before, we were just killing code that was
never read after it was ever written.
This removes many interpolation instructions for attributes with only
a few comopnents used. Improves nexuiz high-settings performance .46%
+/- .12% (n=3) on my Ironlake.
It's annoying to use test suites under a Mesa debug build because
pretty output is cluttered with stderr's continuous reports that
you're still using the debug driver.
The new usage message lists possible command line options. (Newcomers to Mesa
currently have to trawl through the source to find the command line options,
and we should save them from that trouble.)
Example Output
--------------
usage: ./glsl_compiler [options] <file.vert | file.geom | file.frag>
Possible options are:
--glsl-es
--dump-ast
--dump-hir
--dump-lir
--link
This adds sentinel values to the ir_expression_operation enum type:
ir_last_unop, ir_last_binop, and ir_last_opcode. They are set to the
previous one so they don't trigger "unhandled case in switch statement"
warnings, but should never be handled directly.
This allows us to remove the huge array of 1s and 2s in
ir_expression::get_num_operands().
We are not aware of any GPU that actually implements the cross product
as a single instruction. Hence, there's no need for it to be an opcode.
Future commits will remove it entirely.
This is really supposed to be defined only if the driver supports highp
in the fragment shader - but all of our current ES2 implementations do.
So, just define it. In the future, we'll need to add a flag to
gl_context and only define the macro if the flag is set.
"Fixes" freedesktop.org bug #31673.
ir_binop_less, ir_binop_greater, ir_binop_lequal, and ir_binop_gequal
are defined to work on vectors as well as scalars, as long as the two
operands have the same type.
This is evident from both ir_validate.cpp and our use of these opcodes
in the GLSL lessThan, greaterThan, lessThanEqual, greaterThanEqual
built-in functions.
Found by code inspection. Not known to fix any bugs. Presumably, our
tests for the built-in comparison functions must pass because C.E.
handling is done on the ir_call of "greaterThan" rather than the inlined
opcode. The C.E. handling of the built-in function calls is correct.
NOTE: This is a candidate for the 7.9 branch.
Default group bytes to 512 on evergreen. Don't query
tiling config yet for evergreen, the current info returned is not
adequate for evergreen (no way to get bank info).
RET was interpreted as END, which was wrong. Instead, if a shader contains RET
in the main function, it will fail to compile with an error message
from now on.
The hack is from early days.
When drawing GL_DEPTH_COMPONENT the usual fragment pipeline steps apply
so don't override the depth state.
When drawing GL_STENCIL_INDEX (or GL_DEPTH_STENCIL) the fragment pipeline
does not apply (only the stencil and Z writemasks apply) so disable writes
to the color buffers.
Fixes some regressions from commit ef8bb7ada9
This consolidates the TOKEN_OR_IDENTIFIER and RESERVED_WORD macros into
a single KEYWORD macro.
The old TOKEN_OR_IDENTIFIER macros handled the case of a word going from
an identifier to a keyword; the RESERVED_WORD macro handled a word going
from a reserved word to a language keyword. However, neither could
properly handle samplerBuffer (for example), which is an identifier in
1.10 and 1.20, a reserved word in 1.30, and a keyword in 1.40 and on.
Furthermore, the existing macros didn't properly handle reserved words
in GLSL ES 1.00. The best they could do was return a token (rather than
an identifier), resulting in an obtuse parser error, rather than a
user-friendly "you used a reserved word" error message.
Functions are not first class objects in GLSL, so there is never a value
of function type. No code actually used this except for one function
which asserted it shouldn't occur. One comment mentioned it, but was
incorrect. So we may as well remove it entirely.
This driver is a fake swdri driver that perform no operations
beside allocation gallium structure and buffer for upper layer
usage.
It's purpose is to help profiling core mesa/gallium without
having pipe driver overhead hidding hot spot of core code.
scons file are likely inadequate i am unfamiliar with this
build system.
To use it simply rename is to swrast_dri.so and properly set
LIBGL_DRIVERS_PATH env variable.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
src/mesa/drivers/dri/*/*/*.[chS] is a superset of
src/mesa/drivers/dri/*/server/*.[ch] and
src/mesa/drivers/dri/common/xmlpool/*.[ch].
include/GL/internal/glcore.h is already in MAIN_FILES, no need for it in
DRI_FILES too. src/glx/Makefile was listed twice.
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
This showed up as cairo-gl gradients being inverted on everyone but
Intel, where I'd apparently tweaked the transformation to work around
the bug. Fixes piglit fbo-fragcoord.
Required because ATI and NVIDIA DX9 GPUs do not support indirect addressing
of temps, inputs, outputs, and consts (FS-only) or the hw support is so
limited that we cannot use it.
This should make r300g and possibly nvfx more feature complete.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Since this was talloced off of NULL instead of the compile state, it
was a real leak over the course of the program. Noticed with
valgrind --leak-check=full --show-reachable=yes. We should really
change these passes to generally get the compile context as an argument
so simple mistakes like this stop mattering.
This fixes a regression (failed assertion) from commit
c552f273f5 which was hit if glDeleteBuffers()
was called on a buffer that was never bound.
NOTE: this is a candidate for the 7.9 branch.
Currently r600_resource_copy_region() will turn these copies into
transfers + memcpys, so to avoid recursion we must not turn those
transfers back into blits.
Previously queries of MAX_SAMPLES were only allowed with
ARB_framebuffer_object, but EXT_framebuffer_multisample also enables
this query. This seems to only effect the i915. All other drivers
support both extensions or neither extension.
This patch is based on a patch that Kenneth sent along with the report.
NOTE: this is a candidate for the 7.9 branch.
Reported-by: Kenneth Waters <kwaters@chromium.org>
For driver performance analysis it usefull to be able to
disable as much as possible the GPU interaction so that
one can profile the userspace only.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
IF statements were getting flattened while they were broken. With
Zhenyu's last fix for ENDIF's type, everything appears to have lined
up to actually work.
This regresses two tests:
glsl1-! (not) operator (1, fail)
glsl1-! (not) operator (1, pass)
but fixes tests that couldn't work before because the IFs couldn't be
flattened:
glsl-fs-discard-01
occlusion-query-discard
(and, naturally, this should be a performance improvement for apps
that actually use IF statements to avoid executing a bunch of code).
Sometimes we swizzled in a different channel it looked like, and
sometimes we swizzled in zero. Or something.
Having looked at the output of another code generator for this chip,
this is approximately what they do, too: use align1 math on
temporaries, and then move the results into place.
Fixes:
glean/vp1-EX2 test
glean/vp1-EXP test
glean/vp1-LG2 test
glean/vp1-RCP test (reciprocal)
glean/vp1-RSQ test 1 (reciprocal square root)
shaders/glsl-cos
shaders/glsl-sin
shaders/glsl-vs-masked-cos
shaders/vpfp-generic/vp-exp-alias
Instead of messing with the callers simply copy our inputs into a
alloca array at the beginning of the function and then use it.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
DD_POINT_SIZE was broken for quite some time, and the only driver (r200) relying
on this no longer needs it.
Both DD_POINT_SIZE and DD_LINE_WIDTH have no users left outside of debugging
output, hence instead of fixing DD_POINT_SIZE setting just drop both of them -
there was a plan to remove tricaps flags entirely at some point.
DD_POINT_SIZE got never set for some time now (as it was set only in ifdefed
out code), which caused the r200 driver to use the point primitive mistakenly
in some cases which can only do size 1 instead of point sprite. Since the
logic to use point instead of point sprite prim is flaky at best anyway (can't
work correctly for per-vertex point size), just drop this and always emit point
sprites (except for AA points) - reasons why the driver tried to use points for
size 1.0 are unknown though it is possible they are faster or more conformant.
Note that we can't emit point sprites without point sprite cntl as that might
result in undefined point sizes, hence need drm version check (which was
unnecessary before as it should always have selected points). An
alternative would be to rely on the RE point size clamp controls which could
clamp the size to 1.0 min/max even if the SE point size is undefined, but currently
always use 0 for min clamp. (As a side note, this also means the driver does
not honor the gl spec which mandates points, but not point sprites, with zero size
to be rendered as size 1.)
This should fix recent reports of https://bugs.freedesktop.org/show_bug.cgi?id=702.
This is a candidate for the mesa 7.9 branch.
Fixes assertion failure with texture swizzling in the GLSL path when
it's triggered (such as gen6 FF or ARB_fp shadow comparisons).
Fixes:
texdepth
texSwizzle
fp1-DST test
fp1-LIT test 3
Also fixup code comment to reflect that the GPU requires DWORD
alignment, but in this case does not actually pass the value "in
DWORDs" as I previously stated.
This reverts commit 76360d6abc. On
second thought, it turned out that sync objects also used the
wait_rendering API like this, and would need the same treatment, and
so wait_rendering itself is fixed in libdrm now.
We were asking for a wait to GTT read (all GPU rendering to it
complete), instead of asking for all GPU reading from it to be
complete. Prevents swapbuffers-based apps from running away with
rendering, and produces a better input experience.
For some reason I though we needed the _DISCARD flag to avoid
readbacks, which isn't true at all. Now write operations should
pipeline properly, gives a good speedup to demos/tunnel.
When linked with certain builds of libstdc++, it appears like powf is resolved
by a symbol in that library. Other builds of libstdc++ doesn't contain that
symbol resulting in a linker / loader error. Optionally
resolve that symbol and replace it with calls to logf and expf.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Fixes this GCC warning.
graw.h:93: warning: 'struct pipe_surface' declared inside parameter list
graw.h:93: warning: its scope is only this definition or declaration,
which is probably not what you want
A call to radeon_prepare_render() at the beginning of draw
operations was placed too deep in the call chain,
inside r300RunRenderPrimitive(), instead of
r300DrawPrims() where it belongs. This leads to
emission of stale target color renderbuffer into the cs if
bufferswaps via page-flipping are used, and thereby causes
massive rendering corruption due to unsynchronized
rendering into the active frontbuffer.
This patch fixes such problems for use with the
upcoming radeon page-flipping patches.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
The width of the 2D blits used to copy the data is defined as a 16-bit
signed integer, but the pitch must be DWORD aligned. Limit to an integral
number of DWORDs, (1 << 15 - 4) rather than (1 << 15 -1).
Fixes corruption to data uploaded with glBufferSubData.
Signed-off-by: Peter Clifton <pcjc2@cam.ac.uk>
Allows applications to dump surfaces to file without
referencing gallium/auxiliary entry points statically.
Existing test apps have been modified such that
they save the contents of the fronbuffer only
when the `-o' option's specified.
These ES1 features were only tested for in the vertex array code.
Checking the ctx->API field at runtime is cleaner than the #ifdef
stuff and supports choosing the API at runtime.
Flushing the vertices after having already updated the state doesn't
do any good. Fixes useshaderprogram-flushverts-1. As a side effect,
by moving it to the right place we end up skipping no-op state changes
for traditional glUseProgram.
Rebuilding the vertex format from scratch every time we see a new
vertex attribute is rather costly, new attributes can be appended at
the end avoiding a copy to current and then back again, and the full
attr pointer recalculation.
In the not so likely case of an already existing attribute having its
size increased the old behavior is preserved, this could be optimized
more, not sure if it's worth it.
It's a modest improvement in FlightGear (that game punishes the VBO
module pretty hard in general, framerate goes from some 46 FPS to 50
FPS with the nouveau classic driver).
Signed-off-by: Brian Paul <brianp@vmware.com>
Silences this GCC warning.
brw_wm_fp.c: In function 'brw_wm_pass_fp':
brw_wm_fp.c:966: warning: 'last_inst' may be used uninitialized in this function
brw_wm_fp.c:966: note: 'last_inst' was declared here
Silences this GCC warning.
brw_wm_fp.c: In function 'precalc_tex':
brw_wm_fp.c:666: warning: 'tmpcoord.Index' may be used uninitialized in this function
Fixes this GCC warning with linux-x86 build.
radeon_dataflow.c: In function 'get_readers_normal_read_callback':
radeon_dataflow.c:472: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning with linux-x86 build.
radeon_pair_schedule.c: In function 'merge_presub_sources':
radeon_pair_schedule.c:312: warning: ISO C90 forbids mixed declarations and code
Commit 8dfafbf086 forgot to update r300g.
There is a buf == NULL check, but buf is used before for var init.
Tested-by: Guillermo S. Romero <gsromero@infernal-iceberg.com>
Fixes this GCC warning.
nouveau_vbo_t.c: In function 'nv10_vbo_render_prims':
nouveau_render_t.c:161: warning: 'max_out' may be used uninitialized in this function
nouveau_render_t.c:161: note: 'max_out' was declared here
We really only want to print spaces -between- elements, not after each
element. This cleans up error messages from IR reader, making them
(mildly) easier to read.
In particular, calling the abs function is silly, since there's already
an expression opcode for that. Also, assigning to temporaries then
assigning those to the final location is rather redundant.
Trivial change that avoids a segmentation fault when the blitter state
happens to be bound when the context is destroyed.
The free calls should probably removed altogether in the future -- the
responsibility to destroy the state atoms lies with whoever created it,
and the safest thing for the pipe driver is to not touch any bound state
in its destructor.
In testing on Ironlake, the histogram of clocks/pixel results for the
system memcpy and magic unaligned memcpy show no noticeable difference
(and no statistically significant difference with the 5510 samples
taken, though the stddev is large due to what looks like the cache
effects from the different texture sizes used).
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
Discard fractional bits from linewidth. This matches the nvidia
closed drivers, my reading of the OpenGL SI and current llvmpipe
behaviour.
It looks a lot nicer & avoids ugliness where lines alternate between n
and n+1 pixels in width along their length.
Also fix up r600g to match.
These were previously being left in the default (D3D) mode. This mean
that triangles were drawn slightly incorrectly, but also because this
state is relied on by the u_blitter code, all blits were half a pixel
off.
Generalize the existing tiled_buffer path in texture transfers for use
in some non-tiled up and downloads.
Use a staging buffer, which the winsys will restrict to GTT memory.
GTT buffers have the major advantage when they are mapped, they are
cachable, which is a very nice property for downloads, usually the CPU
will want to do look at the data it downloaded.
This opens the question of what interface the winsys layer should
really have for talking about these concepts.
For now I'm using the existing gallium resource usage concept, but
there is no reason not use terms closer to what the hardware
understands - eg. the domains themselves.
The callback presents the given attachment to the native engine. It
allows the swap behavior and interval to be controlled. It will replace
native_surface::flush_frontbuffer and native_surface::swap_buffers
shortly.
Silences warning such as:
main/texobj.c:442:40: warning: ISO C99 requires rest arguments to be used
main/texobj.c:498:58: warning: ISO C99 requires rest arguments to be used
This ensures that we increase bo->map_count when radeon_bo_map_internal()
returns successfully, which in turn makes sure we don't decrement
bo->map_count below zero later.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
The call to draw_bind_fragment_shader() was using the old fragment
shader. This bug would have really only effected the draw module's
use of the fragment shader in the wide point stage.
Some C++ header files were included in an extern "C" block. When building with
Clang, this caused the build to fail due to namespace errors. (GCC did not
report any errors.)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Important as more constant buffers per shader start to get used.
Fix up r600 (tested) and nv50 (untested) to cope with this. Drivers
previously didn't see unbinds of constant buffers often or ever, so
this isn't always dealt with cleanly.
For r600 just return and keep the reference. Will try to do better in
a followup change.
Don't trim triangle bounding box to scissor/draw-region until after
the logic for emitting tri_16. Don't generate tri_16 commands for
triangles with untrimmed bounding boxes outside the current tile.
This is important as the tri-16 itself can extend past tile bounds and
we don't want to add code to it to check against tile bounds (slow) or
restrict it to locations within a tile (pessimistic).
This effectively redoes 1741ddb747 in a
way that allows contexts of different APIs to coexist.
First, the changes to the remap table are reverted. The remap table
(driDispatchRemapTable) is always initialized in the same way regardless
of the context API.
es_generator.py is updated to use a local remap table, whose sole
purpose is to help initialize its dispatch table. The local remap table
and the global one are always different, as they use different
glapidispatch.h. But the dispatch tables initialized by both remap
tables are always compatible with glapi (libGL.so).
Finally, the semantics of one_time_init are changed to per-api one-time
initialization.
Core mesa should query glapi for the positions of the functions in
_glapi_table when multiple APIs are supported. It does not know which
glapitable.h glapi used.
Use scons target and dependency system instead of ad-hoc options.
Now is simply a matter of naming what to build. For example:
scons libgl-xlib
scons libgl-gdi
scons graw-progs
scons llvmpipe
and so on. And there is also the possibility of scepcified subdirs, e.g.
scons src/gallium/drivers
If nothing is specified then everything will be build.
There might be some rough corners over the next days. Please bare with me.
Without this, update_textures may not pick up the new pipe_resource.
It is actually update_textures that should check
stObj->sampler_view->texture != stObj->pt, but let's follow st_TexImage
and others for now.
Make autoconf decide the client APIs enabled first. Then when OpenGL
and OpenGL ES are disabled, there is no need to build src/mesa/; when
OpenGL is disabled, no $mesa_driver should be built. Finally, add
--enable-openvg to enable OpenVG.
With these changes, an OpenVG only build can be configured with
$ ./configure --disable-opengl --enable-openvg
src/mesa, src/glsl, and src/glx will be skipped, which saves a great
deal of compilation time.
And an OpenGL ES only build can be configured with
$ ./configure --disable-opengl --enable-gles-overlay
Fixes this GCC warning.
state_tracker/st_program.c: In function 'st_print_shaders':
state_tracker/st_program.c:735: warning: 'sh' may be used uninitialized in this function
This fixes some insanity that would otherwise be required for GLSL
1.30 bit ops or gen6 integer uniform operations in general, at the
cost of upload-time pain. Given that we only have that pain because
mesa's mangling our integer uniforms to be floats, this something that
should be fixed outside of the shader codegen.
First, it changes autoconf to use a "python2" binary when available,
rather than plain "python" (which is ambiguous). Secondly, it changes
the Makefiles to use $(PYTHON) $(PYTHON_FLAGS) rather than calling
python directly.
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Matthew William Cox <matt@mattcox.ca>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes this GCC warning.
r300_state_derived.c: In function 'r300_update_derived_state':
r300_state_derived.c:593: warning: 'r' may be used uninitialized in this function
r300_state_derived.c:593: note: 'r' was declared here
radeon_bo_destroy() will want to read the list field. Without this patch,
we'd end up evaluating the list pointers before they have been properly
set up when we destroyed the newly created bo if it cannot be mapped.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
With 07b85457d9, glapitable.h is included
by core mesa only to know the size of _glapi_table. It is not necessary
as the same info is given by _gloffset_COUNT.
This change makes _glapi_table opaque to core mesa. All operations on
it are supposed to go through one of the SET/GET/CALL macros.
Move defines in glapioffsets.h to glapidispatch.h. Rename
_gloffset_FIRST_DYNAMIC to _gloffset_COUNT, which is equal to the number
of entries in _glapi_table.
Consistently use SET_by_offset, GET_by_offset, CALL_by_offset, and
_gloffset_* to recursively define all SET/GET/CALL macros.
glapioffsets.h exists for the same reason as glapidispatch.h does. It
is of no use to glapi. This commit also drops the use of glapioffsets.h
in glx as glx is considered an extension to glapi when it comes to
defining public GL entries.
glapidispatch.h exists so that core mesa (libmesa.a) can be built for
DRI drivers or for non-DRI drivers as a compile time decision (whether
IN_DRI_DRIVER is defined). It is of no use to glapi. This commit also
drops the use of glapidispatch.h in glx and libgl-xlib as they are
considered extensions to glapi when it comes to defining public GL
entries.
This lets us simplify and consolidate some state checking code.
This implements the GL_INVALID_OPERATION check for all drawing commands
required by GL_EXT_texture_integer.
It's a little more painful than before because we don't have the handy
mask register any more, and have to make do with cooking up a value
out of the flag register.
This is apparently required, as the thread will be initiated while it
still has dependencies, and this is what waits for those to be
resolved before writing color.
Function ast_declarator_list::hir(), when processing keywords added by
extension ARB_fragment_coord_conventions, made the mistake of checking only if
the extension was __supported by the driver__. The correct behavior is to check
if the extensi0n is __enabled in the parse state__.
NOTE: this is a candidate for the 7.9 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Ensure -L$(TOP)/$(LIB_DIR) (the staging dir for build products), appears
in the link line before any -L in $LDFLAGS, so that we link driver we are
building with libEGL we have just built, and not an installed version
[olv: make a similar change to targets/egl]
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This call sequence
eglMakeCurrent(dpy, surf, surf, ctx1);
eglMakeCurrent(dpy, surf, surf, ctx2);
should be valid if ctx1 and ctx2 have the same client API and are not
current in another thread.
Internally a mode belongs to a screen. But functions like
eglGetModeAttribMESA treat a mode as a display resource: a mode can be
looked up without a screen. Considering how KMS works, it is better to
stick to the current implementation.
To properly support looking up a mode without a screen, this commit
assigns each mode (of all screens) a unique ID.
The opaque nature of EGLImage implies that extensions almost always
define their own attributes. Move attributes in _EGLImage to
_EGLImageAttribs and add a helper function to parse attribute lists.
Fixes glsl-fs-convolution-2, which was blowing up due to the array
access insanity getting at the uniform values within the loop. Each
temporary was considered live across the whole loop.
One, it was allocating increments of 1kb, but per thread scratch space
is a power of two. Two, the new FS wasn't getting total_scratch set
at all, so everyone thought they had 1kb and writes beyond 1kb would
go stomping on a neighbor thread.
With this plus the previous register spilling for the new FS,
glsl-fs-convolution-1 passes.
g0 is used by others, and is expected to be left exactly as it was
dispatched to us. So manually move g0 into our message reg when
spilling/unspilling and update the offset in the MRF. Fixes failures
in texture sampling after having spilled a register.
It's amazing this code worked. Basically, we would get lucky in
register allocation and the tests using frontfacing would happen to
allocate gl_FrontFacing storage and the instructions generating
gl_FrontFacing but pointing at another register to the same hardware
register. Noticed during register spilling debug, when suddenly they
didn't get allocatd the same storage.
Since this is just generated by python, it's questionable whether this
should continue to live in the repository - Mesa already has other
things generated from python as part of the build process.
This works around MSVC's 65535 byte limit, unfortunately at the expense
of any semblance of readability and much larger file size. Hopefully I
can implement a better solution later, but for now this fixes the build.
Fixes this GCC warning.
gallivm/lp_bld_tgsi_aos.c: In function 'lp_build_tgsi_aos':
gallivm/lp_bld_tgsi_aos.c:516: warning: 'dst0' may be used uninitialized in this function
gallivm/lp_bld_tgsi_aos.c:516: note: 'dst0' was declared here
Fixes these GCC warnings.
gallivm/lp_bld_sample_aos.c: In function 'lp_build_sample_image_nearest':
gallivm/lp_bld_sample_aos.c:271: warning: 't_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:271: warning: 'r_ipart' may be used uninitialized in this function
Fixes these GCC warnings.
gallivm/lp_bld_sample_aos.c: In function 'lp_build_sample_image_linear':
gallivm/lp_bld_sample_aos.c:439: warning: 'r_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart_hi' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart_lo' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart_hi' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart_lo' may be used uninitialized in this function
At the moment you need kernel patches to have texture tiling work
with the kernel CS checker, so once they are upstream and the drm version
is bumped we can make this enable flip the other way most likely.
Since the hw transitions from 2D->1D sampling below the 2D macrotile
size we need to keep track of the array mode per level so we can
render to it using the CB.
Silences this GCC warning.
ast_to_hir.cpp: In function 'void apply_type_qualifier_to_variable(const
ast_type_qualifier*, ir_variable*, _mesa_glsl_parse_state*, YYLTYPE*)'
ast_to_hir.cpp:1768: warning: enumeration value 'ir_shader' not handled
in switch
We were working around an LLVM 2.5 bug but we're using LLVM 2.6 or later now.
This basically reverts commit baddcbc522.
This fixes the piglit bug/tri-tex-crash.c failure.
Silences these GCC warnings.
r600_shader.c: In function 'tgsi_exp':
r600_shader.c:2339: warning: 'r600_src[0].rel' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].abs' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].neg' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].chan' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].sel' is used uninitialized in this function
Previously some shader input or outputs that hadn't received location
assignments could slip through. This could happen when a shader
contained user-defined varyings and was used with either
fixed-function or assembly shaders.
See the piglit tests glsl-[fv]s-user-varying-ff and
sso-user-varying-0[12].
NOTE: this is a candidate for the 7.9 branch.
This function type checks the operands of and returns the result type of
bit-logic operations. It replaces the type checking performed in the
following cases of ast_expression::hir() :
- ast_bit_and
- ast_bit_or
- ast_bit_xor
This function type checks the operands of and returns the result type of
bit-shift operations. It replaces the type checking performed in the following
cases of ast_expression::hir() :
- ast_lshift
- ast_rshift
This fixes hangs in some Z-writes-in-shaders tests, though other
pieces don't come out correctly.
Bug #30392: hang in fbo-fblit-d24s8. (still fails with bad color drawn
to some targets)
rc_get_readers_normal() supplies a list of readers for a given
instruction. This function is now being used by the copy propagate
optimization and will eventually be used by most other optimization
passes as well.
Existing code relies on IR being generated (possibly with error type)
rather than returning NULL. So, don't break - go ahead and generate the
operation. As long as an error is flagged, things will work out.
Fixes fd.o bug #30914.
Thanks to Alex Deucher for pointing out the FLT to int conversion is necessary
and writing an initial patch, this brings about 20 piglits, and I think this
is the last piece to make evergreen and r600 equal in terms of features.
We need to keep track of three different fragment shaders: Z-only, stencil-
only, and Z+stencil. Before, we were only keeping track of the first one
we encountered.
We often use reg_null as the destination when setting up the flag
regs. However, on gen6 there aren't general implicit conversions to
destination types from src types, so the comparison to produce the
flag regs would be done on the integer result interpreted as a float.
Hilarity ensued.
Fixes 20 piglit cases.
In ir_validate::visit_leave(), the cases for
- ir_binop_bit_and
- ir_binop_bit_xor
- ir_binop_bit_or
were incorrect. It was incorrectly asserted that both operands must be the
same type, when in fact one may be scalar and the other a vector. It was also
incorrectly asserted that the resultant type was the type of the left operand,
which in fact does not hold when the left operand is a scalar and the right
operand is a vector.
Implement by adding the following cases to ast_expression::hir():
- ast_lshift
- ast_rshift
Also, implement ir validation for the new operators by adding the following
cases to ir_validate::visit_leave():
- ir_binop_lshift
- ir_binop_rshift
On evergreen, interpolation has moved into the fragment shader,
with the interpolation parmaters being passed via GPRs and LDS entries.
This works out the number of interps required and reserves GPR/LDS
storage for them, it also correctly routes face/position values which
aren't interpolated from the vertex shader.
Also if we noticed nothing is to be interpolated we always setup perspective
interpolation for one value otherwise the GPU appears to lockup.
This fixes about 15 piglit tests on evergreen.
Previously _LinkedShaders was a compact array of the linked shaders
for each shader stage. Now it is arranged such that each slot,
indexed by the MESA_SHADER_* defines, refers to a specific shader
stage. As a result, some slots will be NULL. This makes things a
little more complex in the linker, but it simplifies things in other
places.
As a side effect _NumLinkedShaders is removed.
NOTE: This may be a candidate for the 7.9 branch. If there are other
patches that get backported to 7.9 that use _LinkedShader, this patch
should be cherry picked also.
This implements round() via the ir_unop_round_even opcode, rather than
adding a new opcode. We may wish to add one in the future, since it
might enable a small performance increase on some hardware, but for now,
this should suffice.
Tested with demos/pixeltest - line rasterization doesn't seem to be
set up for GL conventions yet, but at least width is respected now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It is now to the point where we have no regressing piglit tests. It
also fixes Yo Frankie! and Humus DynamicBranching, probably due to the
piglit bias tests that work that didn't on the Mesa IR backend.
As a downside, performance takes about a 5-10% performance hit at the
moment (e.g. nexuiz 19.8fps -> 18.8fps), which I plan to resolve by
reintroducing 16-wide fragment shaders where possible. It is a win,
though, for fragment shaders using flow control.
Simply using RNDU, RNDZ, or RNDE does not produce the desired result.
Rather, the RND* instructions place a value in the destination register
that may be 1 less than the correct answer. They can also set per-channel
"increment bits" in a flag register, which, if set, mean dest needs to
be incremented by 1. A second instruction - a predicated add -
completes the job.
Notably, RNDD always produces the correct answer in a single
instruction.
Fixes piglit test glsl-fs-trunc.
This cuts usually 2 out of 3 instructions for flag reg generation (if
statements, conditional assignment) by producing the conditional mod
in the expression representing the boolean value.
Fixes glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined (register
allocation no longer fails for the conditional generation
proliferation)
This will be a place to peephole comparisions directly to the flag
regs, and for now avoids using MOV with conditional mod on gen6, which
is now illegal.
GLES1 and GLES2 install their own exec pointers and don't need the
Save table. Also, the SET_* macros use different indices for the different
APIs so the offsets used in vtxfmt.c are actually wrong for the ES APIs.
Just always check for FLUSH_UPDATE_CURRENT and call Driver.BeginVertices
when necessary. By using the unlikely() macros, this ends up as
a 10% performance improvement (for isosurf, anyway) over the old,
complicated function pointer swapping.
Don't use r0 for FF_SYNC dest reg on Sandybridge, which would
smash FFID field in GS payload, that cause later URB write fail.
Also not use r0 in any URB write requiring allocate.
Actually validate that the implementation supports the particular
shader target as well. Previously if a driver only supported vertex
shaders, for example, glCreateShaderObjectARB would gladly create a
fragment shader.
NOTE: this is a candidate for the 7.9 branch.
This really amounts to just using the return value from
link_function_calls. All the work was being done, but the result was
being ignored.
Fixes piglit test link-unresolved-funciton.
NOTE: this is a candidate for the 7.9 branch.
Together with the previous commit, this generalize the benefits of
d2cf757f44 to all depth formats, in
particular:
- simpler float -> 24unorm conversion
- avoid unsigned comparisons (not directly supported on SSE) by aligning
to the least significant bit
- avoid unecessary/repeated mask ANDing
Verified with trivial/tri-z that the exact same assembly is produced for
X8Z24.
Z32_FLOAT uses <4 x float> as intermediate/destination type,
instead of <4 x i32>.
The necessary bitcasts got removed with commit
5b7eb868fd
Also use depth/stencil type and build contexts consistently, and
make the depth pointer argument a ordinary <i8 *>, to catch this
sort of issues in the future (and also to pave way for Z16 and
Z32_FLOAT_S8_X24 support).
__GLcontextModes is always only used as an implementation internal struct
at this point and we shouldn't install glcore.h anymore. Anything that
needs __GLcontextModes should just include the struct in its headers files
directly.
This is relying on lp_build_pack2 using the sse2 pack intrinsics which
handle clamping.
(Alternatively could have make it use lp_build_packs2 but it might
not even produce more efficient code than not using the fastpath
in the first place.)
Fixes this GCC warning.
r300_state.c: In function 'r300InvalidateState':
r300_state.c:2247: warning: 'hw_format' may be used uninitialized in this function
r300_state.c:2247: note: 'hw_format' was declared here
This adds proper support for the GL_ARB_shader_stencil_export extension
to the GLSL compiler. Thanks to Ian for pointing out where I need to add things.
If the pipe driver has shader stencil export we can accelerate DrawPixels
using it. It tries to pick an S8 texture and works its way to X24S8 and S8X24
if that isn't supported.
We need a texture to put the drawpixels stuff into, an S8 texture is less
memory/bandwidth than the 32-bit X24S8, but we might not be able to render
directly to an S8, so this lets us specify we won't be rendering to this
texture.
this improves mesa texstore for 8/24 so it can create S24X8/X24S8 variants
by keeping the depth bits static.
it also adds a texstore for S8 so we can write out an S8 texture to use
in the sampler for accel draw pixels to save memory bw.
The logic seems sound here, I've worked it out a few times on paper, though
it would be good to have some review.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There was a check to only do the rebase if we didn't have everything
in VBOs, but nexuiz apparently hands us a mix of VBOs and arrays,
resulting in blocking on the GPU to do a rebase.
Improves nexuiz 800x600, high-settings performance on my Ironlake 41%
(+/- 1.3%), from 14.0fps to 19.7fps.
The format selection of the CopyTexSubImage is pretty bogus still, but
this at least avoids software fallbacks in nexuiz, bringing
performance from 7.5fps to 12.8fps on my machine.
This assertion was added in commit f1c1ee11, but it did not notice
that the array is accessed with 'size-1' instead of 'size'. As a
result, the assertion was off by one. This caused failures in at
least glsl-orangebook-ch06-bump.
If an GLSL shader is used that does not provide all stages and
assembly shaders are provided for the missing stages, validate the
assembly shaders.
Fixes bugzilla #30787 and piglit tests glsl-invalid-asm0[12].
NOTE: this is a candidate for the 7.9 branch.
There was actually a large quantity of scalar code in these functions
previously. This tries to move more into intrinsics.
Introduce an sse2 mm_mullo_epi32 replacement to avoid sse4 dependency
in the new rasterization code.
It's now much more correct for gen6 than the old backend, with just 2
regressions I've found (one of which is common with pre-gen6 and will
be fixed by an array splitting IR pass).
This does leave the old Mesa IR backend getting used still when we
don't have GLSL IR, but the plan is to get GLSL IR input to the driver
for the ARB programs and fixed function by the next release.
Pre-gen6, you could mix int and float just fine. Now, you get goofy
results.
Fixes:
glsl-arb-fragment-coord-conventions
glsl-fs-fragcoord
glsl-fs-if-greater
glsl-fs-if-greater-equal
glsl-fs-if-less
glsl-fs-if-less-equal
This is a hw requirement in math args. This also is inefficient, as
we're calculating the same result 8 times, but then we've been doing
that on pre-gen6 as well. If we're doing math on uniforms, though,
we'd probably be better served by having some sort of mechanism for
precalculating those results into another uniform value to use.
Fixes 7 piglit math tests.
Add ability to set the GLSL version used by the GLcontext by setting the
environment variable INTEL_GLSL_VERSION. For example,
env INTEL_GLSL_VERSION=130 prog args
If the environment variable is missing, the GLSL versions defaults to 120.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By calling radeon_draw_buffers (which sets the necessary flags
in radeon->NewGLState) and revalidating if NewGLState is non-zero
in r200TclPrimitive. This fixes an assert in libdrm (the color-/
depthbuffer was changed but not yet validated) and and stops the
kernel cs checker from complaining about them (when they're too
small).
Thanks to Mario Kleiner for the hint to call radeon_draw_buffer
(instead of my half-broken hack).
v2: Also fix the swtcl r200 path.
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This didn't produce a statistically significant performance difference
in my demo (n=4) or nexuiz (n=3), but it still seems like a good idea
and is recommended by the HW team.
Having the single opcode write then read the reg meant that single
instruction opcodes had to consider their source regs to interfere
with their dest regs.
SSE support for 32bit and 16bit unsigned arithmetic is not complete, and
can easily result in inefficient code.
In most cases signed/unsigned doesn't make a difference, such as for
integer texture coordinates.
So remove uint_coord_type and uint_coord_bld to avoid inefficient
operations to sneak in the future.
We need to move the texture sampler resources out of the range of the vertex attribs.
We could probably improve this using an allocator but this is the simple answer for now.
makes mesa-demos/src/glsl/vert-tex work.
We can't patch true-block at end-if time, as there is no guarantee that
the block at the beginning of the true stanza is the same at the end of
the true stanza -- other control flow elements may have been emitted half
way the true stanza.
Although this bug surfaced recently with the commit to skip mip filtering
when lod is an integer the bug was always there, although probably it
was avoided until now: e.g., cubemap selection nests if-then-else on the
else stanza, which does not suffer from the same problem.
We've been using these in the linear path for a while now. Based on
Chris's SSSE3 code, but using only sse2 opcodes. Speed seems to be
identical, but code is simpler & removes dependency on SSE3.
Should be easier to extend to other rgba8 formats.
Specifically, can do early-depth-test even when alpahtest or
kill-pixel are active, providing we defer the actual z write until the
final mask is avaialable.
Improves demos/fire.c especially in the case where you get close to
the trees.
The current interpolation schemes causes precision loss.
Changing the operation order helps, but does not completely avoid the
problem.
The only short term solution is to clamp z to 1.0.
This is unfortunate, but probably unavoidable until interpolation is
improved.
Operate simultanouesly on <width, height, depth> vector as much as possible,
instead of doing the operations on vectors with broadcasted scalars.
Also do the 24.8 fixed point scalar with integer shift of the texture size,
for unnormalized coordinates.
AoS path only for now -- the same thing can be done for SoA.
Fixes these GCC warnings.
brw_wm_fp.c: In function 'search_or_add_const4f':
brw_wm_fp.c:92: warning: 'reg.Index2' is used uninitialized in this function
brw_wm_fp.c:84: note: 'reg.Index2' was declared here
brw_wm_fp.c:92: warning: 'reg.RelAddr2' is used uninitialized in this function
brw_wm_fp.c:84: note: 'reg.RelAddr2' was declared here
Clamp against 0 instead of -0.5, which simplifies things.
The former version would have resulted in both int coords being zero
(in case of coord being smaller than 0) and some "unused" weight value,
whereas now the int coords will be 0 and 1, but weight will be 0, hence the
lerp should produce the same value.
Still not happy about differences between normalized and non-normalized...
Haven't looked at what code this exactly generates but URem can't be fast.
Instead of using two URem only use one and replace the second one with
select/add (this is what the corresponding aos code already does).
Rearrange order of operations a bit to make some clamps easier.
All calculations should be equivalent.
Note there seems to be some inconsistency in the clamp to edge case
wrt normalized/non-normalized coords, could potentially simplify this too.
Sometimes coords are clamped to positive numbers before doing conversion
to int, or clamped to 0 afterwards, in this case can use itrunc
instead of ifloor which is easier. This is only the case for nearest
calculations unfortunately, except linear MIRROR_CLAMP_TO_EDGE which
for the same reason can use a unsigned float build context so the
ifloor_fract helper can reduce this to itrunc in the ifloor helper itself.
sse2 supports round to nearest directly (or rather, assuming default nearest
rounding mode in MXCSR). Use intrinsic to use this rather than round (sse41)
or bit manipulation whenever possible.
The constness of the function parameter gets inlined with the rest of
the function. However, there is also an assignment to the parameter.
If this occurs inside a loop the loop analysis code will get confused
by the assignment to a read-only variable.
Fixes bugzilla #30552.
NOTE: this is a candidate for the 7.9 branch.
Improves performance of my GLSL demo 14.3% (+/- 4%, n=4) by
eliminating the moves used in ir_assignment and ir_swizzle handling.
Still 16.5% to go to catch up to the Mesa IR backend, presumably
because instructions are almost perfectly mis-scheduled now.
We were trying to remap a fully-filled array down to only handing the
WM the components it uses. This is called attribute swizzling, and if
you don't enable it you just get 1:1 mappings of inputs to outputs.
This almost fixes glsl-routing, except for the highest gl_TexCoord[]
indices.
We would compute a new buffer, but never point the hardware at the new
buffer. This partially fixes glsl-routing, as now it get the updated
uniform for which attribute to draw.
Avoid multiplying fixed-point values. Calculate triangle area in
floating point use that for culling.
Lift area calculations up a level as we are already doing this in the
triangle_both() case.
Would like to share the calculated area with attribute interpolation,
but the way the code is structured makes this difficult.
interp data is stored in gpr0 so first interp overwrote it
and subsequent ones got wrong values
reserve register 0 so it's not used for attribs.
alternative is to interpolate attrib0 last (reverse, as r600c does)
We sensibly only provide it if the FS asks for it. We could actually
skip WPOS unless the FS needed WPOS.zw, but that's something for
later.
Fixes: glsl-texture2d and probably many others.
- Handle PIPE_FORMAT_Z32_FLOAT packing correctly.
- In the integer version z shouldn't be passed as as double.
- Make it clear that the integer versions should only be used for masks.
- Make integer type sizes explicit (uint32_t for now, although
uint64_t will be necessary later to encode f32_s8_x24).
The jump delta is now in the part of the instruction where the
destination fields used to be, and the src args are ignored (or not,
for the new non-predicated IF that we don't use yet).
Once a fragment is generated with LP_INTERP_PERSPECTIVE set for an input,
it will do a divide by w for that input. Therefore it's not OK to treat LP_INTERP_PERSPECTIVE as
LP_INTERP_LINEAR or vice-versa, even if the attribute is known to not
vary.
A better strategy would be to take the primitive in consideration when
generating the fragment shader key, and therefore avoid the per-fragment
perspective divide.
Got a speed up by tracking the dirty blocks in a seperate list instead of looping through all blocks. This version should work with block that get their dirty state disabled again and I added a dirty check during the flush as some blocks were already dirty.
Flush read cache before writting register. Track flushing inside
of a same cs and avoid reflushing same bo if not necessary. Allmost
properly force flush if bo rendered too and then use as a texture
in same cs (missing pipeline flush dunno if it's needed or not).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When we go to do a lot of bos in one draw like constant bufs we need
to avoid bouncing off the busy ioctl, this mitigates by backing off
on busy bos for a short amount of times.
These texture formats (like R16G16B16A16_UNORM) were untested until now
because st/mesa doesn't use them. I am testing this with a hacked st/mesa
here.
I am trying to be exhaustive, but still I might have missed tons of other
changes to Gallium.
(cherry picked from commit 968a9ec76e)
Conflicts:
docs/relnotes-7.9.html
The glsl core should be handling most dead code issues for us, but we
generate some things in codegen that may not get used, like the 1/w
value or pixel deltas. It seems a lot easier this way than trying to
work out up front whether we're going to use those values or not.
this code was memcmp'ing two structs, but refcounting one of them afterwards,
so any subsequent memcmp was never going to work.
again this stops unnecessary uploads of vertex program,
The brw_wm_surface_state.c handling of GL_DEPTH_TEXTURE_MODE doesn't
apply to shadow compares, which always return an intensity value. The
texture swizzles can do the job for us.
Fixes:
glsl1-shadow2D(): 1
glsl1-shadow2D(): 3
The hw swizzles have been obtained by a brute force approach,
and only C0 and C2 are stored in UV88, the other channels are
ignored.
R16G16 is going to be a lot trickier.
Luckily, one of them would result in failing out register allocation
when the other bugs were encountered. Applies to
glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined, which still
fails register allocation, but now legitimately.
Fixes several problems:
The half-float, float, and integer internal formats depend on
ARB_texture_rg and other extensions.
RG_INTEGER is not a valid internal format.
Generic compressed formats depend on ARB_texture_rg, not
EXT_texture_compression_rgtc.
Use GL_RED instead of GL_R.
We should fix the SF to actually give us just the data we need, but
this fixes regressions in the new FS until then.
Fixes:
glsl-kwin-blur
glsl-routing
Trying to track the insanity of the different argument layouts for
normal/shadow crossed with normal/lod/bias one generation at a time is
enough.
Fixes: glsl1-texture2D() with bias.
(first test passing in this code that doesn't pass without it!)
We should end up with the same code, but anyone else with this issue
could share the handling (which I got wrong for shadow comparisons in
the driver before).
I've screwed this up enough times that I don't think it's worth it.
This time, it was that I was doing it once per top-level body
instruction instead of just once at the end of the loop body.
The interference is totally bogus (maximal), so this is equivalent to
our trivial register assignment before. As in, passes the same set of
piglit tests.
Now, virtual GRFs are consecutive integers, rather than offsetting the
next one by the size. We need the size information to still be around
for real register allocation, anyway.
Fixes this GCC warning on linux-x86 build.
r3xx_vertprog.c: In function ‘ei_if’:
r3xx_vertprog.c:396: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r500_fragprog_emit.c: In function ‘emit_paired’:
r500_fragprog_emit.c:237: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘emit_tex’:
r500_fragprog_emit.c:367: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘emit_flowcontrol’:
r500_fragprog_emit.c:415: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘r500BuildFragmentProgramHwCode’:
r500_fragprog_emit.c:633: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r500_fragprog.c: In function ‘r500_transform_IF’:
r500_fragprog.c:45: warning: ISO C90 forbids mixed declarations and code
r500_fragprog.c: In function ‘r500FragmentProgramDump’:
r500_fragprog.c:256: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r300_fragprog_emit.c: In function ‘emit_alu’:
r300_fragprog_emit.c:143: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c:156: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c: In function ‘finish_node’:
r300_fragprog_emit.c:271: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c: In function ‘emit_tex’:
r300_fragprog_emit.c:344: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r300_fragprog_swizzle.c: In function ‘r300_swizzle_is_native’:
r300_fragprog_swizzle.c:120: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_swizzle.c: In function ‘r300_swizzle_split’:
r300_fragprog_swizzle.c:159: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_rename_regs.c: In function ‘rc_rename_regs’:
radeon_rename_regs.c:112: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_remove_constants.c: In function ‘rc_remove_unused_constants’:
radeon_remove_constants.c💯 warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warning on linux-x86 build.
radeon_optimize.c: In function ‘constant_folding’:
radeon_optimize.c:419: warning: ISO C90 forbids mixed declarations and code
radeon_optimize.c:425: warning: ISO C90 forbids mixed declarations and code
radeon_optimize.c:432: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_dataflow_deadcode.c: In function ‘push_branch’:
radeon_dataflow_deadcode.c:112: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c: In function ‘update_instruction’:
radeon_dataflow_deadcode.c:183: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c: In function ‘rc_dataflow_deadcode’:
radeon_dataflow_deadcode.c:352: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c:379: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_pair_regalloc.c: In function ‘rc_pair_regalloc_inputs_only’:
radeon_pair_regalloc.c:330: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_pair_schedule.c: In function ‘emit_all_tex’:
radeon_pair_schedule.c:244: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘destructive_merge_instructions’:
radeon_pair_schedule.c:291: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c:438: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘scan_read’:
radeon_pair_schedule.c:619: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘scan_write’:
radeon_pair_schedule.c:645: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘schedule_block’:
radeon_pair_schedule.c:673: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘rc_pair_schedule’:
radeon_pair_schedule.c:730: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_pair_translate.c: In function ‘set_pair_instruction’:
radeon_pair_translate.c:153: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c:170: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c: In function ‘rc_pair_translate’:
radeon_pair_translate.c:336: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c:341: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_program_alu.c: In function ‘r300_transform_trig_simple’:
radeon_program_alu.c:882: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c:932: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c: In function ‘radeonTransformTrigScale’:
radeon_program_alu.c:996: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c: In function ‘r300_transform_trig_scale_vertex’:
radeon_program_alu.c:1033: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_emulate_loops.c: In function ‘rc_emulate_loops’:
radeon_emulate_loops.c:517: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings with linux-x86 build.
radeon_emulate_branches.c: In function ‘handle_if’:
radeon_emulate_branches.c:65: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c:71: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘handle_else’:
radeon_emulate_branches.c:94: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘handle_endif’:
radeon_emulate_branches.c:201: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘fix_output_writes’:
radeon_emulate_branches.c:267: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c:284: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘rc_emulate_branches’:
radeon_emulate_branches.c:307: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning.
math/m_debug_xform.c: In function '_math_test_all_transform_functions':
math/m_debug_xform.c:320: warning: format not a string literal and no format arguments
Fixes this GCC warning.
math/m_debug_norm.c: In function '_math_test_all_normal_transform_functions':
math/m_debug_norm.c:365: warning: format not a string literal and no format arguments
Fixes this GCC warning.
math/m_debug_clip.c: In function '_math_test_all_cliptest_functions':
math/m_debug_clip.c:363: warning: format not a string literal and no format arguments
Instead of creating group of register use a hash table
to lookup into which block each register belongs. This
simplify code a bit.
Signed-off-by: Jerome Glisse <jglisse@redhat.com
Fixes glean pbo crash.
It would be possible to avoid crashing without decoupling, but given
that state trackers give no guarantee that number of views is consistent,
that would likely cause too many state updates (or miss some).
Corrections in store_clip to store clip coordinates in AoS form.
Viewport & cliptest flag options based on variant key.
Put back draw_pt_post_vs and now 2 paths based on whether clipping
occurs or not.
Cliptesting now done at the end of vs in draw_llvm instead of
draw_pt_post_vs.
Added viewport mapping transformation and further cliptesting to
vertex shader in draw_llvm.c
Alternative path where vertex header setup, clip coordinates store,
cliptesting and viewport mapping are done earlier in the vertex
shader.
Still need to hook this up properly according to the return value of
"draw_llvm_shader" function.
I'm still not pleased with how builtin uniforms are handled, but as
long as we're relying on the prog_statevar stuff this seems about as
good as it'll get.
Otherwise, we'll often end up with gl_TexCoord being 0 length, for
example. With ir_to_mesa, things ended up working out anyway, as long
as multiple implicitly-sized arrays weren't involved.
Fixes these tests using gl_FragData or just gl_FragDepth:
glsl1-Preprocessor test (extension test 1)
glsl1-Preprocessor test (extension test 2)
glsl-bug-22603
We don't set the type on the array virtual reg as a whole, so here's
the right place.
Fixes:
glsl1-GLSL 1.20 arrays
glsl1-temp array with constant indexing, fragment shader
glsl1-temp array with swizzled variable indexing
We need to re-run channel expressions afterwards as it generates new
vector expressions, and we need to successfully support conditional
assignment (brw_CMP takes 2 operands, not 1).
While much of this we will want to support natively, this should make
the task of reaching the Mesa IR backend's quality easier.
Fixes:
glsl-fs-main-return.
New design seems to be on parity according to piglit,
make it default to get more exposure and see if there
is any show stopper in the coming days.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Sandybridge's PS constant buffer payload size is decided from
push const buffer command, incorrect size would cause wrong data
in payload for position and vertex attributes. This fixes coefficients
for tex2d/tex3d.
The driver actually creates a 3D texture aligned to POT and does all
the magic with texture coordinates in the fragment shader. It first
emulates REPEAT and MIRRORED wrap modes in the fragment shader to get
the coordinates into the range [0, 1]. (already done for 2D NPOT)
Then it scales them to get the coordinates of the NPOT subtexture.
NPOT textures are now less of a lie and we can at least display
something meaningful even for the 3D ones.
Supported wrap modes:
- REPEAT
- MIRRORED_REPEAT
- CLAMP_TO_EDGE (NEAREST filtering only)
- MIRROR_CLAMP_TO_EDGE (NEAREST filtering only)
- The behavior of other CLAMP modes is undefined on borders, but they usually
give results very close to CLAMP_TO_EDGE with mirroring working perfectly.
This fixes:
- piglit/fbo-3d
- piglit/tex3d-npot
Taking the W component from coords directly ignores swizzling. Instead,
take the component which is mapped to W in the TEX instruction parameter.
The same for Z.
NOTE: This is a candidate for the 7.9 branch.
Some random stuff I had here.
1) Fixed some misleading comments.
2) Removed fake_npot, since it's redundant.
3) lower_texture_rect -> scale_texcoords
4) Reordered and reindented some TEX transform code.
It's trying to get an int smeared across all channels, not trying to
get a 1:1 mapping of a subset of a vector's channels. This usually
ended up not mattering with ir_to_mesa, since it just smears floats
into every chan of a vec4.
Fixes:
glsl1-temp array with swizzled variable indexing
The pipe_sampler_view's swizzle terms also apply to the texture border
color. Simply move the apply_sampler_swizzle() call after we fetch
the border color.
Fixes many piglit texwrap failures.
Changes in v2:
- Invalidate last_tile_addr on any change, fixing regressions
- Correct coding style
Currently softpipe ends up allocating more than 200 MB of memory
for each context due to the tile caches.
Even worse, this memory is all explicitly cleared, which means that the
kernel must actually back it with physical RAM right away.
This change allocates tile memory on demand.
Signed-off-by: Brian Paul <brianp@vmware.com>
Build packet header once and allow to add fake register support so
we can handle things like indexed set of register (evergreen sampler
border registers for instance.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Commit 80ee3a440c added a PROGS_DEPS
definition, but no uses, even though it seems clearly intended
to be a set of additional dependencies for $(PROGS).
Correct this.
Currently makedepend is used by the Mesa Makefile-based build system,
but not required.
Unfortunately, not having it makes dependency resolution non-existent,
which is a source of subtle bugs, and is a rarely tested
configuration, since all Mesa developers likely have it installed.
Furthermore some idioms require dependency resolution to work at all,
such as making headers depend on generated files.
Fixes these GCC warnings.
r600_shader.c: In function 'tgsi_tex':
r600_shader.c:1611: warning: 'src2_chan' may be used uninitialized in this function
r600_shader.c:1611: warning: 'src_chan' may be used uninitialized in this function
When occlusion query are running we want to have accurate
fragment count thus disable any early culling optimization
GPU has.
Based on work from Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Fixes this GCC warning.
radeon_state.c: In function 'radeon_state_fini':
radeon_state.c:140: warning: 'return' with a value, in function returning void
Move GB_ENABLE to derived rs state, and find sprite coord for the correct
generic and enable the tex coord for that generic.
Signed-off-by: Dave Airlie <airlied@redhat.com>
1. We can't turn an instruction into a presubtract operation if it
writes to one of the registers it reads from.
2. If we turn an instruction into a presubtract operation, we can't
remove that intruction unless all readers can use the presubtract
operation.
This fixes fdo bug 30337.
This is a candidate for the 7.9 branch.
The variables are used only in currently disabled code.
Fixes this GCC warning.
r600_context.c: In function 'r600_flush':
r600_context.c:76: warning: unused variable 'dname'
r600_context.c:75: warning: unused variable 'dc'
this is more a proof to show vector shifts on x86 with per-element shift count
are evil. Since we can avoid the shift with a single compare/select, use that
instead. Replaces more than 20 instructions (and slow ones at that) with about 3,
and cuts compiled shader size with mesa's yuvsqure demo by over 10%
(no performance measurements done - but selection is blazing fast).
Might want to revisit that for future cpus - unfortunately AVX won't have vector
shifts neither, but AMD's XOP will, but even in that case using selection here
is probably not slower.
While it's true that llvm can and will indeed replace this with bit
arithmetic (since block height/width is POT), it does so (llvm 2.7) by element
and hence extracts/shifts/reinserts each element individually.
This costs about 16 instructions (and extract is not really fast) vs. 1...
Fixes this GCC warning.
r600_hw_states.c: In function 'r600_translate_fill':
r600_state_inlines.h:136: warning: control reaches end of non-void function
The variables are only used in currently disabled code.
Fixes this GCC warning.
r600_state2.c: In function 'r600_flush2':
r600_state2.c:613: warning: unused variable 'dname'
r600_state2.c:612: warning: unused variable 'dc'
Fix this GCC warning.
intel_pixel_bitmap.c: In function 'intelBitmap':
intel_pixel_bitmap.c:343: warning: implicit declaration of function '_mesa_meta_Bitmap'
Silence this GCC warning.
r300_state_derived.c: In function 'r300_update_derived_state':
r300_state_derived.c:578: warning: 'r' may be used uninitialized in this function
r300_state_derived.c:578: note: 'r' was declared here
Up to 2010-09-19:
r600g: fix tiling support for ddx supplied buffers
9b146eae25
user buffer seems to be broken... new to fix that.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Before, changing any of these sampler values triggered generation
of new JIT code. Added a new flag for the special case of
min_lod == max_lod which is hit during auto mipmap generation.
TX_BORDER_COLOR should be formatted according to the texture format.
Also the interaction with ARB_texture_swizzle should be fixed too.
NOTE: This is a candidate for the 7.9 branch.
This commit fixes an infinite loop in foreach_s if remove_from_list is used
more than once on the same item with other list operations in between.
NOTE: This is a candidate for the 7.9 branch because the commit
"r300g: fixup long-lived BO maps being incorrectly unmapped when flushing"
depends on it.
This is already covered by radeon_mipmap_tree.c, and my convolution
cleanups broke in the presence of this code. Thanks to Marek Olšák
for tracking down the relevant miptree code for me.
Make libr300compiler.a a PHONY target so that this library will always be
built. This fixes the problem of libr300compiler.a not being updated
when r300g is being built and r300c is not.
This is a candidate for the Mesa 7.9 branch.
Many of the EXT_ extensions in the subset have significant code
overhead with no users. It is not a required part of GL -- though
text describing the extension is part of the core spec since 1.2, it
is always conditional on the ARB_imaging extension.
Some pathological triangles cause a theoritically impossible number of
clipped vertices.
The clipper will still assert, but at least release builds will not
crash, while this problem is further investigated.
use the blitter + custom stage to avoid doing a whole lot of state
setup by hand. This makes life a lot easier for doing this on evergreen
it also keeps all the state setup in one place.
We setup a custom context state at the start with a flag to denote
its for the flush, when it gets generated we generate the correct state
for the flush and no longer have to do it all by hand.
this should also make adding texture *to* depth easier.
It turns out that most people new to this IR are surprised when an
assignment to (say) 3 components on the LHS takes 4 components on the
RHS. It also makes for quite strange IR output:
(assign (constant bool (1)) (x) (var_ref color) (swiz x (var_ref v) ))
(assign (constant bool (1)) (y) (var_ref color) (swiz yy (var_ref v) ))
(assign (constant bool (1)) (z) (var_ref color) (swiz zzz (var_ref v) ))
But even worse, even we get it wrong, as shown by this line of our
current step(float, vec4):
(assign (constant bool (1)) (w)
(var_ref t)
(expression float b2f (expression bool >=
(swiz w (var_ref x))(var_ref edge))))
where we try to assign a float to the writemasked-out x channel and
don't supply anything for the actual w channel we're writing. Drivers
right now just get lucky since ir_to_mesa spams the float value across
all the source channels of a vec4.
Instead, the RHS will now have a number of components equal to the
number of components actually being written. Hopefully this confuses
everyone less, and it also makes codegen for a scalar target simpler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can't expect to have a context when this is called, and we don't need one
so just require a __DRIscreen instead.
Reported by Yu Dai <yu.dai@intel.com>
Shader rebuild should be more clever, we should store along each
shader all the value that change shader program rather than using
flags in context (ie change sequence like : change vs buffer, draw,
change vs buffer, switch shader will trigger useless shader rebuild).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Normally the Mesa state tracker uses TXP instructions for texturing.
But if a fragment shader uses texture2D() that's a TEX instruction.
In that case we were incorrectly computing the texcoord coefficients
in the point sprite setup code. Some new comments in the code explain
things.
Fix commit e7087175f8. Move the reference to
GL_VERSION_2_1_functions to intel_extensions.c where it's available,
don't try to enable a non-existing extension and advertise 1.20 for all
intel chipsets, not just GEN4 and up.
progs requires winsys, which hasn't yet been built by the time we
go into state_trackers.
It may be a good idea to also move it into tests.
After a normal build, run make in src/gallium/state_trackers/d3d1x/progs
to build them.
The Gallium EGL state tracker reuses dri2.c but not the GLX code.
Currently there is a bit of code in dri2.c that is incorrectly tied
to GLX: instead, make it call an helper that both GLX and Gallium EGL
implement, like dri2InvalidateBuffers.
This avoids a link error complaining that dri2GetGlxDrawableFromXDrawableId
is undefined.
Note that we might want to move the whole event translation elsewhere,
and probably stop using non-XCB DRI2 altogether, but this seems to be
the minimal fix.
This may just be hiding some other bug though, since the types are supposed
to be the same (and it compiles for me).
Anyway, this interface will likely need to changed, since it seems Wine needs
a more powerful one capable of expressing window subregions and called at
every Present.
add cb/db flush states to the blit code.
add support for the rv6xx that need special treatment.
according to R6xx_7xx_3D.pdf
set r700 CB_SHADER_CONTROL reg in blit code
docs say dual export should be disabled for DB->CB
Instead of using the invalid GL_ARB_shading_language_120 extension to
determine the GLSL version, use a new ctx->Const.GLSLVersion field.
Updated the intel and r600 drivers, but untested.
See fd.o bug 29910
NOTE: This is a candidate for the 7.9 branch (but let's wait and see if
there's any regressions).
The old code didn't really make sense. We only need to compare the
X channel of the texture (depth) against the texcoord.
For (bi)linear sampling we should move the calls to this function
and compute the final result as (s1+s2+s3+s4) * 0.25. Someday.
This fixes the glean glsl1 shadow2D() tests. See fd.o bug 29307.
There was some libstdc++-specific code that would only build with GCC 4.5
Now it should be much more compatible, at the price of reimplementing
the generic hash function.
Looks like the problem was we weren't passing the depth to the render
target as expected, so the chip would wedge. Fixes GPU hang in
occlusion-query-discard.
Bug #30097
Recent Wine versions provide a d3d11shader.h, which is however empty
and was getting used instead of our non-empty one.
Correct the include path order to fix this.
This is a new implementation of the Direct3D 11 COM API for Gallium.
Direct3D 10 and 10.1 implementations are also provided, which are
automatically generated with s/D3D11/D3D10/g plus a bunch of #ifs.
While this is an initial version, most of the code is there (limited
to what Gallium can express), and tri, gears and texturing demos
are working.
The primary goal is to realize Gallium's promise of multiple API
support, and provide an API that can be easily implemented with just
a very thin wrapper over Gallium, instead of the enormous amount of
complex code needed for OpenGL.
The secondary goal is to run Windows Direct3D 10/11 games on Linux
using Wine.
Wine dlls are currently not provided, but adding them should be
quite easy.
Fglrx and nvidia drivers can also be supported by writing a Gallium
driver that talks to them using OpenGL, which is a relatively easy
task.
Thanks to the great design of Direct3D 10/11 and closeness to Gallium,
this approach should not result in detectable overhead, and is the
most maintainable way to do it, providing a path to switch to the
open Gallium drivers once they are on par with the proprietary ones.
Currently Wine has a very limited Direct3D 10 implementation, and
completely lacks a Direct3D 11 implementation.
Note that Direct3D 10/11 are completely different from Direct3D 9
and earlier, and thus warrant a fully separate implementation.
The third goal is to provide a superior alternative to OpenGL for
graphics programming on non-Windows systems, particularly Linux
and other free and open systems.
Thanks to a very clean and well-though design done from scratch,
the Direct3D 10/11 APIs are vastly better than OpenGL and can be
supported with orders of magnitude less code and development time,
as you can see by comparing the lines of code of this commit and
those in the existing Mesa OpenGL implementation.
This would have been true for the Longs Peak proposal as well, but
unfortunately it was abandoned by Khronos, leaving the OpenGL
ecosystem without a graphics API with a modern design.
A binding of Direct3D 10/11 to EGL would solve this issue in the
most economical way possible, and this would be great to provide
in Mesa, since DXGI, the API used to bind Direct3D 10/11 to Windows,
is a bit suboptimal, especially on non-Windows platforms.
Finally, a mature Direct3D 10/11 implementation is intrinsically going
to be faster and more reliable than an OpenGL implementation, thanks
to the dramatically smaller API and the segregation of all nontrivial
work to object creation that the application must perform ahead of
time.
Currently, this commit contains:
- Independently created headers for Direct3D 10, 10.1, 11 and DXGI 1.1,
partially based on the existing Wine headers for D3D10 and DXGI 1.0
- A parser for Direct3D 10/11 DXBC and TokenizedProgramFormat (TPF)
- A shader translator from TokenizedProgramFormat to TGSI
- Implementation of the Direct3D 11 core interfaces
- Automatically generated implementation of Direct3D 10 and 10.1
- Implementation of DXGI using the "native" framework of the EGL st
- Demos, usable either on Windows or on this implementation
- d3d11tri, a clone of tri
- d3d11tex, a (multi)texturing demo
- d3d11gears, an improved version of glxgears
- d3d11spikysphere, a D3D11 tessellation demo (currently Windows-only)
- A downloader for the Microsoft HLSL compiler, needed to recompile
the shaders (compiled shader bytecode is also included)
To compile this, configure at least with these options:
--with-state-trackers=egl,d3d1x --with-egl-platforms=x11
plus some gallium drivers (such as softpipe with --enable-gallium-swrast)
The Wine headers (usually from a wine-dev or wine-devel package) must
be installed.
Only x86-32 has been tested.
You may need to run "make" in the subdirectories of src/gallium/winsys/sw
and you may need to manually run "sudo make install" in
src/gallium/targets/egl
To test it, run the demos in the "progs" directory.
Windows binaries are included to find out how demos should work, and to
test Wine integration when it will be done.
Enjoy, and let me know if you manage to compile and run this, or
which issues you are facing if not.
Using softpipe is recommended for now, and your mileage with hardware
drivers may vary.
However, getting this to work on hardware drivers is also obviously very
important.
Note that currently llvmpipe is buggy and causes all 3 gears to be
drawn with the same color.
Use export GALLIUM_DRIVER=softpipe to avoid this.
Thanks to all the Gallium contributors and especially the VMware
team, whose work made it possible to implement Direct3D 10/11 much
more easily than it would have been otherwise.
Use rc_pair_ prefix for all pair instruction structs
Create a named struct for pair instruction args
Replace structs radeon_pair_instruction_{rgb,alpha} with struct
radeon_pair_sub_instruction. These two structs were nearly identical
and were creating a lot of cut and paste code. These changes are the
first step towards removing some of that code.
This allow to share code path btw old & new, also
remove check on reference this might make things
a little slower but new design doesn't use reference
stuff.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
If we are not going to write to the X or Y components of the destination
vector we also don't need to prepare to compute SIN or COS.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
When ir_binop_all_equal and ir_binop_any_nequal were introduced, the
meaning of these two opcodes changed to return vectors rather than a
single scalar, but the constant expression handling code was incorrectly
written and only worked for scalars. As a result, only the first
component of the returned vector would be properly initialized.
This reverts commit c32bac57ed
and silences the warning differently.
The _mesa_reference_texobj() function is called quite a bit and
we don't want to call valid_texture_object() all the time in non-
debug builds.
Count int and float constants independently. Since there are only
few i# constants available and hundreds of c# constants, it would
be too easy to end up with an i# declaration out of its range.
Pixel shaders do not have address registers a#, only one
loop register aL. Our only hope is to assume the address
register is in fact a loop counter and replace it with aL.
Do not translate ARL instruction for pixel shaders -- MOVA
instruction is only valid for vertex saders.
Make it more explicit relative addressing of inputs is only valid
for pixel shaders and constants for vertex shaders.
we were leaking buffers since the flush code was added, it wasn't dropping references.
move setting up flush to the set_framebuffer_state.
clean up the flush state object.
make more space in the BOs array for flushing.
This reverts commit a1d9a58b82.
Flushing the upload buffers on draw is wrong, uploads aren't supposed to
cause flushes in the first place. The real issue was
radeon_bo_pb_map_internal() not respecting PB_USAGE_UNSYNCHRONIZED.
Depth-only and stencil-only clears should mask out depth/stencil from the
output, mask out stencil/input from input, and OR or ADD them together.
However, due to a typo they were being ANDed, resulting in zeroing the buffer.
If a upload buffer is used by a previous draw that's still in the CS,
accessing it would need a context flush. However, doing a context flush when
mapping the upload buffer would then flush/destroy the same buffer we're trying
to map there. Flushing the upload buffers before a draw avoids both the CS
flush and the upload buffer going away while it's being used. Note that
u_upload_data() could e.g. use a pool of buffers instead of allocating new
ones all the time if that turns out to be a significant issue.
Silences this GCC warning.
nvfx_vertprog.c: In function 'nvfx_vertprog_translate':
nvfx_vertprog.c:998: warning: assignment discards qualifiers from pointer target type
Those have the callee field set to the null pointer, so
calling the public constructor will segfault.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
Fixes this GCC warning.
lower_variable_index_to_cond_assign.cpp:
In member function
'bool variable_index_to_cond_assign_visitor::needs_lowering(ir_dereference_array*) const':
lower_variable_index_to_cond_assign.cpp:261:
warning: control reaches end of non-void function
This diagram shows the rendering pipeline with an emphasis on
the inputs/outputs for each stage. Some stages emit new vertex
attributes and others consume some attributes.
Implement the pipe_rasterizer_state::sprite_coord_enable field
in the draw module (and softpipe) according to what's specified
in the documentation.
The draw module can now add any number of extra vertex attributes
to a post-transformed vertex and generate texcoords for those
attributes per sprite_coord_enable. Auto-generated texcoords
for sprites only worked for one texcoord unit before.
The frag shader gl_PointCoord input is now implemented like any
other generic/texcoord attribute.
The draw module now needs to be informed about fragment shaders
since we need to look at the fragment shader's inputs to know
which ones need auto-generated texcoords.
Only softpipe has been updated so far.
Fixes this GCC warning.
r600_state2.c: In function 'r600_context_flush':
r600_state2.c:946: error: implicit declaration of function 'drmCommandWriteRead'
Winsys context build a list of register block a register block is
a set of consecutive register that will be emited together in the
same pm4 packet (the various r600_block* are there to provide basic
grouping that try to take advantage of states that are linked together)
Some consecutive register are emited each in a different block,
for instance the various cb[0-7]_base. At winsys context creation,
the list of block is created & an index into the list of block. So
to find into which block a register is in you simply use the register
offset and lookup the block index. Block are grouped together into
group which are the various pkt3 group of config, context, resource,
Pipe state build a list of register each state want to modify,
beside register value it also give a register mask so only subpart
of a register can be updated by a given pipe state (the oring is
in the winsys) There is no prebuild register list or define for
each pipe state. Once pipe state are built they are bound to
the winsys context.
Each of this functions will go through the list of register and
will find into which block each reg falls and will update the
value of the block with proper masking (vs/ps resource/constant
are specialized variant with somewhat limited capabilities).
Each block modified by r600_context_pipe_state_set* is marked as
dirty and we update a count of dwords needed to emit all dirty
state so far.
r600_context_pipe_state_set* should be call only when pipe context
change some of the state (thus when pipe bind state or set state)
Then to draw primitive you make a call to r600_context_draw
void r600_context_draw(struct r600_context *ctx, struct r600_draw *draw)
It will check if there is enough dwords in current cs buffer and
if not will flush. Once there is enough room it will copy packet
from dirty block and then add the draw packet3 to initiate the draw.
The flush will send the current cs, reset the count of dwords to
0 and remark all states that are enabled as dirty and recompute
the number of dwords needed to send the current context.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Currenly GLSL happily generates indirect addressing of any kind of
arrays.
Unfortunately DirectX 9 GPUs are not guaranteed to support any of them in
general.
This pass fixes that by lowering such constructs to a binary search on the
values, followed at the end by vectorized generation of equality masks, and
4 conditional assignments for each mask generation.
Note that this requires the ir_binop_equal change so that we can emit SEQ
to generate the boolean masks.
Unfortunately, ir_structure_splitting is too dumb to turn the resulting
constant array references to individual variables, so this will need to
be added too before this pass can actually be effective for temps.
Several patches in the glsl2-lower-variable-indexing were squashed
into this commit. These patches fix bugs in Luca's original
implementation, and the individual patches can be seen in that branch.
This was done to aid bisecting in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
this adds the bo caching layer and uses it for vertex/index/constant bos.
ctx needs to take references on hw bos so the flushing works okay, also
needs to flush the maps.
This function is executed outside _mesa_meta_begin/end(), that means
that e.g. _mesa_meta_Bitmap() clobbers the texturing state because it
changes the currently active texture object.
There's no need to bind the new texture when it's created, it's done
again later anyway (from setup_drawpix/copypix_texture()).
Signed-off-by: Brian Paul <brianp@vmware.com>
Allows disabling various operations (mainly texture-related, but
will grow) to try & identify bottlenecks.
Unlike LP_DEBUG, this is active even in release builds - which is
necessary for performance investigation.
The current context should be notified when the the front/back buffers
of the current drawable are swapped. The notification was skipped when
xmesa_strict_invalidate is false (the default).
This fixes fdo bug #29774.
Enable some extensions now that the needed tokens are defined in
GLES/glext.h and GLES2/glext.h. Update the prototype of MultiDrawArrays
now that the prototype of _mesa_MultiDrawArraysEXT has been updated.
The shadow versions of the texture targets use an extra component
(Z) to express distance from light source to the fragment.
Fixes the shadowtex demo with llvmpipe.
Execute before folding loads, because we don't check if it's legal
in lower_mods.
Ensure that a value's insn pointer is updated when transferring it
to a different instruction.
Fix the following GCC warning.
loop_controls.cpp: In function 'int calculate_iterations(ir_rvalue*, ir_rvalue*, ir_rvalue*, ir_expression_operation)':
loop_controls.cpp:88: warning: format not a string literal and no format arguments
This fixes a DRM deadlock in the cubestorm xscreensaver, because somehow
there must not be 2 different BOs relocated in one CS if both BOs back
the same handle. I was told it is impossible to happen, but apparently
it is not, or there is something else wrong.
Since atm our OPs aren't typed but instead values are, we need to
take care if they're used as different types (e.g. a load makes a
value u32 by default).
Maybe this should be changed (also to match TGSI), but it should
work as well if done properly.
At some point we'll want to support real subroutines instead of
just inlining them into the main shader.
Since recursive calls are forbidden, we can just save all used
registers to a fixed local memory region and restore them on a
return, no need for a stack pointer.
We only uploaded up to the highest offset a program would use,
and if the constant buffer isn't changed when a new program is
used, the new program is missing the rest of them.
Might want to introduce a "fill state" for user mem constbufs.
Known issues in the ARB_color_buffer_float implementation:
- Rendering to multiple render targets, some fixed-point, some floating-point, with FIXED_ONLY fragment clamping and polygon smooth enabled may write incorrect values to the fixed point buffers (depends on spec interpretation)
- For fragment programs with ARB_fog_* options, colors are clamped before fog application regardless of the fragment clamping setting (this depends on spec interpretation)
<li>Assorted Mesa/Gallium state tracker bug fixes</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=26795">Bug 26795</a> - gl_FragCoord off by one in Gallium drivers.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=29164">Bug 29164</a> - [GLSL 1.20] invariant variable shouldn't be used before declaration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=29927">Bug 29927</a> - [glsl2] fail to compile shader with constructor for array of struct type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=30156">Bug 30156</a> - [i965] After updating to Mesa 7.9, Civilization IV starts to show garbage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31923">Bug 31923</a> - [GLSL 1.20] allowing inconsistent centroid declaration between two vertex shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32214">Bug 32214</a> - [gles2]no link error happens when missing vertex shader or frag shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32375">Bug 32375</a> - [gl gles2] Not able to get the attribute by function glGetVertexAttribfv</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32541">Bug 32541</a> - Segmentation Fault while running an HDR (high dynamic range) rendering demo</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32569">Bug 32569</a> - [gles2] glGetShaderPrecisionFormat not implemented yet</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32831">Bug 32831</a> - [glsl] division by zero crashes GLSL compiler</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32910">Bug 32910</a> - Keywords 'in' and 'out' not handled properly for GLSL 1.20 shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33316">Bug 33316</a> - uniform array will be allocate one line more and initialize it when it was freed will abort</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33386">Bug 33386</a> - Dubious assembler in read_rgba_span_x86.S</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33388">Bug 33388</a> - Dubious assembler in xform4.S</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33433">Bug 33433</a> - Error in x86-64 API dispatch code.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33507">Bug 33507</a> - [glsl] GLSL preprocessor modulus by zero crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33508">Bug 33508</a> - [glsl] GLSL compiler modulus by zero crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33916">Bug 33916</a> - Compiler accepts reserved operators % and %=</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34030">Bug 34030</a> - [bisected] Starcraft 2: some effects are corrupted or too big</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34047">Bug 34047</a> - Assert in _tnl_import_array() when using GLfixed vertex datatypes with GLESv2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34114">Bug 34114</a> - Sun Studio build fails due to standard library functions not being in global namespace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34179">Bug 34179</a> - Nouveau 3D driver: nv50_pc_emit.c:863 assertion error kills Compiz</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34198">Bug 34198</a> - [GLSL] implicit sized array with index 0 used gets assertion</li>
<li><ahref="https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/691653">Ubuntu bug 691653</a> - compiz crashes when using alt-tab (the radeon driver kills it) </li>
<li><ahref="https://bugs.meego.com/show_bug.cgi?id=13005">Meego bug 13005</a> - Graphics GLSL issue lead to camera preview fail on Pinetrail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34370">Bug 34370</a> - [GLSL] "i<5 && i<4" in for loop fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34374">Bug 34374</a> - [GLSL] fail to redeclare an array using initializer</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=35073">Bug 35073</a> - [GM45] Alpha test is broken when rendering to FBO with no color attachment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=35483">Bug 35483</a> - util_blit_pixels_writemask: crash in line 322 of src/gallium/auxiliary/util/u_blit.c</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33512">Bug 33512</a> - [SNB] case ogles2conform/GL/gl_FragCoord/gl_FragCoord_xy_frag.test and gl_FragCoord_w_frag.test fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34280">Bug 34280</a> - r200 mesa-7.10 font distortion</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34321">Bug 34321</a> - The ARB_fragment_program subset of ARB_draw_buffers not implemented</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36410">Bug 36410</a> - [SNB] Rendering errors in 3DMMES subtest taiji</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36527">Bug 36527</a> - [wine] Wolfenstein: Failed to translate rgb instruction.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=36651">Bug 36651</a> - mesa requires bison and flex to build but configure does not check for them</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=30694">Bug 30694</a> - wincopy will crash on Gallium drivers when going to front buffer</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=30787">Bug 30787</a> - Invalid asm shader does not generate draw-time error when used with GLSL shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31101">Bug 31101</a> - [glsl2] abort() in ir_validate::visit_enter(ir_assignment *ir)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31193">Bug 31193</a> - [regression] aa43176e break water reflections</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31194">Bug 31194</a> - The mesa meta save/restore code doesn't ref the current GLSL program</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31514">Bug 31514</a> - isBuffer returns true for unbound buffers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31560">Bug 31560</a> - [tdfx] tdfx_tex.c:702: error: ‘const struct gl_color_table’ has no member named ‘Format’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31617">Bug 31617</a> - Radeon/Compiz: 'failed to attach dri2 front buffer', error case not handled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31650">Bug 31650</a> - [GLSL] varying gl_TexCoord fails to be re-declared to different size in the second shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31673">Bug 31673</a> - GL_FRAGMENT_PRECISION_HIGH preprocessor macro undefined in GLSL ES</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31690">Bug 31690</a> - i915 shader compiler fails to flatten if in Aquarium webgl demo.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31832">Bug 31832</a> - [i915] Bad renderbuffer format: 21</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31985">Bug 31985</a> - [GLSL 1.20] initialized uniform array considered as "unsized"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31987">Bug 31987</a> - [gles2] if input a wrong pname(GL_NONE) to glGetBoolean, it will not case GL_INVALID_ENUM</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32311">Bug 32311</a> - [965 bisected] Array look-ups broken on GM45</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32520">Bug 32520</a> - [gles2] glBlendFunc(GL_ZERO, GL_DST_COLOR) will result in GL_INVALID_ENUM</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32825">Bug 32825</a> - egl_glx driver completely broken in 7.9 branch [fix in master]</li>
</ul>
<h2>Changes</h2>
<p>The full set of changes can be viewed by using the following GIT command:</p>
<pre>
git log mesa-7.9..mesa-7.9.1
</pre>
<p>Alex Deucher (5):
<ul>
<li>r100: revalidate after radeon_update_renderbuffers</li>
<li>r600c: add missing radeon_prepare_render() call on evergreen</li>
<li>r600c: properly align mipmaps to group size</li>
<li>gallium/egl: fix r300 vs r600 loading</li>
<li>r600c: fix some opcodes on evergreen</li>
</ul></p>
<p>Aras Pranckevicius (2):
<ul>
<li>glsl: fix crash in loop analysis when some controls can't be determined</li>
<li>glsl: fix matrix type check in ir_algebraic</li>
</ul></p>
<p>Brian Paul (27):
<ul>
<li>swrast: fix choose_depth_texture_level() to respect mipmap filtering state</li>
<li>st/mesa: replace assertion w/ conditional in framebuffer invalidation</li>
<li>egl/i965: include inline_wrapper_sw_helper.h</li>
<li>mesa: Add missing else in do_row_3D</li>
<li>mesa: add missing formats in _mesa_format_to_type_and_comps()</li>
<li>mesa: handle more pixel types in mipmap generation code</li>
<li>mesa: make glIsBuffer() return false for never bound buffers</li>
<li>mesa: fix glDeleteBuffers() regression</li>
<li>swrast: init alpha value to 1.0 in opt_sample_rgb_2d()</li>
<li>meta: Mask Stencil.Clear against stencilMax in _mesa_meta_Clear</li>
<li>st/mesa: fix mapping of zero-sized buffer objects</li>
<li>Assorted Mesa/Gallium state tracker bug fixes</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=26795">Bug 26795</a> - gl_FragCoord off by one in Gallium drivers.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=29164">Bug 29164</a> - [GLSL 1.20] invariant variable shouldn't be used before declaration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=29927">Bug 29927</a> - [glsl2] fail to compile shader with constructor for array of struct type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=30156">Bug 30156</a> - [i965] After updating to Mesa 7.9, Civilization IV starts to show garbage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=31923">Bug 31923</a> - [GLSL 1.20] allowing inconsistent centroid declaration between two vertex shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32214">Bug 32214</a> - [gles2]no link error happens when missing vertex shader or frag shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32375">Bug 32375</a> - [gl gles2] Not able to get the attribute by function glGetVertexAttribfv</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32541">Bug 32541</a> - Segmentation Fault while running an HDR (high dynamic range) rendering demo</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32569">Bug 32569</a> - [gles2] glGetShaderPrecisionFormat not implemented yet</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32831">Bug 32831</a> - [glsl] division by zero crashes GLSL compiler</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=32910">Bug 32910</a> - Keywords 'in' and 'out' not handled properly for GLSL 1.20 shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33316">Bug 33316</a> - uniform array will be allocate one line more and initialize it when it was freed will abort</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33386">Bug 33386</a> - Dubious assembler in read_rgba_span_x86.S</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33388">Bug 33388</a> - Dubious assembler in xform4.S</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33433">Bug 33433</a> - Error in x86-64 API dispatch code.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33507">Bug 33507</a> - [glsl] GLSL preprocessor modulus by zero crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33508">Bug 33508</a> - [glsl] GLSL compiler modulus by zero crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=33916">Bug 33916</a> - Compiler accepts reserved operators % and %=</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34047">Bug 34047</a> - Assert in _tnl_import_array() when using GLfixed vertex datatypes with GLESv2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34114">Bug 34114</a> - Sun Studio build fails due to standard library functions not being in global namespace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=34198">Bug 34198</a> - [GLSL] implicit sized array with index 0 used gets assertion</li>
<li><ahref="https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/691653">Ubuntu bug 691653</a> - compiz crashes when using alt-tab (the radeon driver kills it) </li>
<li><ahref="https://bugs.meego.com/show_bug.cgi?id=13005">Meego bug 13005</a> - Graphics GLSL issue lead to camera preview fail on Pinetrail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=22622">Bug 22622</a> - [GM965 GLSL] noise*() cause GPU lockup</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=23743">Bug 23743</a> - For loop from 0 to 0 not optimized out</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=24553">Bug 24553</a> - shader compilation times explode when using more () pairs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25664">Bug 25664</a> - [GLSL] re-declaring an empty array fails to compile</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25769">Bug 25769</a> - [GLSL] "float" can be implicitly converted to "int"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25808">Bug 25808</a> - [GLSL] const variable is modified successfully</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25826">Bug 25826</a> - [GLSL] declaring an unsized array then re-declaring with a size fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25827">Bug 25827</a> - [GLSL] vector constructor accepts too many arguments successfully</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25829">Bug 25829</a> - [GLSL] allowing non-void function without returning value</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25830">Bug 25830</a> - [GLSL] allowing non-constant-expression as const declaration initializer</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25877">Bug 25877</a> - [GLSL 1.10] implicit conversion from "int" to "float" should not be allowed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25878">Bug 25878</a> - [GLSL] sampler is converted to int successfully</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25994">Bug 25994</a> - [GM45][GLSL] 'return' statement in vertex shader unsupported</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=25999">Bug 25999</a> - [GLSL] embedded structure constructor fails to compile</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=26000">Bug 26000</a> - [GLSL] allowing different parameter qualifier between the function definition and declaration</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=26001">Bug 26001</a> - [GLSL 1.10] constructing matrix from matrix succeeds</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=26224">Bug 26224</a> - [GLSL] Cannot get location of a uniform struct member</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=26990">Bug 26990</a> - [GLSL] variable declaration in "while" fails to compile</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=27060">Bug 27060</a> - [965] piglit glsl-fs-raytrace failure due to lack of function calls.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=27216">Bug 27216</a> - Assignment with a function call in an if statement causes an assertion failure</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=27261">Bug 27261</a> - GLSL Compiler fails on the following vertex shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=27265">Bug 27265</a> - GLSL Compiler doesnt link the attached vertex shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=28834">Bug 28834</a> - Add support for system fpclassify to GL_OES_query_matrix function for OpenBSD / NetBSD</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=28837">Bug 28837</a> - varying vec4 index support</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=28845">Bug 28845</a> - The GLU tesselator code has some warnings</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=28889">Bug 28889</a> - [regression] wine game crash</li>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.