If we happened to allocate a texture result (or other vector) to the
highest hardware register slot, and we were in 16-wide, we would
under-count the registers used and potentially wrap around to g0 if
that allocation crossed a 16-register block boundary. Bad rendering
and hangs ensued.
Tested-by: Ian Romanick <idr@freedesktop.org>
When the fill mode is PIPE_POLYGON_MODE_LINE we were basically
converting the polygon into triangles, then drawing the outline of all
the triangles. But we really only want to draw the lines around the
perimeter of the polygon, not the interior lines.
NOTE: This is a candidate for the 7.10 branch.
In gen6 and above, clip distances 0-3 are written to message register
3's xyzw components, and 4-7 to message register 4's xyzw components.
Therefore when when writing the clip distances we need to examine the
lower 2 bits of the clip distance index to see which component to
write to.
emit_vertex_write() was examining the lower 3 bits, causing clip
distances 4-7 not to be written correctly.
Fixes piglit test vs-clip-vertex-01.shader_test
Evergreen+ don't support multi-writes so we need to emulate
it in the shader. Fixes the following piglit tests:
fbo-drawbuffers-fragcolor
ati_draw_buffers-arbfp-no-option
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
In intel_draw_buffer, there exists a workaround to prevent
_mesa_update_framebuffer from creating a swrast depth wrapper when
using separate stencil. This commit fixes the workaround, which was
incomplete for s8z24 texture renderbuffers.
Fixes fbo-blit-d24s8 on gen5 with separate stencil manually enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since all infrastructure is now in place to support packed
depth/stencil renderbuffers when using separate stencil, there is no
need for special cases when separate stencil is enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Also, in order to coerce intel_update_tex_wrapper_regions() to
allocate the hiz region, alter intel_update_tex_wrapper_regions() to
examine the renderbuffer format instead of the texture image format.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... and into new function intel_update_tex_wrapper_regions.
This prevents code duplication in the next commit.
Also add a note explaining that the hiz region is broken for mipmapped
depth textures.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... when using separate stencil.
Define function intel_tex_image_x8z24_create_renderbuffers and call it
in intelTexImage after the miptree has been created and filled with data.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because they will be needed by intel_tex_image_s8z24_create_renderbuffers.
Redeclared functions are:
intel_alloc_renderbuffer_storage
intel_renderbuffer_set_draw_offsets
Signed-off-by: Chad Versace <chad@chad-versace.us>
Redeclare as non-static because
intel_tex_image_s8z24_create_renderbuffers will use it.
Remove the 'wrapper' parameter, because there is no wrapper for
intel_texture_image.depth_rb and stencil_rb.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields depth_rb and stencil_rb, and put hooks in place to
release the renderbuffers in intelFreeTextureImageData and
intelTexImage.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Otherwise we can end up creating RGBA render targets (which are BGRA on the
hardware), and then we bind them as RGBA textures (which are RGBA on the
hardware). This generates software fallbacks every time we bind the frame as
a texture.
Commit 1a339b6c71 caused us to take
a different path through the glCopyTexSubImage() code. The
pipe_get_transfer() call neglected to pass the texture's level, face
and slice info. So we were always transferring from the 0th mipmap
level even when the source renderbuffer was a non-zero mipmap level
in a texture.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38649
NOTE: This is a candidate for the 7.10 branch.
Once again, assuming the compiler is clever works out so poorly. The
generated code initialized the structure on the stack, then did a
lookup into it. This was a performance regression from
70c6cd39bd.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's common in applications just before the advent of
EXT_separate_shader_objects to have multiple linked shaders with the
same VS or FS. While we aren't detecting those at the Mesa level, we
can detect when our compiled output happens to match an existing
compiled program.
This patch was created after noting the incredible amount of compiled
program data generated by Heroes of Newerth. It reduces the program
data in use at the start menu (replayed by apitrace) from 828kb to
632kb, and reduces CACHE_NEW_WM_PROG state flagging by 3/4. It
doesn't impact our rate of hardware state changes yet, because things
depending on CACHE_NEW_WM_PROG also depend on BRW_NEW_FRAGMENT_PROGRAM
which is still being flagged.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch add the support for 24bpp in the dri/swrast implementation.
Signed-off-by: Marc Pignat <marc@pignat.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Use a typed struct to describe the native buffer and let the backends
map the native buffer to winsys_handle for
resource_from_handle/resource_to_handle.
When shared glapi is not enabled, there are two glapi providers and we
cannot decide which one to link to at build time. It results in
unresolved symbols in st/mesa. This commit makes st/mesa a loadable
module when shared glapi is not enabled, and hopes that the apps will
link to one of the glapi providers (GL or GLES).
Build pipe drivers here instead of using those built by the
soon-to-be-removed targets/egl.
[with an update by Benjamin Franzke to use --{start|end}-group]
Should unify this too, but will delay that until the planned
libdrm_nouveau/winsys changes which are likely to cause major
changes to this bo validation code too.
In fixed function, stride == 0 (e.g. glColor4f() outside of the draw
call) would get turned into uniform inputs, which is why it was
ignored originally in this test. For shaders, drivers end up seeing a
need to upload stride == 0 data, and get confused by needing to upload
when vbo_all_varyings_in_vbos() returned true. In the 965 driver
case, it wouldn't bother to compute the min/max index, and uploaded
nothing if the min/max wasn't known.
We've talked about removing the ff stride=0-into-uniforms code, so
this check shouldn't be missed once that's gone.
Fixes ARB_vertex_buffer_object/mixed-immediate-and-vbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37934
Reviewed-by: Brian Paul <brianp@vmware.com>
We would still want to consider that data as being in a VBO even if we
managed to produce this case, which as far as I know we can't.
Reviewed-by: Brian Paul <brianp@vmware.com>
All the packets chosen before came from grepping the pdf for
nonpipelined, and these two came from grepping for non.pipelined. We
could stand a review by looking at all packets emitted and identifying
what kind they are.
Previously, the builtins in OES_texture_3D.{frag,vert} were only
compiling properly as a consequence of bug 38015, which allows
unsupported extensions to be enabled. This fix eliminates the builtin
compiler's reliance on bug 38015, so that bug 38015 can be fixed.
Fixes broken glTexImage2D with format=GL_RGBA since
1a339b6c71
The origin for this behaviour is that r600_is_format_supported
checks only against r600_state_inline.h tables not evergreens.
If possible, we want to match the hardware format to what the app uses. By
doing so, we avoid the need for pixel conversions and therefore greatly speed
up texture uploads.
evergreen+ stores depth and stencil separately so when we
allocate a depth/stencil fbo, make sure we allocate enough
memory for both depth and stencil buffers.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Now all infrastructure is in place to support s8_z24 non-texture
renderbuffers for gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Hiz buffer allocation can only occur if the 'else' branch has been taken,
so move the hiz buffer allocation into the 'else' branch.
Having the hiz buffer allocation dangling outside of the if-tree was just
damn confusing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following fields:
intel_renderbuffer.wrapped_depth;
intel_renderbuffer.wrapped_stencil
If the intel_context is using separate stencil and the renderbuffer has
a packed depth/stencil format, then wrapped_depth and wrapped_stencil are
the real renderbuffers.
Alter the following functions to accomodate the wrapped buffers:
intel_delete_renderbuffer
intel_draw_buffer
intel_get_renderbuffer
intel_renderbuffer_map
intel_renderbuffer_unmap
Subsequent commits allocate renderbuffer storage for wrapped_depth and
wrapped_stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit b5c847c7ca erroneously disabled
support for S8_Z24 texture format when the context required separate
stencil (intel_context.must_use_separate_stencil).
But the GL spec requires implementations to support GL_DEPTH24_STENCIL8.
So we better find a way to fake it...
From page 180 (196 of pdf) of the OpenGL 3.0 spec:
In addition, implementations are required to support the following
sized internal [texture] formats.
[...]
- Combined depth+stencil formats: DEPTH32F_STENCIL8 and and
DEPTH24_STENCIL8.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
this drop a bunch of unnecessary checks (i.e. should be trapped
at gallium level), and also removes the switch statement in favour
of some calculated values for the vgt values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
the attached patch should be an improvement over Vadim Girlin's patch
fixing LIT instruction for r600g (commit
2fe39b46e7).
Instructions used in tgsi_lit have been reordered to always write to a
dst channel after the same channel in src has been read (so if src ==
dst, input values are not overwritten before being used).
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to bind to our context before calling __glXSetCurrentContext or
messing with the gc rect in order to properly handle error conditions.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In applegl, GLX advertises the same extensions provided by OpenGL.framework
even if such extensions are not provided by glapi. This allows a client
to get access to such API.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
FramebufferTextureLayer is an alias of FramebufferTextureLayerEXT, so
FramebufferTextureLayerARB needs to be listed as an alias of
FramebufferTextureLayerEXT rather than FramebufferTextureLayer.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Previously it was up to the driver or later code generator to reject
these shaders. It turns out that nobody did this.
This will need changes to support geometry shaders.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37743
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since switching to hidden visibility on gcc, GLw apps were failing to
link. Use the GLAPI definition to use default visibility where necessary.
$ nm lib/libGLw.so | grep DrawingArea
0000000000004020 T GLwCreateMDrawingArea
0000000000003430 T GLwDrawingAreaMakeCurrent
0000000000003410 T GLwDrawingAreaSwapBuffers
0000000000204c60 D glwDrawingAreaClassRec
0000000000204d48 D glwDrawingAreaWidgetClass
00000000002053c0 D glwMDrawingAreaClassRec
00000000002054e0 D glwMDrawingAreaWidgetClass
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Tested-by: justin <jlec@gentoo.org>
This was spectacularly unsafe. On my system, address 0 happens to be
the hardware status page for the render ring, and the first quadword
of that happens to contain nothing we ever look at, but I sure didn't
look forward to having to debug some day when, for example, the kernel
happened to bind the ringbuffer before binding the hwsp.
That flag was leftover from gen4, where brw_curbe.c is choosing ranges
of the CURBE space for constants to live in, and the unit state tells
where to load them from. That's not the case on gen6 -- we don't set
this flag (since constants aren't in the URB), nor do we have any
state like that to upload.
If --disable-gallium is passed, llvm-config isn't checked for, so mark
it explicitly as absent, through LLVM_CONFIG=no.
Passing --disable-gallium would result in:
| ../configure: line 9739: --version: command not found
| ../configure: line 9740: --cppflags: command not found
| ../configure: line 9741: --libs: command not found
| ../configure: line 9743: --ldflags: command not found
With this commit, one gets that instead:
| configure: error: LLVM is required to build Gallium R300 on x86 and x86_64
Signed-off-by: Cyril Brulebois <kibi@debian.org>
This removes all the --enable-gallium-$driver options and --disable-gallium.
Gallium can be disabled by --with-gallium-drivers= (without parameters).
Default is:
--with-gallium-drivers=r300,swrast
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
There is an obvious redundancy:
--with-driver=dri VS --with-state-trackers=dri
--with-driver=xlib VS --with-state-trackers=glx
--enable-openvg VS --with-state-trackers=vega
--enable-egl VS --with-state-trackers=egl
This patch adds two new options for the remaining state trackers:
--enable-xorg
--enable-d3d1x
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Our hardware doesn't have a sample_d_c message, so we have to do a
regular sample_d and emit instructions to manually perform the
comparison.
This requires a state dependent recompile whenever the sampler's compare
mode or function change. This adds the per-sampler comparison functions
to brw_wm_prog_key, but only sets them when the sampler's compare mode
is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it available earlier, which will soon be necessary.
(Separating code motion from actual changes.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is somewhat ugly, but I couldn't think of a nicer way to handle the
interleaved coordinate/derivative parameter loading.
Ironlake and Sandybridge will still hit an assertion in visit().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prior to this patch, it would attempt to optimize and allocate registers
for the program even if it failed to compile. This seems wasteful.
More importantly, the "message length > 11" failure seems to choke the
instruction scheduler, making it somehow use an undefined value and
segmentation fault.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
The _ColorDrawBuffers[] wouldn't get updated despite us having updated
what it depends on (Attachments[]->Renderbuffer). Other callers of
_mesa_remove_attachment are already flagging _NEW_BUFFERS for other
reasons. The specific bug report that led to this fix (and
the fbo-finish-deleted testcase) was fixed by
23b6f9606d, though.
Reviewed-by: Brian Paul <brianp@vmware.com>
This loop is trying to see if all the buffers to be uploaded happen to
be the same increment from the start of the 3DSTATE_VERTEX_BUFFERS
currently loaded in the hardware. However, we might be at a smaller
offset than the previous set of VERTEX_BUFFERS, so we can't reuse
because that packet made the first entry be its starting offset (you
can't access outside the given bounds).
Fixes piglit ARB_vertex_buffer_object/elements-negative-offset.
Current LIT implementation uses dst components for storing temp
results, possibly overwriting still needed values (depends on the
swizzles).
This patch uses temp reg for one of such cases (found in etqw) and
fixes "LIT R.z, R.xyzz".
Tested on evergreen. Fixes some etqw-demo rendering glitches when
"Lighting" is set to "High" in the settings.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current dri context unbind logic will leak drawables until the process
dies (they will then get released by the GEM code). There are two ways to fix
this: either always call driReleaseDrawables every time we unbind a context
(but that costs us round trips to the X server at getbuffers() time) or
implement proper drawable refcounting. This patch implements the latter.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
When emitting either a hiz or stencil buffer, the 'separate stencil
enable' and 'hiz enable' bits are set in 3DSTATE_DEPTH_BUFFER. Therefore
we must emit both 3DSTATE_HIER_DEPTH_BUFFER and 3DSTATE_STENCIL_BUFFER.
Even if there is no stencil buffer, 3DSTATE_STENCIL_BUFFER must be
emitted; failure to do so causes a hang on gen5 and a stall on gen6.
This also fixes a silly, obvious segfault that occured when a hiz buffer
xor separate stencil buffer existed.
Fixes the piglit tests below on Gen5 when hiz and separate stencil are
manually enabled:
fbo-alphatest-nocolor
fbo-depth-sample-compare
fbo
hiz-depth-read-fbo-d24-s0
hiz-depth-stencil-test-fbo-d24-s0
hiz-depth-test-fbo-d24-s0
hiz-stencil-read-fbo-d0-s8
hiz-stencil-test-fbo-d0-s8
fbo-missing-attachment-clear
fbo-clear-formats
fbo-depth-*
Changes piglit test result from crash to fail:
hiz-depth-stencil-test-fbo-d0-s8
Signed-off-by: Chad Versace <chad@chad-versace.us>
[airlied: final chunk of Mike's patch from bug 37476
this uses a loop to emit the GRADIENTS and does a check to
see if we need to fetch to a temporary register. It also
increases the context src gpr to 4 which is needed here.]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches (and don't forget
to re-run "make builtins" after cherry-picking.)
Mike had actually done a lot of the TXD support in a patch in bug
37476 which I see now, I'll add the bits of his work that I didn't think
to add to my work.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This at least passes the piglit arb_shader_texture_lod-texgrad test,
the AMD shader analyzer seems to multiply the V component by an unspecified
constant value no idea why.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This sets the base level as the zero level, which fixes
piglit/texturing/tex-miplevel-selection*.
The r600 hardware ignores the BASE_LEVEL field in some cases, so we can't
use it.
Evergreen might need this too.
Commit 56ef62d988
"glsl: Generate readable unique names at print time."
changed ir_print_visitor to not generate @0x1234567 suffixes except
where necessary. So there's no need to manually remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change to _glapi_create_table_from_handle causes it to fill the dispatch
table with NoOps for unimplemented functionality. This matches what is done
in indirect_init.c and also allows us to enable logging (when built with
-DDEBUG and the MESA_DEBUG or LIBGL_DEBUG environment variables are set) to
catch cases where clients are trying to use these unimplemented extentions.
Additionally, this fixes some gcc -pedantic warnings.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Check that the difference in array pointers/offsets from the 0th
array are less than the stride, for both VBOs and user-space arrays.
Previously, we were only doing this for the later.
This tightens up the interleaved array test and fixes a problem with
the llvmpipe driver where we were creating way too many vertex fetch
variants only because the pipe_vertex_element::src_offset values were
changing frequently. This change results in a 5x speed-up for one of
the viewperf tests.
Also, clean up the function to make it easier to understand.
The code was playing fast and loose with rowstrides, which meant that
if a driver chose anything different for its alignment requirements,
the generated mipmaps came out garbage. Unlike the uncompressed case,
we can't generate mipmaps directly into image->Data, so by using
TexImage2D we cut out most of the weird logic that existed to generate
in-place into ->Data. The up/downside is that the driver recovery
code for the fact that _mesa_generate_mipmaps whacked ->Data has to be
turned off for compressed now.
Fixes 6 piglit tests about compressed mipmap gen.
The path taken is wildly different based on this (do we generate from
a temporary image, or from level-1's data), and we appear to have
stride bugs in the compressed case that are tough to disentangle.
This just duplicates the code for the moment, the followon commit will
do the actual changes. Only real code change here is handling
maxLevel in one common place.
This is effectively just "round up when dividing by 4" compared to the
previous code. Fixes the broken stripe at the top of
fbo-generatemipmap-formats GL_EXT_texture_compression_rgtc.
We don't care just about the internalFormat/cpp/compressed, but about
the specific format chosen. We have no support for format
translations as part of texture validation, and furthermore it has
restrictions in the GL specification. However, we should be making
consistent decisions for this check anyway.
Generally image uploads to a the region occur at TexImage time, but
that's not the case for fallback _mesa_generate_mipmap(), and in this
path we were forgetting to align the width when dividing height. We
were just leaving out parts of the compressed block at 2x2 and 1x1
levels.
Fixes gen-compressed-teximage.
Copy-and-paste from the bgra cases. The C paths attempt to avoid
copying the 'x' channel, but it's harmless, you might as well. Good for
about 5% in glxgears (740 to 780 fps).
Signed-off-by: Adam Jackson <ajax@redhat.com>
glReadPixels() was performing RGB -> L conversion differently from the
glTexImage() style conversion appropriate for glCopyTexImage().
Fixes gles2conform copy_texture.
We were mapping the renderbuffer once, then walking over all the
buffers to map just the texture ones using the other texture mapping
function that handled the x/y offset to the image in the region. But
then we would go and overwrite *those* mappings with the original
mappings for depth/stencil, which was wrong.
Instead, just walk over the attachments once and map the attachments.
Wasn't that easy?
This is already pointing at 0 or Height - 1 and with an appropriate
pitch, so no need to recompute those values per customization of the
spans code. Cuts 3 out of 21kb of the compiled size.
Reviewed-by: Chad Versace <chad@chad-versace.us>
The "newImage" isn't particularly new -- it might be the same texture
that was attached to the same attachment point before. This function
also gets called when just rebinding back to an FBO with a texture
attachment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
It was originally located in the region because the tracking of
depth/color buffers was on the regions, and getting back to the irb
would have been tricky. Now, we're keying off of the renderbuffer in
more places, which means we can move these fields where they belong.
This could fix potential rendering failure with a single texture
having multiple images attached to different renderbuffers across
shareCtx (as far as I can tell, this was the only failure we could
cause, since anything else should trigger intel_render_texture in
between, for example a BindFramebuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
gc->vtable->destroy is always set and is used unconditionally
in other places, so don't bother checking for it first.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
This stuff is really for software rendering, it's not core Mesa.
A small step toward pushing the FetchTexel() stuff down into swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
x86_64_entry_start needs to be declared static in the C code,
in order to have the correct address in entry_get_public
(seems not to be needed on x86).
The compiler needs to lookup a local not a global object.
Otherwise addresses needed for _glapi_proc_address will be computed
from some random offset (0x6400229a61058b48 in my case).
This sadly requires work in the VS to rescale them, because the
hardware doesn't support this format natively.
Fixes arb_es2_compatibility-fixed-type and gtf/fixed_data_type.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The function was named "find_unconditional_discard", but didn't
actually check that the discard statement found was unconditional.
Fixes piglit glsl-fs-discard-04.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A trivial fix for error: format not a string literal and no format
arguments with compiling with -Werror=format-security flags.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If either depth or stencil buffer has packed depth/stencil format, then do
not use separate stencil.
Before this commit, emit_depthbuffer() incorrectly assumed that the
texture's stencil renderbuffer wrapper was a *separate* stencil buffer,
because the depth and stencil renderbuffer wrappers are distinct for
depth/stencil textures (that is, depth_irb != stencil_irb).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38134
Signed-off-by: Chad Versace <chad@chad-versace.us>
CONFIG regs (byte offsets 0x8000-0xac00) are single state and the pipeline
must be flushed and hw idle when they are changed. Border color regs
are in the CONFIG range and this is why a flush is required when changing
them. CONTEXT regs (byte offset 0x28000+) are multi-state and those do
not require flushes when changing them.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This is just like PointSprite overrides, but it's always on for that
attribute.
Fixes glsl-fs-pointcoord, gtf/point_sprites.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
If the wrap R (3rd) mode is set to CLAMP or CLAMP_TO_BORDER and the texture
isn't 3D, r300 always samples the border color regardless of texture
coordinates.
I HATE THIS HARDWARE.
NOTE: This is a candidate for the 7.10 branch.
Ideally we'd have a compiler and register spilling and all that
but this is good enough for now to avoid the gpu hang in piglit,
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined
on r600/r700 cards.
based on r600c patch
Andre Maasikas <amaasikas@gmail.com>
r600c: bump sq gpr resources if a shader needs more than default
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to vol2a.07, it only applies from Cantiga to Sandybridge.
I found this in my ringbuffers while investigating various GPU hangs.
While it may not have been the cause, it seemed wise to remove it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Thanks to Chad's hard work implementing separate stencil and HiZ
support, this is entirely straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to call add_validated_bo to do proper aperture space accounting.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen7 doesn't support packed depth/stencil, the stencil buffer
can't possibly be relevant for determining the depth format.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes mesa more consistent with glibtool and XCode where the
generated file matches the dylib id rather using an extra symlink
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
When it is sensible to do so,
1) intelCreateBuffer() now attaches separate depth and stencil
buffers
to the framebuffer it creates.
2) intel_update_renderbuffers() requests for the framebuffer
a separate stencil buffer (DRI2BufferStencil).
The criteria for "sensible" is:
- The GLX config has nonzero depth and stencil bits.
- The hardware supports separate stencil.
- The X driver supports separate stencil, or its support has not yet
been determined.
If the hardware supports hiz too, then intel_update_renderbuffers()
also requests DRI2BufferHiz.
If after requesting DRI2BufferStencil we determine that X driver did not
actually support separate stencil, we clean up the mistake and never ask
for DRI2BufferStencil again.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Assert that the GLX config has an expected depth/stencil bit combination:
one of d24/s8, d16/s0, d0/s0. These are the only depth/stencil
configurations that we advertise.
Remove the check for software stencil, because given the assertions'
constraints the check always fails.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Extract the code that queries DRI2 to obtain the DRIdrawable's buffers
into intel_query_dri2_buffers_no_separate_stencil().
Extract the code that assigns the DRI buffer's DRM region to the
corresponding renderbuffer into
intel_process_dri2_buffer_no_separate_stencil().
Rationale
---------
The next commit enables intel_update_renderbuffers() to query for separate
stencil and hiz buffers. Without separating the separate-stencil and
no-separate-stencil paths, intel_update_renderbuffers() degenerates into
an impenetrable labyrinth of if-trees.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields below to intel_screen. The expression in parens is the
value to which intelInitScreen2() currently sets the field.
GLboolean hw_has_separate_stencil (true iff gen >= 7)
GLboolean hw_must_use_separate_stencil (true iff gen >= 7)
GLboolean hw_has_hiz (always false)
enum intel_dri2_has_hiz dri2_has_hiz (INTEL_DRI2_HAS_HIZ_UNKNOWN)
The analogous fields in intel_context now inherit their values from
intel_screen.
When hiz and separate stencil become completely implemented for a given
chipset, then the respective fields need to be enabled.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... which indicates if the X driver supports DRI2BufferHiz and
DRI2BufferStencil.
I'm placing this in its own commit due to the large comment block.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since the stencil buffer is interleaved, the generic Mesa renderbuffer
accessors do not suffice. Custom span functions are necessary.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When emitting 3DSTATE_DEPTH_BUFFER, also emit 3DSTATE_HIER_DEPTH_BUFFER if
there is a hiz buffer. Ditto for 3DSTATE_STENCIL_BUFFER and a separate
stencil buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
glapidispatch.h was located in glapi and shared with mesa core. Because
the way it was shared, mesa core must include it indirectly via
main/dispatch.h.
Now that it is no longer needed by glapi and is located in core mesa,
merging it with main/dispatch.h to avoid wrong uses.
Generate different glapidispatch.h's for GL and GLES. For GLES, we want
a local remap table.
This reverts commit 5af46e8360. The
commit will break GL remap table setup when main/glapidispatch.h is
regenerated.
See piglit dlist-fdo31590.c test and
http://bugs.freedesktop.org/show_bug.cgi?id=31590
In this case we had node->prim_count=1 but node->count==0 because the
display list started with glBegin() but had no vertices. The call to
glEvalCoord1f() triggered the DO_FALLBACK() path. When replaying the
display list, the old condition basically no-op'd the call to
vbo_save_playback_vertex_list call(). That led to the invalid operation
error being raised in glEnd().
NOTE: This is a candidate for the 7.10 branch.
Previously, we were errantly drawing some interior edges of clipped
polygons and quads. Also, we were introducing extra edges where
polygons intersected the view frustum clip planes.
The main problem was that we were ignoring the edgeflags encoded in
the primitive header's 'flags' field which are set during polygon/quad
->tri decomposition. We need to observe those during clipping. Since
we can't modify the existing vert's edgeflag fields, we need to store
them in a parallel array.
Edge flags also need to be handled differently for view frustum planes
vs. user-defined clip planes. In the former case we don't want to draw
new clip edges but in the later case we do. This matches NVIDIA's
behaviour and it just looks right.
Finally, note that the LLVM draw code does not properly set vertex
edge flags. It's OK on the regular software path though.
When GLX_INDIRECT_RENDERING is defined, some symbols are used in
libglapi.a but are not defined. Define them through the help of
glapitemp.h.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This updates the apple dispatch table to match the current glapi.
Aliases are still not handled very well.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
With this change, Apple's libGL is now using glapi rather than implementing
its own dispatch. In this implementation, two dispatch tables are created:
__ogl_framework_api always points into OpenGL.framework.
__applegl_api is the vtable that is used. It points into OpenGL.framework
or to local implementations that override / interpose this in OpenGL.framework
The initialization for __ogl_framework_api was copied from XQuartz with some
modifications and probably still needs further edits to better deal with
aliases.
This is a good step towards supporting both indirect and direct rendering
on darwin.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In starting the migration to using mapi, rename __gl_api to
__ogl_framework_api since it is a vtable for OpenGL.framework
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
So only with kernel version 2.7 can this work, thanks to Alex
for pointing that out. Also add a workaround for a hw bug.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Evergreen can do this as well as cayman, so we should enable it.
This fixes a gpu lockup with
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test
I need to add a better workaround for r600/r700.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We weren't emitting the SQ setup regs at all which really is
fail.
When a state is always enabled we need to add it to the dirty list
as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since resources don't generally vary in size, this splits
the emit path, it also takes into a/c that texture and vertex resources
have different number of relocs, and avoids emitting the extra
reloc for vertex resources.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Exit this loop early to avoid pointless iterations later.
Move the resource bos to the first two regs, it actually
doesn't matter which regs we use for this in resource land.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_framebuffer_object spec (and later specs) say:
"If a buffer is specified in <mask> and does not exist in both
the read and draw framebuffers, the corresponding bit is silently
ignored."
Check for color, depth, and stencil that the source and destination
FBOs have the specified buffers. If the buffer is missing, remove the
bit from the blit request mask and continue.
Fixes the crash in piglit test 'fbo-missing-attachment-blit from', and
fixes 'fbo-missing-attachment-blit es2 from'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
In an ES2 context (or if GL_ARB_ES2_compatibility) is supported, the
framebuffer can be complete with some attachments be missing. In this
case the _ColorDrawBuffers pointer will be NULL.
Fixes the crash in piglit test fbo-missing-attachment-clear.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
query->num_results already has the size in dwords of the query
buffer. There no need to multiply again. We were reading past
the end of the buffer, resulting in reading garbage.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=37028
agd5f: clarify the comment.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
According to the hw documentation, the driver needs to:
- allocate 128 bits for each possible DB
- clear the 128 bits for each possible DB
- write 1 to bits 127 and 63 for upper DBs that don't
exist on a particular asic
Previously we were only doing these steps if the
asic had less than the max possible DBs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
With complex shaders there are often "holes" in the fs inputs, and we only
have 8 tex coorsd to map those to. To fix this, we remap fs inputs to [0..8].
This lets us to run many more GLSL programs.
At the end of flushing we were scanning over 450 blocks
with generally about 50 enabled. This reduces the scanning
to just the list of enabled blocks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There isn't much point taking the overhead of range/block lookups on resources
we aren't going to be getting resource registers at wierd offsets.
Signed-off-by: Dave Airlie <airlied@redhat.com>
resource setting could be a fair bit more lightweight,
this patch just separates the resource structs from the standard
reg tracking structs in the driver, later patches will improve
the winsys.
Signed-off-by: Dave Airlie <airlied@redhat.com>
we don't need to loop over all the registers unless we have
some bos in the block, also avoid setting the ctx flags,
and move the optional stuff down below this chunk.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reference implementation which produces high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
Signed-off-by: Brian Paul <brianp@vmware.com>
We implement line stipples, just not *quite* correctly. We have a
piglit testcase to use when we want to fix it, if we do. Until then,
don't lie to our test suites.
We do have hardware antialised lines. If we care, we should actually
fix them to be conformant (or as close as possible) instead of using
this knob to fool testcases using swrast.
For some interesting reading on the state of GL_*_SMOOTH across
several drivers, see:
http://homepage.mac.com/arekkusu/bugs/invariance/HWAA.html
From my reading of the GL 2.1 spec, no antialiasing is strictly
conformant for polygon smoothing. Yes, it's absurd, but then,
hardware doesn't support this so maybe it's not so absurd.
ir_print_visitor::visit(ir_constant *) was failing to index properly
into ir->type->fields.structure, so the first field name was being
reprinted for every field in the structure.
Signed-off-by: Brian Paul <brianp@vmware.com>
ast_expression::print() had an incorrect index into the subexpressions
array, so (a ? b : c) was being incorrectly rendered as (a ? b : b).
Signed-off-by: Brian Paul <brianp@vmware.com>
This makes this function not be an always miss for the branch predictor.
Noticed using cachegrind, makes a minor difference to gears numbers on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are handled separately in the winsys, so don't need the calculations
done at this point. this manifested as a crash in point-sprite,
Thanks to XoD on #radeon for pointing it out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The flush function, when asked for, should not return a NULL fence.
NULL can only be returned if fences are not implemented, and st/mesa
doesn't call any of the fence functions if it receives a NULL fence
(because some drivers don't even set the fence hooks).
ARB_sync is exposed if fence_finish is set.
Mesa now limits, by default, the max number of texture levels to 15 so we
can now support the architectural maximum for gen4-6 of 14.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This modifies the VGT state and move the SPI setup to its own discrete state.
It then just sets the SPI state up and the VGT state up once and modifies
them thereafter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This splits the initialisation and the setting of values in the resource
buffers. We only should end up initialising once and updateing with new values
when needed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the overhead of working out the range/block to state build time,
it also allows the compiler to use constants for a lot of things instead
of working them out each time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the functions down the file, and also adds a ctx parameter.
This is precursor patch just moving stuff around and getting it ready.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drop the r600_draw_vbo CPU usage on a run of nexuiz from 1.40% to 0.72%
in sysprof for me on my Fusion APU.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This range was 76 dwords long, the 75th dword changes, the first 60 or so
don't. split the block so it emits less often.
Signed-off-by: Dave Airlie <airlied@redhat.com>
glx code hasn't lived under xserver/GL for a long time now.
Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
OpenGL 4.0 Compatibility, page 449:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, no
framebuffer is bound to target. In this case querying pname FRAMEBUFFER_-
ATTACHMENT_OBJECT_NAME will return zero, and all other queries will generate
an INVALID_OPERATION error.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This avoids the extra CMP and the predication on SEL, so in addition
to one less instruction, it makes scheduling less constrained.
Improves glbenchmark Egypt performance 0.6% +/- 0.2% (n=3). Reduces
FS instruction count across affected shaders in shader-db by 1.3%
without regressing any.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces compiled size of brw_wm_surface_state.o another 1.9%.
Overall, this brw_wm_surface_state reduction series cuts
firefox-talos-gfx runtime by 0.68% +/- 0.42% (n=6).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It turns out that gcc is just awful at generating code for
brw_structs.h style state setup, and using bitshifting on u32s
generates better code while being similarly readable (and more
verifiable compared to the specs, using the INTEL_MASK macro).
It's only used in the old fragment program path, to avoid projection
when w is always 1. We do want to do this in the new path pre-gen6
too, but we'll probably do it through the ir.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Oddly, this increases compiled code size. (marking the 'if' as likely
also increases code size, but not as much).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Interestingly, the compiler wasn't doing this for us at -O2, so we
were doing the computation for every non-_ReallyEnabled unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- all asics need to emit CONTEXT_CONTROL
- all r6xx asics need to emit 3D_START_CMDBUF
The ddx and r600c already do this. r600g should as well.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
We are getting inconsistent methods for endian detection (same answer when
it works, just doesn't work on some platforms) depending on whether __GLIBC__
is defined, which of course depends on include ordering before p_config.h
Just make p_config.h include limits.h to solve this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
On my original R600 card this at least lets gnome shell run for a while longer
and the piglit r300-readcache test case works a lot more reliably.
Still a few more stability issues running a piglit test run though.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec doesn't state it should be an error, but. We have this piglit test
useprogram-inside-begin that passes with this commit. No idea what's correct.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
The conditional rendering should be able to kill CopyPixels.
I assume the render condition has no effect on resource_copy_region.
This fixes piglit:
- NV_conditional_render/copypixels
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Always default to DEFAULT_*_FORMATS for mandatory GL formats.
(st_choose_format must not fail for those)
Use DEFAULT_RGBA when alpha is required instead of RGB.
Use DEFAULT_RGB otherwise.
These are more or less the remaining differences between the old code and
the new one.
Reviewed-by: Brian Paul <brianp@vmware.com>
The problem is: The second time the function is called with a new
internal format, strb->format is usually not PIPE_FORMAT_NONE.
RenderbufferStorage(... GL_RGBA8 ...);
RenderbufferStorage(... GL_RGBA16 ...); // had no effect on the format
Broken with: fd6f2d6e57
Test: piglit/fbo-storage-completeness
NOTE: This is a candidate for the 7.10 branch.
(if fd6f2d6e57 is cherry-picked as well)
Reviewed-by: Brian Paul <brianp@vmware.com>
Lowered indirect addressing can create lots of immediates.
Fixes piglit/glsl-fs-uniform-array-7 on r300g.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
From now on, depth test is always enabled in hardware.
If depth test is disabled in Gallium, the hardware Z function is set to ALWAYS.
If there is no zbuffer set, the colorbuffer0 memory is set as a zbuffer
to silence the CS checker.
This fixes piglit:
- occlusion-query-discard
- NV_conditional_render/bitmap
- NV_conditional_render/drawpixels
- NV_conditional_render/vertex_array
We want to check for Success, otherwise it will fail even with the right visual.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
When using _mesa_layout_parameters, all params copied in the 'layout'
output in the PASS 1 don't modify StateFlags (because they are simply
memcpy'ed).
This patch fixes the problem, assuring output gl_prog_param_list
StateFlags field is the same as the input one.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
At glLinkShaders time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
We now have:
brw_fs.cpp handles calling out to everything and optimization.
brw_fs_visitor.cpp handles translating to our LIR.
brw_fs_emit.cpp handles emitting from our LIR to native code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's an assumption here that fixed GRFs will never intersect with
the allocated GRFs. That's true today, though it might change some
day if we decide to register-allocate the regs containing push
constants once they're dead.
This fixes a regression in 0f7325b890 in
Lightsmark from the texture instructions now containing g0 references
instead of having that be implied. Performance is improved 15.2% +/-
3.6% (n=3).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968
emit_add_a16 was using the incorrect source.
This caused adds in the form of:
add u16 $a0 s32 $a1 u32 0x00000200
to have a source AREG of $a0 instead of $a1.
Fixes World of Warcraft in OpenGL and D3D without GLSL.
They were occupying whole 32-bit words, despite being only 10 or so
bits. Reduces code size slightly (80/3300 bytes).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the GL 2.1 spec:
"Required perspective-correct interpolation for all fragment
attributes except depth in sections 3.4.1 and 3.5.1, effectively
making GL PERSPECTIVE CORRECT HINT a no-op."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
First, FBO read/draw == NULL validation happens in mesa core not
intelReadBuffers -> intel_draw_buffers. Second, that condition is no
longer tested for in our driver since ARB_ES2_compatibility was added.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, the driver is likely to draw the flushed vertices to the
new drawbuffer instead of the old one, missing the point of the flush.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the ARB_ES2_compatibility spec:
"(8) How should we handle draw buffer completeness?
RESOLVED: Remove draw/readbuffer completeness checks, and treat
drawbuffers referring to missing attachments as if they were NONE."
Fixes arb_es2_compatibility-drawbuffers when the short-circuit for
ARB_ES2_compatibility in the previous commit is dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
glDrawBuffers pointing at an unattached buffer is supposed to be
incomplete without ARB_ES2_compatibility. The testcase to catch the
bug of not implementing that bit of the spec was tricked by this
missing piece of state update.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we use FBOs to access mipmap levels with glRead/Draw/CopyPixels()
we need to be sure to access the correct mipmap level/face/slice.
Before, we were just passing zero in quite a few places.
This fixes the new piglit fbo-mipmap-copypix test.
NOTE: This is a candidate for the 7.10 branch.
V_SQ_CF_WORD1_SQ_CF_INST_HALT is 0x1f on both
evergreen and cayman.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The logic of intel_draw_buffers() expected that stencil buffers were
always combined depth/stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When a texture is attached to multiple FBO's, a separate renderbuffer
wrapper is created for each attachment. This necessitates storing the hiz
region for these renderbuffers in the texture itself instead of the
renderbuffer wrapper.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Before this commit, the renderbuffer's region was updated in
intel_renderbuffer_texture(). This commit moves the update into
intel_update_wrapper(), which is a more logical location for updates.
This is in preparation for the next commit, which allocates and
updates the texture's hiz region in intel_update_wrapper(). Having the two
region updates located in the same function makes good form.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A hiz surface must be supplied to the hardware when rendering to a depth
buffer with hiz. There are three potential places to store that surface:
1. Allocate a larger intel_region for the depthbuffer, and let the
region's tail be the hiz surface.
2. Allocate a separate intel_region for hiz, and store it as
brw_context state.
3. Allocate a separate intel_region for hiz, and store it in
intel_renderbuffer.
We choose method 3.
Method 1 has not been chosen due to future complications it might cause
when requesting a DRI drawable's depth buffer attachment from X.
Method 2 has not been chosen because storing the hiz region apart from
the depth region makes lazy hiz/depth resolves difficult to implement.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Given a format, is_hiz_depth_format() indicates if HiZ can be enabled on
a depthbuffer of that format.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... in intel_alloc_renderbuffer_storage(). The stencil buffer has quirky
pitch requirements, so its region allocation is a special case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When hardware supports separate stencil, enable support for separate
depth/stencil texture formats in the table
intel_context.ctx.TextureFormatsSupported. If the hardware must use
separate stencil, then disable support for combined depth/stencil formats.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Prefer MESA_FORMAT_X8_Z24 over MESA_FORMAT_S8_Z24 for textures with
internal format GL_DEPTH_COMPONENT*.
i965 needs MESA_FORMAT_X8_Z24 for HiZ and separate stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following flags:
intel_context.has_separate_stencil
intel_context.must_use_separate_stencil
intel_context.has_hiz
The flags are currently set to false, and will be enabled for a given
chipset once the feature is completely implemented.
Since it may be some time before these features are completed, their
values can be overridden with environment variables INTEL_HIZ and
INTEL_SEPARATE_STENCIL. Valid values for these environment variables are
"0" and "1".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because that's not a safe thing to do. The request buffer is shared
storage among all threads, and after UnlockDisplay the 'req' pointer may
point into someone else's request.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Cayman is the RadeonHD 69xx series of GPUs. This adds support for
3D acceleration to the r600g driver.
Major changes:
Some context registers moved around - mainly MSAA and clipping/guardband related.
GPR allocation is all dynamic
no vertex cache - all unified in texture cache.
5-wide to 4-wide shader engines (no scalar or trans slot)
- some changes to how instructions are placed into slots
- removal of END_OF_PROGRAM bit in favour of END flow control clause
- no vertex fetch clause - TC accepts vertex or texture
Signed-off-by: Dave Airlie <airlied@redhat.com>
These don't need one, and I was seeing 0xff being returned and set in
the GPU registers with some tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of using a giant switch statement with lots of code, use a
table to convert GL format enums to pipe formats.
Tested by running the old code next to the new and asserting that
the return value was the same for piglit tests.
We're doing a linear search, but if that ever appears to be too slow
the table could easily be sorted or hashed.
Certain applications (e.g., Bernina My Label, and the Windows
implementation of Processing language) destroy the device context used when
creating the frame-buffer, causing presents to fail because we were still
referring to the old device context internally.
This change ensures we always use the same HDC passed to the ICD
entry-points when available, or our own HDC when not available (necessary
only when flushing on single buffered visuals).
Since the SET_xxx and GET_xxx macros used to initialize the remap_table
have been replaced by inline functions, the missing late macro expansion
leads to driDispatchRemapTable not being redefined to remap_table, which
in turn causes the remap_table not to be setup properly.
This commit fixes the issue by moving the table redefinition after the
definition of driDispatchRemapTable but in front of the inline function
definitions.
Despite that negative values aren't sensible here, making this unsigned
is dangerous. Consider get_pointer_generic, which computes a value of
the form:
void *base + (int x * int stride + int y) * unsigned bpp
The usual arithmetic conversions will coerce the (x*stride + y)
subexpression to unsigned. Since stride can be negative, this is
disastrous.
Fixes at least the following piglit tests on Ironlake:
fbo/fbo-blit-d24s8
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
According to OpenGL 3.1 chapter 2.1.5 the representation without zero
should only be used for vertex attribute values, but not for textures
or frame-buffers.
The coordinate offsets set in the m1 header are for textureOffset;
they have nothing to do with textureGrad (TXD).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The same as 3e43adef95 but for Gen7.
This doesn't quite fix GL_ARB_depth_texture/fbo-clear-formats; there's
still a 1 pixel wide black line on the right edge of the smaller squares.
The results were entirely wrong before, and are at least close now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Since wayland 4bde293ff8109d55eeaee8732f5a6ee0c8cd4bd9 we cant
lookup visuals, as we dont receive the visual token events.
The format for pixmap-images thus has to default to argb for now.
GLES uses GL_APIENTRYP instead of GLAPIENTRYP, which breaks with the
latest API table generation code. This fixes the issue by emitting a
definition for GL_APIENTRYP when generating the GLES files.
This prevents the error
prog: for the -disable-mmx option: may only occur zero or one times!
when creating a new context after XCloseDisplay with DRI drivers linked
with a shared LLVM 2.8 library.
In particular, this fixes the case where a vertex shader only uses
generic vertex attributes (non-0th). Before, we were no-op'ing the
glDrawArrays/Elements().
This fixes the new piglit pos-array test.
NOTE: This is a candidate for the 7.10 branch.
Previously, always did unorm8->float/nonlinear-to-linear conversion (using
lookup table), then convert back to nonlinear (using the expensive math
func pow among others), and finally convert back to int (assuming caller
wants unorm8), because the float texture fetch function is used for getting
the actual texel values. This should probably all be changed at some point,
but for now simply enable the memcpy path also for srgb formats (but if for
instance swizzling is required, still the whole conversion will be done).
Clip distance is calculated each time vertex position is written
which is suboptiomal is some cases but very safe.
User clip planes are an obsolete feature anyway.
Every time number of clip planes increases, the vertex program
is recompiled.
That ensures no overhead in normal case (no user clip planes)
and reasonable overhead otherwise.
Fixes 3D windows in compiz, and reflection effect in neverball.
Also fixes compiz expo plugin when windows were dragged and each
window shown 3 times.
This was going to get in the way of separate depth/stencil (which
wants to know about both, and whether they are the same rb), and also
wasn't a sufficient flag for the fix in the following commit.
In the 16-wide rework, I missed that we were setting some things to be
SIMD16 mode (corresponding to their setup in emit_texture_gen4()).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These fields are documented to be in the payload, and though the FB
write docs say they *aren't* in the payload, for all other fields the
payload and header is structured so that no overwriting is required
except for non-default options.
It turns out there's nothing in the hardware preventing this. It
appears that it ought to work on pre-gen6 as well, but just produces
GPU hangs.
Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by
2.6% +/- 0.6% (n=3).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of gen6, alt-mode (which we use) MOVs of floats are not raw --
they'll modify infs/nans. This broke discard and alpha test in
16-wide, where apparently the upper 8 bits of the pixel enables being
set were causing the whole value to get trashed upon being moved.
Treating the values as UD instead of float makes sure they get
preserved. While I'm here, replace the two 8-wide moves of the halves
of the header with a single compressed move.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36648
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is part of fixing fbo-alphatest-nocolor -- a regression in
35e8fe5c99 after the initial regression,
that had us using a garbage BLEND_STATE[0] (in particular, the alpha
test enable) if no color buffer was bound.
I thought I was thwarted initially when I couldn't do conditional mod
on a MOV, and couldn't use two immediate constants in one instruction.
But g0 != g0 is also a way to produce a failing comparison.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anisotropic filtering extension for swrast intended to be used by osmesa
to create high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
A 2nd implementation using footprint assembly is also provided.
Signed-off-by: Brian Paul <brianp@vmware.com>
Correctly links against selinux library when MESA is built with --enable-selinux option.
Fixes bug #36333 in Freedesktop bugzilla
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function was taking a lot more CPU than required due to it memsetting
a bunch of memory that didn't require it from what I can see.
We should only memset here when we are about to fill out the sampler,
otherwise we end up doing a bunch of memsets for everytime this function
is called, basically setting 0 memory to 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The change for GPU hanging in 13bab58f04
fell back even when rb == NULL, which is wrong for GLES2 and caused
segfaulting in GLES2 conformance. For the GPU hang case (where the
broken 2D driver failed to allocate a BO for the window system
renderbuffer), it also would assertion fail/segfault immediately after
the fallback setup when the renderbuffer map failed.
Fixes GLES2 conformance packed_depth_stencil.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Texture LOD Bias is now S4.8 instead of S4.6;
Min LOD, and Max LOD are now U4.8 instead of U4.6.
Fixes piglit test tex-miplevel-selection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The data port messages for this are rather different. For now, fail to
compile rather than hanging the GPU.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On gen4/5, the RNDZ and RNDE instructions return floor(x), but set special
"round increment bits" in the flag register; a predicated ADD (+1) fixes
the result.
The documentation still lists '.r' as existing, and says that the
predicated add is necessary, but it apparently lies. According to the
simulator, BRW_CONDITIONAL_R (7) is not a valid conditional modifier
and the RNDZ and RNDE instructions simply produce the correct value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The MATH instruction cannot handle source modifiers, even on Gen7.
So, apply this workaround for Sandybridge on Ivybridge as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglit test glsl-fs-loop-continue.shader_test on Ivybridge.
According to the documentation, the CONT instruction's UIP field should
point to the WHILE instruction on both Sandybridge and Ivybridge.
The previous code made UIP point to the implicit DO instruction, which
seems incorrect. I'm not sure how it could have worked on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's IF instruction doesn't support conditional modifiers.
It also introduces UIP, which must point to the ENDIF instruction.
ELSE and ENDIF remain the same except that JIP moves from dst to src1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge puts the shadow comparator first, then lod/bias, and finally
the coordinate---unlike previous generations which always reserved four
slots for the coordinate at the beginning.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Most of this code copied from brw_wm_sampler_state.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm still not happy with the amount of code duplication here, but it
will have to do for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, Ivybridge seems to ignore the newly supplied data, giving us
rubbish for vertices.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This shouldn't be done using MRFs, but until I have a proper solution
for dealing with MRFs, this allows my hack to keep working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The message header is still incorrect, but this is a start.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's SEND instruction uses GRFs instead of MRFs. Unfortunately,
a lot of our code explicitly uses MRFs, and rewriting it would take a
fair bit of effort. In the meantime, use a hack:
- Change brw_set_dest, brw_set_src0, and brw_set_src1 to implicitly
convert any MRFs into the top 16 GRFs.
- Enable gen6_resolve_implied_move on Ivybridge: Moving g0 to m0
actually moves it to g111 thanks to the previous hack.
It remains to officially reserve these registers so the allocator
doesn't try to reuse them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This also disables the HiZ and separate stencil buffers. We still need
to implement stencil.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we currently only support sampling in the fragment shader, we only
bother to emit the PS variant. In the future we'll need to emit others.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This may not be necessary, but it seems like a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge can update each stage's binding table pointer independently,
so we want separate dirty bits. Previous generations can simply
subscribe to all three dirty bits and emit as usual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_vs_state.c; reuses create_vs_constant_bo from there.
The 3DSTATE_VS command is identical but 3DSTATE_CONSTANT_VS is not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
SF and CLIP viewport state has been combined into SF_CLIP_VIEWPORT;
SF_CLIP and CC state pointers can now be uploaded independently.
Some portions of the hardware documentation refer to separate upload
commands for SF and CLIP; these are outdated and incorrect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_clip_state.c.
This enables early culling and sets the necessary fields. Otherwise, it
is entirely the same, so I doubt this patch is strictly necessary for a
functional driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The state itself still seems to be the same; the only change is that
each part (CC, BLEND, DEPTH_STENCIL) can now be uploaded independently.
Thus, we still rely on the code in gen6_cc.c to set up the state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_wm_state.c.
The main change from Sandybridge seems to be that 3DSTATE_WM was split
into two separate state packet commands: 3DSTATE_WM and 3DSTATE_PS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_sf_state.c.
The main change from Sandybridge seems to be that 3DSTATE_SF was split
into two separate state packet commands: 3DSTATE_SF and 3DSTATE_SBE
("setup backend"). The bit-offsets are even the same - only the DWords
numbers have shuffled around a bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently this always reserves 16kB for push constants, regardless of
how much space is needed, and partitions it evenly betwen the VS and FS.
This is probably not ideal, but is straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, gen7_atoms is a verbatim copy of gen6_atoms; future commits
will update it to contain gen7-specific state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, IS_GEN7, IS_IVYBRIDGE, IS_IVB_GT1, and IS_IVB_GT2 all return
false. This allows me to write the code for them before actually adding
the PCI IDs and thus enabling the hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation uses the term "vertex URB entries", the code talks
about "entry size", and so on. Also, handles are just "pointers" to
entries (actually small integers).
Also rename max_gs_handles to max_gs_entries.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary motivation for this is to better support Ivybridge control
flow. Ivybridge IF instructions need to point to the first instruction
of the ELSE block -and- the ENDIF instruction; the existing code only
supported back-patching one instruction ago.
A second goal is to simplify and centralize the back-patching, hopefully
clarifying the code somewhat.
Previously, brw_ELSE back-patched the IF instruction, and brw_ENDIF
back-patched the previous instruction (IF or ELSE). With this patch,
brw_ENDIF is responsible for patching both the IF and (optional) ELSE.
To support this, the control flow stack (if_stack) maintains pointers to
both the IF and ELSE instructions. Unfortunately, in single program
flow (SPF) mode, both were emitted as ADD instructions, and thus
indistinguishable.
To remedy this, this patch simply emits IF and ELSE, rather than ADDs;
brw_ENDIF will convert them to ADDs (the SPF version of back-patching).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This hides the IF stack and back-patching of IF/ELSE instructions from
each of the code generators, greatly simplifying the interface.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This would be so much easier if we were using C++; we could simply use
constructors and destructors. Instead, we have to update all the
callers.
While we're at it, ralloc various brw_wm_compile fields rather than
explicitly calloc/free'ing them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Sandybridge, we don't need to break down primitives. There's no need
to bother setting up brw_compile and such if it's not going to be used;
bail as early as possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
ctx->Light.ProvokingVertex depends on _NEW_LIGHT.
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it symmetric with brw_set_dest, which is convenient, and will
also allow for assertions to be made based off of intel->gen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglits fragment-and-vertex-texturing test on llvmpipe for me.
I've no idea if someone had another plan for this that is smarter than what
I've done here, but what I've basically done is
split fragment and vertex sampler and sampler_view setup function, factor
out the common chunks of both.
side-cleanups:
drop st->state.sampler_list - unused
don't update border color if we have no border color.
should fix https://bugs.freedesktop.org/show_bug.cgi?id=35849
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm hard pressed to think of any reason a gallium thread would want to
receive a signal, especially considering its probably loaded as a library
and you don't want the threads interfering with the main threads signal
handling.
This solves a problem loading llvmpipe into the X server for AIGLX,
where the X server relies on the SIGIO signal going to the main thread,
but once llvmpipe loads the SIGIO can end up in any of its threads.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nothing special, just changing conditions for when HiZ can be enabled and
when HiZ memory becomes invalid.
I was thinking about it again and realized it had not been quite right.
This is actually just the message descriptor for Gen6+ dataport access;
it has nothing to do with the render cache. Access to the sampler cache
and constant cache also would use this struct; rename for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are documented on page 245 of IHD_OS_Vol4_Part2.pdf (the public
Sandybridge documentation/SEND instruction description).
Somebody had the bright idea to reuse gen4/5 defines labelled READ/WRITE
which just happened to be the same values as Render Cache/Sampler Cache.
It turns out that this field has nothing to do with READ/WRITE on
Sandybridge, but rather represents which data port to direct it to.
This was especially confusing in brw_set_dp_read_message, which
used "BRW_MESSAGE_TARGET_DATAPORT_WRITE." In a read function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For example, an indirect load like "ld b128 $r0q c0[$r0]" seems to
overwrite the address register before finishing the load, but only
if there are a lot of threads running.
Visible as displaced geoemtry in Unigine Heaven.
According to my documentation this is actually "Media Block Write" on
Gen4-5; there has never been a "DWord Block Write."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
only allocate the blocks ptr in the range if we ever have one,
otherwise don't bother wasting the memory.
valgrind glxinfo
before:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
after:
==5227== in use at exit: 419,754 bytes in 706 blocks
==5227== total heap usage: 3,452 allocs, 2,746 frees, 3,140,531 bytes allocate
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops 6k of the text segment, a minor drop in the ocean, however
it also makes the code a lot cleaner and removes a lot of duplicated
information, hopefully making it more maintainable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This table covered a large range unnecessarily, reduce the address
range covered, use the fact that the bottom two bits aren't significant,
and remove unused fields from the range struct. It also drops the hash_size/shift in context in favour of a define, which should make doing the math
a bit less CPU intensive.
valgrind glxinfo
Before:
==320== in use at exit: 419,754 bytes in 706 blocks
==320== total heap usage: 3,691 allocs, 2,985 frees, 7,272,467 bytes allocated
After:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently r600g always maps every bo, this is quite pointless as it wastes
VM and on 32-bit with wine running VM space is quite useful.
So with this patch we don't create the mappings until first use, without
tiling enabled this probably won't make a major difference on its own,
but with tiled staged uploads it should avoid keeping maps for most of the
textures unnecessarily.
v2: add bo data ptr check
Signed-off-by: Dave Airlie <airlied@redhat.com>
typedef void (GLAPIENTRYP _GLUfuncptr)(); causes the following warning:
function declaration isn't a prototype.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
GetVertexAttrib*{,ARB} is no longer aliased to the NV calls.
This fixes tracing yofrankie with apitrace, given it requires accurate
results from GetVertexAttribiv*.
NOTE: This is a candidate for the stable branches.
Mesa already supports this because of NV_fragment_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
If state tracker asked us to map resource directly and we can't
do it (because of tiling), return NULL instead of doing full transfer
- state tracker should handle it and fallback to some other method
or repeat transfer without PIPE_TRANSFER_MAP_DIRECTLY.
It greatly improves performance of xorg state tracker on nv50+,
because its fallback (DFS/UTS) is much faster than full transfer.
Eliminates unaligned accesses on strict architectures. Spotted by Jay
Estabrook.
Signed-off-by: Matt Turner <mattst88@gmail.com>
NOTE: This is a candidate for the 7.10 branch.
GLSL stopped using:
BRA, EXP, LOG, LRP, NRM3, NRM4, XPD.
GLSL started using:
KIL, SCS, SSG, SWZ.
(omg why SWZ? isn't proc_src_register flexible enough?)
GLSL doesn't use these opcodes some Radeons do support:
ARR, DP2A, DST, LRP, XPD.
These opcodes are now unused:
AND, NOT, NRM3, NRM4, OR, XOR.
(plus maybe the NV extensions which are unused by Gallium)
In addition to that, we don't use two-dimensional indirect addressing,
which the Mesa IR can do.
PIPE_ARCH_UNKNOWN_ENDIAN is used no where else. All #else branches of
ifdef PIPE_ARCH_LITTLE assume big-endian. Not #error'ing out here
only serves to allow bad things to happen.
Signed-off-by: Matt Turner <mattst88@gmail.com>
1/ln(2) is equivalent to log2(e), so define it as such.
log2(e) = ln(e)/ln(2) = 1/ln(2)
Worst of all, the definitions for M_LOG2E and ONE_DIV_LN2
(right beside each other!) weren't the same.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Setting SOURCE_FORMAT to EXPORT_NORM is an optimization.
Leaving SOURCE_FORMAT at 0 will work in all cases, but is less
efficient. The conditions for the setting the EXPORT_NORM
optimization are as follows:
R600/RV6xx:
BLEND_CLAMP is enabled
BLEND_FLOAT32 is disabled
11-bit or smaller UNORM/SNORM/SRGB
R7xx/evergreen:
11-bit or smaller UNORM/SNORM/SRGB
16-bit or smaller FLOAT
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Hopefully we can find out the proper fix for this, but for now
this makes the fbo mipmap tests pass on my rv670 (x2 card).
Signed-off-by: Dave Airlie <airlied@redhat.com>
r6xx asics have some problems with the surface
sync logic for the CB and DB. It's recommended
to use the event write interface for flushing
the DB/CB caches rather than the sync packets.
A single event write flush flushes all dst
caches, so we only need one for all CBs and DB.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=35312
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems more in line with what the documentation suggests we should be
doing. It doesn't fix the rv635 regression, though I thought it might,
so it means I've no idea whats actually going wrong there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
We only handle a 32 bit swap count, so use the new structure definitions.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After sending the GLXChangeDrawableAttributes request, we also set a
local set of attributes on the DRI drawable. But in the indirect case
this array won't be present, so skip the setting in that case to avoid a
crash.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We use a hidden window for pbuffer contexts, but Windows limits window
sizes to the desktop size by default. This means that creating a big
pbuffer on a small resolution single monitor would truncate the pbuffer
size to the desktop.
This change overrides the windows maximum size, allow to create windows
arbitrarily large.
Pointed out by clang:
src/gallium/auxiliary/draw/draw_context.h:251:41: warning: implicit conversion
from enumeration type 'enum pipe_cap' to different enumeration type
'enum pipe_shader_cap' [-Wconversion]
return tgsi_exec_get_shader_param(param);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~
Otherwise there would be no way to know whether the state has been changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts commit 1dc204d145.
MC_COORD_TRUNCATE is for MPEG and produces quite an interesting behavior
on regular textures. Anyway that commit broke filtering in demos/cubemap.
The new allocator uses ra and does swizzle packing.
Also, a data structure (struct rc_variable) and associated functions have
been added for generating UD and DU chains.
This function can be used to avoid creating single register classes for
input/payload registers. This makes optimistic coloring less likely
to fail.
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction scheduler will sometimes leave orphaned sources when
converting instructions from RGB to Alpha. If one of these orphaned
sources has an index greater than the maximum temporary register index,
then the compiler will incorrectly report "Too many hardware temporaries
used". The dead sources pass cleans up these orphaned sources.
GL_FIXED should not be accepted in the other gl*Pointer calls in OpenGL.
There is a new piglit for this: arb_es2_compatibility-fixed-type.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were accidentally leaving blending enabled for LogicOp GL_COPY,
which ARB_color_buffer_float/GL_RGBA32F-render (and friends) caught.
Additionally, the GL spec says that no LogicOp should be done to
floating-point targets, and the GPU gets really angry even if you say
to LogicOp GL_COPY to float.
As we expanded the usage of the state cache, it grew extra
functionality. However, with the recent state streaming rework, we're
back to the state cache being used only for shader kernels, which is
the piece of GPU state that's actually expensive to compute again from
scratch, since it involves compiling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the dynamic state is streamed through the top of the
batchbuffer, we can cut out many of our relocations to that state by
using the base address.
Improves 3DMMES taiji performance 3.3% +/- 0.4% (n=15).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Overall, across this series since the last set of numbers, gen6 3DMMES
taiji performance has dropped 0.8% +/- 0.3% (n=15), probably due to
the increased reissuing of state from some of the state objects that
otherwise never changed, and increased occurrence of the per-batch
overhead as we've increased how much we put in the batch BO without
increasing the batch BO's size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplers are about to become streamed for gen6 performance, which
would cause this unit to blow out the state cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is in a way a revert of f5bb775fd1.
The tiny win that had will be overwhelmed by the win of using the gen6
dynamic state base address.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves 3DMMES taiji demo performance by 5.1% +/- 1.9% (n=15), by
reducing CPU time spent thrashing around those tiny little constant BOs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The payload regs can go all the way up to register 60+, so just give
them 8 bits to be addressed by instead of 3-4 (which made source_w_reg
of 8 end up 0). There's no reason to aggressively pack these fields,
as they are just used as compiler information, where being easier to
access is probably more important than shaving a byte or two off of
the structure.
Fixes piglit fragcoord_w.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36649
I was promoting to float for ARB_color_buffer_float unclamped, which
failed when ARB_texture_float wasn't present. Since the metaops don't
need results outside of [0,1] when not drawing to a floating point
destination, they can just use a fixed point texture when floating
point destinations are impossible.
Fixes regression in fdo23670-depth_test when --enable-texture-float is
not present.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36473
Since commit de579a1 "Include GIT SHA1 in GL version string"
$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/mesa/main/git_sha1.h
nothing added to commit but untracked files present (use "git add" to track)
Add git_sha1.h to .gitignore so git knows not to warn it is present but untracked
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Also use MAX3 and incorporate Ian's suggestion in texformat.c.
I don't think wrapping u_format_rgb9e5.h in another header and thus making it
more complicated is worth it.
swrast support done.
There is no renderbuffer support in swrast, because it's not required
by the extension.
Reviewed-by: Brian Paul <brianp@vmware.com>
I was wondering why I had been getting GL_RGBA for GL_RGB9_E5.
Instead of setting GL_RGBA and CHAN_TYPE for most types,
use the helper functions to obtain the info.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we run out of bin memory and do an early return from
lp_setup_begin_query() we'd omit setting the setup->active_query
pointer. Then, when lp_setup_end_query() was later called, the
assertion for setup->active_query == pq would fail. Moving the
assigment in lp_setup_begin_query() avoids that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Including windows.h was ineffective on MSVC because we define the NOGDI macro,
which skips the wingdi.h include.
Unsetting NOGDI is also a bad idea because it causes all sort of symbol
clashes with SGI code.
The real problem is that WINGDAPI was not being defined, also due to NOGDI,
so simply define it to blank if not done already. This seems to make
everybody happy.
The default value is 64 but drivers usually advertise more, like 4096.
Allows ARB vp/fp programs to use more parameters.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This reverts commit 50ade6ea69.
Fixes jerky rendering again on apps that don't block on the GPU per
frame and are GPU bound (e.g. 3DMMES on Ironlake). The whole point of
this complicated throttle scheme is to wait on frame n-1 to have
started rendering before starting frame n's rendering. Otherwise, the
GPU-bound app will race ahead and call the GL to draw many
nearly-identical frames, then >0ms later get stuck waiting for them
(all dispatched at about the same time) to retire, then render a new
batch of nearly-identical frames.
If GL_RGB16F or GL_RGB32F is specified let's try the 3-component float
texture formats before trying the 4-component ones. Before this,
GL_RGB16/32F were treated the same as GL_RGBA16/32F.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The actual code that needs this include is just using
"if defined (PIPE_OS_UNIX)", and the two conditions should match.
This should also make the file compile under Hurd.
This is more painful than instruction scheduling, as we have to
compare two MRF writes to see if they coincide, and have to handle
partial GRF writes before that (for example, the result of a math
instruction written to color).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that needed fixing was skipping the newly-possible
uncompressed/sechalf partial GRF constant writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the work of the scheduler is agnostic to wide dispatch. It
operates on our virtual GRF file, which means instructions are
generally referring to 8 or 16 wide naturally. For the MRF file
management we're trying to track the actual hardware MRF file, so we
need to watch if an instruction writes multiple MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is glued in in a bit of an ugly way -- we rely on the uniforms
having been set up by 8-wide dispatch, and we just reuse them without
the ability to add new uniforms for any reason, since the 8-wide
compile is already completed. Today, this all works out because our
optimization passes are effectively the same for both and even if they
weren't, we don't reduce the set of uniforms pushed after
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this, consumers often have to keep linked lists of the
entries, at additional malloc cost.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These reduce an emitted (not decoded) instruction per shader on
g4x/gen5, but may allow for additional register coalescing as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At this point it doesn't do uniforms, which have to be laid out the
same between 8 and 16. Other than that, it supports everything but
flow control, which was the thing that forced us to choose 8-wide for
general GLSL support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that the virtual grfs are in increments of the dispatch_width,
not hardware registers -- this makes the 16-wide emit and 8-wide emit
mostly the same.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I hit this when testing RV350, which lacks RGB10_A2 render target
support. It had been missed when implementing the format and probably
unused by anything else too.
Not applicable to 7.10.
Reviewed-by: Eric Anholt <eric@anholt.net>
"st/mesa: check image size before copy_image_data_to_texture()" caused
a regression in piglit fbo-generatemipmap-formats test on all gallium drivers.
Level 0 for NPOT textures will not match minified values, so don't do this
check for level 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the initial code if we had nothing in the vector slots r would
never get reset to 0, so we'd fail to compile shaders, after the previous
commit this would happen for the LIT tests. When I fixed that we did a lot
of unnecessary loops through all the vector states when we had no vector
slots filled. So this patch optimises thing for the scalar only state.
This fixes the 3 LIT piglit tests on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the R600 ISA document:
Section 4.7.5 Cycle restrictions for the ALU.trans states that
PV/PS have cycle restrictions wrt constants.
This is part of a fix for the LIT tests
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a bug in Trine where fragment.color would write
FRAG_RESULT_COLOR (which is interpreted by drivers as being the "write
this to all color buffers" option) instead of FRAG_RESULT_DATA0 (just
the first target).
Fixes piglit ATI_draw_buffers/arbfp-no-index.
This extension support consists of replacing
"gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->".
One instance of referencing the texture's base sampler remains in the
initial miptree allocation, where I'm not sure we have a clear
association with any texture unit.
Tested with piglit ARB_sampler_objects/sampler-objects.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we lack hardware support for it, this is a simple matter of
checking _mesa_check_conditional_render at the entrypoints, and
suppressing it for the metaops where it doesn't apply.
Reviewed-by: Brian Paul <brianp@vmware.com>
The NV_conditional_render spec calls out specific operations that
conditional rendering applies to, which doesn't include these.
Fixes NV_conditional_render/generatemipmap on swrast.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested with rgtc-teximage-0[12].
EXT_texture_compression_rgtc/fbo-generatemipmap-formats fails in NPOT
just like S3TC does.
Reviewed-by: Brian Paul <brianp@vmware.com>
This assertion doesn't make any sense to me -- the convertFormat is
already something valid (tested above), and the BaseFormat dictated by
convertFormat doesn't matter to the function about to be called (it's
the datatype/comps that were pulled out of convertFormat).
Fixes assertion failure in
GL_EXT_texture_compression_rgtc/fbo-generatemipmap-formats
(still has a rendering failure in NPOT like S3TC does).
Reviewed-by: Brian Paul <brianp@vmware.com>
We were falling through to the default R8 and RG88 formats instead of
compressing when possible. Noticed by swrast fbo-blending-formats
actually doing rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were totally broken for several releases.
scons now builds everything the project files built and more, and can be
kept up-to-date with little effort.
Need to reset the point/line/tri functions to point to the "first"
versions whenever we flush vertices. Fixes unfilled polygon rendering
errors seen in demos/samples/logo.c. See comments for more info.
NOTE: This is a candidate for the 7.10 branch.
Broken with e5c6a92a12. (ARB_color_buffer_float)
Clamping should occur if type != float, otherwise the MSBs of the resulting
pixels are killed off. For example, reading back LUMINANCE = R+G+B can be
greater than 0xff, but the result is naturally masked by 0xff
for UNSIGNED_BYTE, leading to bogus results.
The following bug report seems to want clamping to occur if type == half_float
too. Not sure what's correct.
Bug: [bisected pineview] oglc case pxconv-read failed
https://bugs.freedesktop.org/show_bug.cgi?id=35852
Tested by: Fang Xun <xunx.fang@intel.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
None of this ever gets used. Fog is always calculated by a fragment
program. Even though the fixed-function fog unit is never used, state
updates are still sent to the hardware. Removing those spurious state
updates can't hurt performance.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Fragment programs are generated by core Mesa for fixed-function.
Because of this, there's no reason to handle cases where there is no
fragment program for fog.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
All drivers expect this to always be GL_NONE. Don't let there be any
opportunity for a bad value to leak out and infect some unsuspecting
driver. If any driver for hardware that had fixed-function
per-fragment fog (i915 and perhaps some r300-ish) was ever going to
add support, it would have done it by now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
This patch fixes two bugs related to fog in the fixed-function
fragment shader generation code.
Fog was only lowered to instructions if MRTs were used. The fragment
shader assembler always lowers "fog option" code to instructions, and
many drivers (e.g., r300) expect this.
When fog lowering did happen, it was after the instruction count was
checked against implementation limits. Since fog lowering may add up
to 5 instructions, a program that was below the limits before lowering
may exceed the limits after lowering.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
We should only copy images into the dest texture if the size is correct.
This fixes a failed assertion when finalizing a texture with mis-defined
mipmap levels such as:
level 0: 32x32
level 1: 8x8
Also, fix incorrect mipmap level used in assertion at the top of
copy_image_data_to_texture().
NOTE: This is a candidate for the 7.10 branch.
Lots of code (deleted by this patch) tried to make type == result->type,
but not all cases did. Don't pretend; just use result->type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Things definitely remaining todo: switch statements, clip distances.
On 965, we also need real integers in the VS, and implementations of
some things like isinf/isnan.
Reviewed-by: Brian Paul <brianp@vmware.com>
For 1 and 2-channel formats the hardware only supports rendering to R
and RG. To do I and L render targets we just call them R and
everything works out. For A, we would need to rewrite the CC to do
the alpha channel's blending on color instead, and send the fragment
alpha down the red channel. For LA, there doesn't seem to be any
hope, because we can't do independent color/alpha blending while
treating the LA surface as RG.
Reviewed-by: Brian Paul <brianp@vmware.com>
The blitter only does up 32bpp at a time, so we handle it by mangling
coordinates and calling the surface 32bpp.
Fixes ARB_texture_rg/fbo-generatemipmap-formats-float with ARB_texture_float.
Reviewed-by: Brian Paul <brianp@vmware.com>
Of these, intel will be using I and L initially, and A once we rewrite
fragment shaders and the CC for rendering to it as R.
Reviewed-by: Brian Paul <brianp@vmware.com>
Keep track of when the caches are dirty, and only flush them when
the framebuffer state is set and when the context is flushed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes piglit's draw-instanced-divisor test for softpipe on both
the generic and SSE paths. This is temporary until we have the
correct per-array max_index information.
this needs revisiting, we really don't want to be flushing all 32 of these,
but currently we don't flush any of them, and it seems to have caused a regression
as reported on irc with doom3 on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Writes within ELSE blocks were being ignored which prevented us from
discovering all possible writers for some register values.
Fixes piglit glsl-fs-raytrace-bug27060
This gets me from 2200 to 1978 dwords for a gears frame.
This is due to us having some 32-dwords blocks in the SPI, that we only
modify the first dwords off.
v2: fix dirty reg count from Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a first step to decreasing the CPU usage, by decreasing how much
stuff we pass to the GPU and hence to the kernel CS checker.
This adds a check to see if the values we need to write are actually dirty,
and avoids writing if they are. However certain register need to always
be written so we add a new flag to say which ones should be always written
if used. (Note this could probably be done cleaner with a larger refactoring,
since I think the CONST_BUFFER_SIZE_PS/VS and CONST_CACHE_PS/VS might
be better off as a special state).
It also moves the need_bo to be a flags on the register now.
With this, a frame of gears goes from emitting 3k dwords to emitting 2k dwords,
and I'm sure it could get a lot smaller.
v2: fix some evergreen dirty bits.
Original patch from: Bas Nieuwenhuizen, I NIHed nearly the same thing
before seeing his patch on the list, oops.
Reviewed-by: Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of the newer portions of the code use OUT_BATCH style. I prefer
this style because it offers a clear distinction between a) hardware
messages/structures with a mandatory format, and b) data structures for
our own internal use that we can format however we want.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we never enable the GS on Sandybridge, there's no need to allocate
it any URB space.
Furthermore, the previous calculation was incorrect: it neglected to
multiply by nr_vs_entries, instead comparing whether twice the size of
a single VS URB entry was bigger than the entire URB space. It also
neglected to take into account that vs_size is in units of 128 byte
blocks, while urb_size is in bytes.
Despite the above problems, the calculations resulted in an acceptable
programming of the URB in most cases, at least on GT2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The expression
x = y, 5, 3;
will generate
0:7(9): warning: left-hand operand of comma expression has no effect
The warning is only emitted for the left-hand operands, becuase the
right-most operand is the result of the expression. This could be
used in an assignment, etc.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts what remains of commit
28bab24e16. It was garbage, trying to
use a MESA_FORMAT enum as a preprocessor token, and I don't know how I
thought it was even tested.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RED and GL_RG were tricking this code into executing, but it's
totally unprepared for a 16-bit channel and just rescaled the values
down to 0. We don't have anything with <8bit channels alongside >8bit
channels, so disabling it should be safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the current (broken by trying to use an enum in the
preprocessor) spantmp2.h support I wrote for the intel driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we're using GTT mappings now (no manual detiling), there's
really nothing special to accessing these buffers, other than needing
the new RowStride field of gl_renderbuffer to accomodate padding.
Reduces the driver size by 2.7kb, and improves glean depthStencil
performance 3-10x (!)
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow some drivers to reuse the core renderbuffer.c get/put
row functions in place of using the spantmp.h macros. Note that
unlike textures, we use a signed integer here to allow for handling
FBO orientation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Everything appears to already be in place for this. Fixes aborts in:
ARB_texture_rg/fbo-alphatest-formats-float
ARB_texture_rg/fbo-blending-formats-float.
Reviewed-by: Brian Paul <brianp@vmware.com>
The _mesa_base_fbo_format variant doesn't handle some texture
internalformats, such as "3".
Fixes:
fbo-blending-formats.
fbo-alphatest-formats
EXT_texture_sRGB/fbo-alphatest-formats
Reviewed-by: Brian Paul <brianp@vmware.com>
The very presence of this extension breaks things.
This should bring us closer to being able to run Unigine Heaven.
The extension will be re-enabled once gl_InstanceID is implemented.
This was copy-and-paste from originally trying to get DP read/write
working reliably, and notably for other common messages (URB, sampler)
we weren't doing this.
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
It was accepting only GL_DUDV_ATI and not the specific sized format
GL_DU8DV8_ATI. Fixes assertion failure at startup in Shadowgrounds.
Reviewed-by: Brian Paul <brianp@vmware.com>
The 095-recursive-define test case was triggering infinite recursion
with the following test case:
#define A(a, b) B(a, b)
#define C A(0, C)
C
Here's what was happening:
1. "C" was pushed onto the active list to expand the C node
2. While expanding the "0" argument, the active list would be
emptied by the code at the end of _glcpp_parser_expand_token_list
3. When expanding the "C" argument, the active list was now empty,
so lather, rinse, repeat.
We fix this by adjusting the final popping at the end of
_glcpp_parser_expand_token_list to never pop more nodes then this
particular invocation had pushed itself. This is as simple as saving
the original state of the active list, and then interrupting the
popping when we reach this same state.
With this fix, all of the glcpp-test tests now pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32835
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Without it gcc complains:
nv50_screen.c: In function ‘nv50_screen_is_format_supported’:
nv50_screen.c:48: warning: implicit declaration of function ‘util_format_is_supported’
and handles it wrongly - util_format_is_supported returns boolean, which is typedef'ed
to uchar, but function without prototype is assumed to return int.
For me nv50_screen_is_format_supported was returning true for float formats without
--enable-texture-float...
Sounds very unlikely, but I don't have a better explanation at the
moment.
The GPU throws page faults at the first page after the code buffer
quite frequently on startup, and traces don't show us overflowing.
This pass coverts CMP T0, T1 T2 T0 -> MOV T0, T2 when the CMP
instruction is the first instruction to write to register T0.
This pass is useful for hardware that requires a lot of lowering passes
that generate many CMP instructions.
So --enable-texture-float it is.
Hardware drivers (including the Gallium ones) should
use #ifdef TEXTURE_FLOAT_ENABLED to hide any code that may
expose floating-point renderbuffers via any interface,
public or private.
v2: Print a warning when using --enable-texture-float.
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: handle floating-point formats in _mesa_base_fbo_format
mesa: add ARB/ATI_texture_float, remove MESAX_texture_float
commit 123bb110852739dffadcc81ad80b005b1c4f586d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 01:35:42 2010 +0200
mesa: compute floatMode for FBOs and return it on RGBA_FLOAT_MODE
It's clear enough that the current segmentation fault isn't what we
want. And it's also very easy to know what we do want here, (just
check with any functional C preprocessor such as "gcc -E").
Add the desired output as an expected file so that the test suite
gives useful output, (showing the omitted output and the segfault),
rather than just reporting "No such file" for the expected file.
These were all written as generic list functions, (accepting and returning
a list to act upon). But they were only ever used with parser->active as
the list. By simply accepting the parser itself, these functions can update
parser->active and now return nothing at all. This makes the code a bit
more compact.
And hopefully the code is no less readable since the functions are also
now renamed to have "_parser_active" in the name for better correlation
with nearby tests of the parser->active field.
The common case for this test suite is to quickly test that everything
returns the correct results. In this case, the second run of the test
suite under valgrind was just annoying, (and the user would often
interrupt it).
Now, do what is wanted in the common case by default (just run the
test suite), and require a run with "glcpp-test --valgrind" in order
to test with valgrind.
The expected file here captures the current behavior of glcpp (which
is to generate an obscure "syntax error, unexpected $end" diagnostic
for this case).
It would certainly be better for glcpp to generate a nicer diagnostic,
(such as "missing closing parenthesis in function-like macro
definition" or so), but the current behavior is at least correct, and
expected. So we can make the test suite more useful by marking the
current behavior as expected.
The expected file here captures the current behavior of glcpp (which
is to generate a division-by-zero error) for this case.
It's easy to argue that it should be short-circuiting the evaluation
and not generating the diagnostic (which happens to be what gcc does).
But it doesn't seem like we should force this behavior on our
pre-processor, (and, as always, the GLSL specification of the
pre-processor is too vague on this point).
This test is behaving just fine already---it's generating an informative
diagnostic, ("error: division by 0 in preprocessor directive"), so adding
this in the expected file makes things pass.
We could actually try to do an early return both for gallium textures and
malloc memory textures, but I'm not sure exactly which situations
stImage->pt is NULL, and whether texImage->Data == NULL would be acceptible
or not.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the same as ARB_draw_buffers (which derived from it), except
for s/ARB/ATI/. The glapi bits were already in place, and what was
missing was just the ARB_fp part. The new Humble Bundle game "trine"
tries to use this extension without checking that it's exposed, which
this works around.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36182
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like what we do for add/mul, but we have to invert the
predicate to choose the other source instead.
This removes 5 extra moves of constants in nexuiz shaders. No
statistically significant performance difference on my Sandybridge
laptop (n=5).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were letting any old operand through, which generally resulted in
assertion failures later.
Fixes array-logical-xor.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant false case.
Fixes glslparsertest/glsl2/logic-02.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant true case.
Fixes glslparsertest/glsl2/logic-01.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By always using a boolean, we should generally avoid further
complaints. The failure case I see is logic_not, where the user might
understandably make the mistake of using `!' on a boolean vector (like
a piglit case did recently!), and then get a further complaint that
the new boolean type doesn't match the bvec it gets assigned to.
Fixes invalid-logic-not-06.vert (assertion failure when the bad type
ends up in an expression and ir_constant_expression gets angry).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33314
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A few GLES2 tests tripped over this when using array dereferences to
hit channels on the LHS (see piglit test
glsl-copy-propagation-vector-indexing). We wouldn't find the
ir_dereference_variable, and assume that that meant that it wasn't an
assignment to a scalar/vector, and thus not notice that the variable
had been changed.
Release the old depth region and reference the new one *only* if it has
changed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@intel.com>
Prior to Gen6, we use the GS for breaking down quads, quad-strips,
and line loops. On Gen6, earlier stages already take care of this,
so we never need the GS.
Since this code is likely completely untested, remove it for now.
We can write new code when enabling real geometry shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit b4cbd2b312.
It looked like a safe sanity check. It missed the issue of the start of
the buffer not being at 0, but even that was not enough to explain why
setting the max vertex index caused glyphs to be dropped from the game
'Achron'.
Instead, the issue appears to be related to the use of the vertex bias
and so we would need to re-emit the max-index every time we adjusted the
bias, so re-emitting the relocations and defeating the original
optimisation.
Reported-and-tested-by: Thomas Jones <thomas.jones@utoronto.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35163
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When the window is minimized GetClientRect will return zeros.
Instead of creating a 1x1 framebuffer, simply preserve the current window
size, until the window is restored or maximized again.
The earlier change to ensure rendertargets and textures are always
rebound at every command buffer start had the downside of making
successive flushes no longer no-ops, as a command buffer with merely
the rebinding commands were being unnecessarily sent to the vGPU.
This change only re-emits the bindings when necessary, by keeping track
of the need to rebind outside of the dirty state update mechanism.
According to https://bugs.freedesktop.org/show_bug.cgi?id=34280
commit 5d1387b2da causes the font corruption
problems people have been seeing under various apps and gnome-shell on r200
cards.
This commit changed (loosened) the check for using the memcpy path in the
former al88 / al1616 texstore functions, which are now also used to
store rg texures. This patch restores the old strict check in case of
al textures. I've no idea why this fixes things, since I don't know the
code in question at all. But after seeing the bisect in bfdo34280 point
to this commit, I gave this fix a try and it fixes the font issues seen on
r200 cards.
[airlied:
r200 has no native working A8, so it does an internal storage format of AL88
however srcFormat == internalFormat == ALPHA when we get to this point,
so it copies, but it wants to store into an AL88 not ALPHA so fails,
I'll also push a piglit test for this on r200].
Many thanks to Nicolas Kaiser who did all the hard work of tracking this down!
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously the macro would (ALIGN(value - alignment - 1, alignment)).
At the very least, this was missing parenthesis around "alignment -
1". As a result, if value was already aligned, it would be reduced by
alignment. Condisder:
x = ROUND_DOWN_TO(256, 128);
This becomes:
x = ALIGN(256 - 128 - 1, 128);
Or:
x = ALIGN(127, 128);
Which becomes:
x = 128;
This macro is currently only used in brw_state_batch
(brw_state_batch.c). It looks like the original version of this macro
would just use too much space in the batch buffer. It's possible, but
not at all clear to me from the code, that the original behavior is
actually desired.
In any case, this patch does not cause any piglit regressions on my
Ironlake system.
I also think that ALIGN_FLOOR would be a better name for this macro,
but ROUND_DOWN_TO matches rounddown in the Linux kernel.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Whitwell <keithw@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also make the GL_ARB_draw_instanced block follow the same pattern as
the other blocks.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a 49.6% +/- 2.0% (n=9, IPS outlier removed) performance
improvement for the hacked-up-for-cache-misses scissor-many, and no
statistically significant performance difference for the
hacked-up-for-cache-hits version (n=9, IPS outlier removed). No
statistically significant performance difference from ETQW (n=5) from
these last two commits.
This is a 28.1% +/- 1.4% (n=10) performance improvement for the
hacked-up-for-cache-misses scissor-many (n=10), and no statistically
significant wall-time performance difference for the
hacked-up-for-cache-hits version (n=9, first outlier in each removed
since IPS was warming up. User time increased by about 4.7%, but
kernel time decreased equivalently).
Build the new sources, plug the new functions into the dispatch table,
implement display list support. And enable extension in the gallium
state tracker.
gl_texture_object contains an instance of this type for the regular
texture object sampling state. glGenSamplers() generates new instances
of gl_sampler_object which can override that state with glBindSampler().
Fixes issues in e.g. nexuiz (desertfactory) or supertuxkart that
look like lighting bugs.
They're not visible with the software rasterizers because their
notion of linear interpolation seems to be different from that
of nv50/nvc0.
This reverts commit 437c748bf5.
The commit is wrong for several reasons. One of them is when we grab
a new buffer, we should update all the states it is bound in,
including all parallel contexts. I don't think this is even doable.
The correct solution would be upload data via a temporary buffer and
do resource_copy_region to the original one.
https://bugs.freedesktop.org/show_bug.cgi?id=36088
Add Cygwin platform-specific settings and drivers to build for dri driver:
- by default, disable direct rendering.
- if direct rendering is enabled, the swrast dridriver is the only one it's
sensible to try to build (this doesn't work at the moment as additional patches
are required to build a libGL which can load just swrast without the DRM headers,
even though there's no actual functional dependency)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Fix build when configured --with-driver=dri --disable-driglx-direct on targets
without drm e.g. GNU/Hurd and Cygwin
Based on the Debian patch file '05_hurd-ftbfs.diff' by Samuel Thibault.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Reviewed-By: Jakob Bornecrantz <wallbraker@gmail.com>
The theory here was to detect a temporary variable used within a loop,
and avoid considering it live across the entire loop. However, it was
overeager and failed when the first definition of the variable
appeared within the loop but was only conditionally defined.
Fixes glsl-fs-loop-redundant-condition.
Otherwise min_lod can potentially be larger than the clamped max_lod. The
code that follows will swap min_lod and max_lod in that case, resulting in a
max_lod larger than MAX_LEVEL.
Signed-off-by: Brian Paul <brianp@vmware.com>
Base level and min LOD aren't equivalent. In particular, min LOD has no
effect on image array selection for magnification and non-mipmapped
minification.
Signed-off-by: Brian Paul <brianp@vmware.com>
Although for GL a zero stride means tightly packed elements, Mesa
internally uses zero strides for constant arrays.
Therefore user buffers need to be defined from
buffer_offset + src_offset + min_index*stride
to
buffer_offset + src_offset + max_index*stride + elem_size
Simplifying the later with (max_index + 1)*stride will give zero
sized buffers.
This change also aggregates the st_context's info about user buffers
into a single array.
We adjust 'end' to fit into _MaxElement, but that may result into a 'start'
value bigger than 'end' being passed downstream, causing havoc.
This could be seen with arb_robustness_draw-vbo-bounds, due to an
application bug.
Because:
- bindings are not fully automatic, and they are broken most of the time
- unit tests/samples can be written in C on top of graw
- tracing/retracing is more useful at API levels with stable ABIs such as
GL, producing traces that cover more layers of the driver stack and and
can be used for regression testing
There's really no need for a prefix on member functions, and overloading
takes care of the _op1/_op2 distinction quite nicely. Eric already made
a similar change in the i965 FS backend.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rather than ir_to_mesa_dst_reg_from_src and ir_to_mesa_src_reg_from_dst.
The new constructors are marked 'explicit' so that the compiler can
catch cases where source and destination registers were accidentally
interchanged.
This also necessitated using constructors to initialize the undef and
address registers, as well as adding a default constructor.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both classes are completely private to ir_to_mesa.cpp, so there won't be
any name conflicts with other parts of Mesa. The prefix simply makes it
harder to read.
Also, use a class rather than typedef structs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is in preparation from removing the "ir_to_mesa_" prefix on the
src_reg and dst_reg types, which would cause a naming conflict.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code would previously handle the projection, then swizzle the
shadow comparitor into place. However, when the projection is done
"by hand," as in the TXB case, the unprojected shadow comparitor would
over-write the projected shadow comparitor.
Shadow comparison with projection and LOD is an extremely rare case in
real application code, so it shouldn't matter that we don't handle
that case with the greatest efficiency.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=32395
This matches the behaviour below when numSamples is compared.
At least with the gallium state tracker this can actually occur if st_render_texture fails.
Signed-off-by: Brian Paul <brianp@vmware.com>
This is always the way with real hardware and desktop OpenGL. Some
hardware can't do some formats natively. The alpha-only, luminance,
and intensity formats are usually the most problematic. Some sized
formats can also be problematic. This patch provides fall-back
formats for those that are not natively supported.
At some point it would be interesting to try providing
device-independent conversions using EXT_texture_swizzle. The drivers
that support EXT_texture_swizzle could, for example, see
GL_LUMINANCE16_SNORM as MESA_FORMAT_SIGNED_R16 with a { r, r, r, 1 }
swizzle. Care would need to be taken to prevent issues with using
those textures for FBO rendering.
This is the rest of the fix for glean's pixelFormats test on i965.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This class of hardware can natively sample all of the snorm surface
formats that DX10 requires, but it can't do some of the legacy GL
formats. In particular, all of the alpha, luminance, and intensity
formats are unsupported.
This partially fixes the breakage in glean's pixelFormats test since
GL_EXT_texture_snorm support was added to Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We use this format to represent the accum buffer. No snorm texture
sampling or rendering takes place.
Fixes failed assertion with swrast and any app using the accum buffer
(and glxinfo).
Piglit tests:
- glsl-fs-shadow2d-01
- glsl-fs-shadow2d-02
- glsl-fs-shadow2d-03
- fs-shadow2d-red-01
- fs-shadow2d-red-02
- fs-shadow2d-red-03
NOTE: This is a candidate for the stable branches.
Various documentation mentions that "W" is handed to the WM stage,
but further digging seems to indicate that they really mean 1/W.
The code here is still unclear, but changing this fixes piglit
test "fragcoord_w" on Sandybridge as well as a Khronos ES2 conformance
test. I also tested 3DMarkMobile ES2.0's taiji and hoverjet demos, as
well as Nexuiz, just to be safe.
NOTE: This is a candidate for the 7.10 branch.
Fixes regressions caused by commit 9a21bc6401, namely GPU hangs when
running gnome-shell or compiz (Mesa bugs #35820 and #35853).
I incorrectly refactored the case that dealt with ARF_NULL; even in that
case, the source register needs to be changed to the MRF.
NOTE: This is a candidate for the 7.10 branch (if 9a21bc6401 is
cherry-picked, take this one too).
Branch emulation and loop unrolling are done in the GLSL frontend.
Transforming loops is no longer needed for fragment shaders, but it is still
necessary for vertex shaders.
Registers that are used inside of loops need to be considered live
starting with the first instruction of the outermost loop.
https://bugs.freedesktop.org/show_bug.cgi?id=34370
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Oops, the mask was being used in the loop to determine whether to use
include the stencil || depth values. This began to fail when mask was
cleared at the beginning of the loop. So reorder the tests and do the
work up-front along with determining the depth_stencil value to use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35822
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(some little changes by Marek Olšák)
Squashed commit of the following:
commit 737c0c6b7d591ac0fc969a7590e1691eeef0ce5e
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Fri Aug 27 02:13:57 2010 +0200
draw: disable SSE and PPC paths (use LLVM instead)
These paths don't support vertex clamping, and are anyway
obsoleted by LLVM.
If you want to re-enable them, add vertex clamping and test that it
works with the ARB_color_buffer_float piglit tests.
commit fed3486a7ca0683b403913604a26ee49a3ef48c7
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:27:38 2010 +0200
draw_llvm: respect vertex color clamp
commit ef0efe9f3d1d0f9b40ebab78940491d2154277a9
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:26:43 2010 +0200
draw: respect vertex clamping in interpreter path
Macro can lead to hard to debug list bugs. For instance consider
the following :
LIST_ADD(item, list->prev)
3 instruction of the macro became :
(list->prev)->next->prev = item
which is equivalent to :
list->prev = item
Thus list prev field changes and next instruction in the macro
(list->prev)->next = item
became :
item->next = item
And you endup with list corruption, other case lead to similar
list corruption. Inline function are not affected by this short
coming
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When the condition
min_index == 0 && sizeof(ib[0]) == sizeof(draw_elts[0])
was true, we were wrongly ignoring istart and processing indices 0.
Reorder some statements to make the code easier to understand.
Now that we purposefully generate delta that point outside of the target
buffer, the assertion has outlived its usefulness.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once more! This time without the unwarranted conversion from
drm_intel_bo_alloc_tiled.
Signed-off-by: [a very embarrassed] Chris Wilson <chris@chris-wilson.co.uk>
v2: Allocate the fences from a single shared buffer object.
v3: Allocate the r600_fence structs in blocks of 16.
Spin a few times before calling sched_yield in r600_fence_finish().
This should be the last bit of infrastructure changes before
generating GLSL IR for assembly shaders.
This commit leaves some odd code formatting in ir_to_mesa and brw_fs.
This was done to minimize whitespace changes / reindentation in some
loops. The following commit will restore formatting sanity.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This array is going to be used in the main compiler soon. Leaving
them uniforms.c caused problems for building the stand-alone compiler.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The hardware should be set according to this table:
FORMAT -> R300 COLORFORMAT
-------------------------
X16 -> UV88
X16Y16 -> ARGB8888
X32 -> ARGB8888
X32Y32 -> ARGB16161616
US_OUT_FMT must contain the real format.
I wasn't able to make B3G3R2 and L4A4 work, but those aren't important.
The vertex color clamp control is a property of an API,
a lot like gl_rasterization_rules.
The state should be set according to the API being implemented, for example:
OpenGL Compatibility: enabled by default
OpenGL Core: disabled by default
D3D11: always disabled
This patch also changes the way ARB_color_buffer_float is advertised.
If no SNORM or FLOAT render target is supported, fragment color clamping
is not required.
BTW this changes the gallium interface.
Some rather cosmetic changes by Marek.
Squashed commit of the following:
commit 513b37d484f0318311e84bb86ed4c93cdff71f13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:54 2010 +0200
mesa/st: respect fragment clamping in st_DrawPixels
commit 546a31e42cad459d7a7a10ebf77fc5ffcf89e9b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:28 2010 +0200
mesa/st: support fragment and vertex color clamping
commit c406514a1fbee6891da4cf9ac3eebe4e4407ec13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:56:37 2010 +0200
mesa/st: expose ARB_color_buffer_float if unclamping is supported
commit d0c5ea11b6f75f3da2f4ca989115f150ebc7cf8d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:53:41 2010 +0200
mesa/st: use unclamped colors
This assumes that Gallium is to be interpreted as given drivers the
responsibility to clamp these colors if necessary.
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it. (see the removed XXX comment. -Marek)
TODO: did I get the set of operations mandating it right?
commit 76bdfcfe3ff4145a1818e6cb6e227b730a5f12d8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:18:25 2010 +0200
gallium: add color clamping to the interface
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: fix getteximage so that it doesn't clamp values
mesa: update the compute_version function
mesa: add display list support for ARB_color_buffer_float
mesa: fix glGet query with GL_ALPHA_TEST_REF and ARB_color_buffer_float
commit b2f6ddf907935b2594d2831ddab38cf57a1729ce
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 31 16:50:57 2010 +0200
mesa: document known possible deviations from ARB_color_buffer_float
commit 5458935be800c1b19d1c9d1569dc4fa30a97e8b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:54:56 2010 +0200
mesa: expose GL_ARB_color_buffer_float
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
(I'll squash the st/mesa part to a separate commit. -Marek)
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it.
TODO: did I get the set of operations mandating it right?
commit 3a9cb5e59b676b6148c50907ce6eef5441677e36
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:09:41 2010 +0200
mesa: respect color clamping in texenv programs (v2)
Changes in v2:
- Fix attributes other than vertex color sometimes getting clamped
commit de26f9e47e886e176aab6e5a2c3d4481efb64362
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:05:53 2010 +0200
mesa: restore color clamps on glPopAttrib
commit a55ac3c300c189616627c05d924c40a8b55bfafa
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:04:26 2010 +0200
mesa: clamp color queries if and only if fragment clamping is enabled
commit 9940a3e31c2fb76cc3d28b15ea78dde369825107
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 00:00:16 2010 +0200
mesa: introduce derived _ClampXxxColor state resolving FIXED_ONLY
To do this, we make ClampColor call FLUSH_VERTICES with the appropriate
_NEW flag.
We introduce _NEW_FRAG_CLAMP since fragment clamping has wide-ranging
effects, despite being in the Color attrib group.
This may be easily changed by s/_NEW_FRAG_CLAMP/_NEW_COLOR/g
commit 6244c446e3beed5473b4e811d10787e4019f59d6
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:58:24 2010 +0200
mesa: add unclamped color parameters
Fixes piglit test glsl-vs-arrays-3 on Sandybridge, as well as garbage
rendering in 3DMarkMobileES 2.0's Taiji demo and GLBenchmark 2.0's
Egypt and PRO demos.
NOTE: This a candidate for stable release branches. It depends on
commit 9a21bc6401.
This was open-coded in three different places, and more are necessary.
Extract this into a function so it can be reused.
Unfortunately, not all variations were the same: in particular, one set
compression control and checked that the source register was not
ARF_NULL. This seemed like a good idea, so all cases now do so.
Eventually the miptree refcounting interface should be cleaned up.
The assymmetry dramatically increases the probability of bugs like
this. It should be made to like like libdrm refcounting or the
refcounting style used in other parts of Mesa.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=33046
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifically, this ensures things like the front buffer actually exist. This
fixes piglt fbo/fbo-sys-blit and fd.o bug 35483.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
GLSL 1.30 states clearly that only float and int are allowed, while the
GLSL ES specification's issues section states that sampler types may
take precision qualifiers.
Fixes compilation failures in 3DMarkMobileES 2.0 and GLBenchmark 2.0.
NOTE: This is a candidate for stable release branches.
Civilization 4's shaders make heavy use of gl_Color and don't use
perspective interpolation. This resulted in rivers, units, trees, and
so on being rendered almost entirely white. This is a regression
compared to the old fragment shader backend.
Found by inspection (comparing the old and new FS backend code).
References: https://bugs.freedesktop.org/show_bug.cgi?id=32949
NOTE: This is a candidate for the 7.10 branch.
Since GLSL IR allows multiple ir_variables to share the same name, we
need to generate unique names when printing the IR. Previously, we
always used %s@%p, appending the ir_variable's memory address.
While this worked, it had two drawbacks:
- When there aren't duplicates, the extra "@0x669a3e88" is useless
and makes the code harder to read.
- Real duplicates were hard to tell apart:
channel_expressions@0x6699e3c8 vs. channel_expressions@0x6699ddd8
We now append @2, @3, @4, and so on, but only where necessary to
distinguish duplicates. Since we only do this at print time, any
performance impact is irrelevant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
While making some other changes in this area I was finding it annoying
each of these functions took mostly the same set of parameters in
differing orders.
'i' is already used for the outer loop. This caused some problems
while doing other work in this area. No bug exists here... until you
want to use the outer loop counter in the inner loop.
We were using typecasts because the functions pointers are polymorphic
in the second argument type, which.
Surprisingly the wrong calling convention didn't cause crashes on Windows,
but it was causing certain registers to be trashed in MSVC optimized
builds, when processing callists in the ClearView RC Flight Simulator.
I think the code ends up a lot more legible this way, though we've
still got the overloads in the fs_inst as well (even though there's
only one caller left currently).
If set to year X, only report extensions up to that year. This is a
work-around for games that try to copy the extensions string to a fixed
size buffer and overflow. If a game was released in year X, setting
MESA_EXTENSION_MAX_YEAR to that year will likely fix the problem.
We now use a 4-bit writemask for all instruction types, which makes it
easier to write generic helper functions to manipulte writemasks.
NOTE: This is a candidate for the 7.10 branch.
The code no longer supports otherwise -- it relies on buffers being
uploaded via u_upload_mgr -- so make this clear.
Also, there's no need to flush after draws from user buffers, given all
user content should have been copied by then.
Unsigned long is 32bit on several platforms (e.g., Windows), yielding
1UL << 32 to be zero.
Note that BITFIELD64_BIT result is often assigned to variables of type
GLbitfield, instead of GLbitfield64. That's probably wrong and should be
addressed in a later change.
It should have been a tip when the spec says "However, implicitly
sized arrays cannot be assigned to. Note, this is a rare case that
*initializers and assignments appear to have different semantics*."
(empahsis mine)
Fixes bugzilla #34367.
NOTE: This is a candidate for stable release branches.
Early Z support is set in the DST_VARS command. Hence split up static
state emission to avoid reissuing to much on fragment shader changes,
especially the costly dst buffer relocations.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The optimization loop won't reinsert noise instructions or quadop
vectors, so we were traversing the tree for nothing. Lowering vector
indexing was in the loop after do_common_optimization() to avoid the
work if it ended up that the index was actually constant, but that has
been called already in the core.
Most of the time we don't have a non-uniform struct variable in the
shader, so this cuts the time spent in do_structure_splitting during
glean texCombine by about 2/3.
This fixes an issue where the .obj files wound up in the src/
directory rather than the build/ directory. That prevented
combined 32-bit and 64-bit builds from working.
Signed-off-by: Brian Paul <brianp@vmware.com>
Additionally, to discarding the whole buffer, use
PIPE_TRANSFER_DISCARD_RANGE in pipe_buffer_write when the
write covers only part of the buffer.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In memory mapping buffer objects make use of
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE and PIPE_TRANSFER_DISCARD_RANGE
when appropriate.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
It doesn't support them. Also, we shouldn't be
emitting CB_BLENDx_CONTROL on R600 as the regs don't
exist there, but I'm not sure of the best way to deal
with this in the current r600 winsys.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This packet is required when updating the DB, CB,
or STRMOUT base addresses on rv6xx for the surface
sync logic to work correctly.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This sort of worked because blend state setup cleared MULTIWRITE_ENABLE again,
but that's not something we want to depend on.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Disable Z_EXPORT / STENCIL_EXPORT / KILL_ENABLE again if a shader doesn't
use those. This is similar to 0a6f09a76a.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
The idea behind this is that anything touching registers should be in
r600_state.c or evergreen_state.c. This is also consistent with
evergreen_pipe_shader_vs().
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
We need more and more of these, and it is difficult and prone to version
incompatability issues trying to single out every one of them.
This mimicks what was done in SCons.
The assignment on line 368, `tex_swizzles[i] = SWIZZLE_NOOP`, is rendered
dead by the reassignment on line 392.
Signed-off-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is necessary for GLSL 1.30+ shadow sampling functions, which return
a single float rather than splatting the value to a vec4 based on
GL_DEPTH_TEXTURE_MODE.
This was probably missed when implementing luminance and luminance alpha
render targets.
_mesa_get_format_bits checks for both GL_*_BITS and GL_TEXTURE_*_SIZE.
This fixes:
main/framebuffer.c:892: _mesa_source_buffer_exists: Assertion `....' failed.
The r300 compiler can eliminate unused uniforms and remap uniform locations
if their number surpasses hardware limits, so the limit is actually
NumParameters + NumUnusedParameters. This is important for some apps
under Wine to run.
Wine sometimes declares a uniform array of 256 vec4's and some Wine-specific
constants on top of that, so in total there is more uniforms than r300 can
handle. This was the main motivation for implementing the elimination
of unused constants.
We should allow drivers to implement fail & recovery paths where it makes
sense, so giving up too early especially when comes to uniforms is not
so good idea, though I agree there should be some hard limit for all drivers.
This patch fixes:
- glsl-fs-uniform-array-5
- glsl-vs-large-uniform-array
on drivers which can eliminate unused uniforms.
Avoid setting the same gpu register several times in a r600_pipe_state.
Compute the final value of the register and set that one time. This avoids
some overhead in r600_context_pipe_state_set().
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Well, not sure what exactly it is, but it certainly doesn't contain
the control flow stack, but vertex data.
Not sure about size, I've only seen the first few KiB written, but
the binary driver seems to allocate more.
Depth values are also written before the shader is executed, so if
early tests are enabled, fragments that failed the alpha test were
modifying the depth buffer, but they shouldn't.
The kernel drm takes care of all coherency as long as we don't forget
to submit all outstanding commands in the batchbuffer ...
Also move batchbuffer initialization up because otherwise transfers
for some helper textures fail with a segmentation fault.
And kill the dead code, flushes should now be correct everywhere.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The docs say it can be set for direct texture lookups, but even that
causes problems.
This fixes the wireframe bug:
https://bugs.freedesktop.org/show_bug.cgi?id=32688
NOTE: This is a candidate for the 7.9 and 7.10 branches.
(1, -_, ...) was converted to (-1, ...) because of the negation
in the second component.
Masking out the unused bits fixes this.
Piglit:
- glsl-fs-texture2d-branching
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This gets one more piece of the pipeline onto the new codegen backend.
Once ARB_fragment_program can generate GLSL programs, we can nuke the
old backend.
This is like how we track FragmentProgram._Current for the computed
ARB fragment program for fixed function texenv, but this gives direct
access to the gl_shader_program for drivers to codegen from, skipping
ARB_fp.
This is a step towards providing a direct route for drivers accepting
GLSL IR for codegen. Perhaps more importantly, it runs the fixed
function fragment program through the GLSL IR optimization. Having
seen how easy it is to make ugly fixed function texenv code that can
do unnecessary work, this may improve real applicatinos.
It would be nice if we handled optimized uniform math like this in
some generic way, since people often end up doing uniform expressions
in shaders, but for now keep this hard-coded like it was in the
texenvprogram code.
For fixed function fragment processing in GLSL IR, we want to be able
to reference this state value. gl_* not explicitly permitted is
reserved, so using this variable name internally shouldn't be any
issue.
This is used in the upcoming fixed function shader_program generation,
and shader_program and ARB programs are together in this code until
both fragment and vertex ff get converted.
The drivers have been changed so that they behave as if all of the flags
were set. This is already implicit in most hardware drivers and required
for multiple contexts.
Some state trackers were also abusing the PIPE_FLUSH_RENDER_CACHE flag
to decide whether flush_frontbuffer should be called.
New flag ST_FLUSH_FRONT has been added to st_api.h as a replacement.
Since we're compiling/linking GLSL shaders we should check against
the shader uniform limits, not the legacy vertex/fragment program
parameter limits which are usually lower.
The gl_program_constants struct is for limits that are applicable to
any/all shader stages. Move the geometry shader-only fields into the
gl_constants struct.
Remove redundant MaxGeometryUniformComponents field too.
Need to flush rendering (or at least indicate that the rug might be getting
pulled out from underneath us) when a shader, buffer object or query object
is about to be deleted.
Also, this helps to tell the VBO module to unmap its current vertex buffer.
Benefits:
- spares us a relocation.
- needed for zone rendering (if that ever happens).
- just awesome.
v2: Rename the debug option. Completely disabling the blitter is
required for Y tiling to work, so this option will cover other
code paths in the future.
v3: Implement suggestions by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Flushing the batch/hw backend doesn't invalidate the derived state.
So kill the unnecessary function calls and add an assert in
emit_hardware_state for paranoia.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the new clear code this is possible (if the app call glClear
before drawing the first primitive).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The polygon stipple fallback does not have to be implemented in the
draw module (it doesn't need window coords, etc). Drivers can use
this utility and avoid sw vertex fallbacks if pstipple is enabled.
Note: this is WIP and not used by any driver yet.
The gc var would be NULL if called from line 238. Instead, get
the opcode from __glXSetupForCommand(dpy) as done in other places.
And remove the unused gc parameter.
Fixes a bug reported by "John Doe" on 3/9/2011.
NOTE: This is a candidate for the 7.10 branch.
This fd gets passed in from outside, closing it causes the X.org server
to crap out when the driver doesn't identify the chipset.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Improves performance of a hacked-up scissor-many (to reuse a small set
of scissors instead of blowing out the cache, and then to run 100x
more iterations so it actually took some time) by 3.6% +/- 1.2% (n=10)
Add linear probing on collisions.
Expand entry array by a fixed scale (currently 2) to help avoid
collisions.
Use a LRU approach to ensure that the number of entries stored in the
cache doesn't exceed the requested size.
Do that later on when we set up the hwtnl state instead.
This addresses a problem when we drop the uploaded copy due to a vb
size change, it will remain referenced in svga->curr.vb[], and the
new contents of the vb will never be uploaded.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The encoding/decoding algorithms are shared with RGTC.
Thanks to some magic with the base format, the RGTC texstore functions work
for LATC too.
swrast passes the related piglit tests besides two things:
- The alpha channel is wrong (it's always 1), however the incorrect alpha
channel makes some other tests fail too, so I guess it's unrelated to LATC.
- Signed LATC fetches aren't correct yet (signed values are clamped to [0,1]),
however RGTC has the same problem.
Further testing (with other of my patches) shows that hardware drivers
and softpipe work.
BTW, ETQW uses this extension.
st->user_vb[attr] was always pointing to the same user vb, regardless
of the value of attr. Together with reverting the temporary workaround
for bug 34378, and a fix in the svga driver, this fixes googleearth on svga.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The signature list in a function must contain only ir_function_signature nodes.
The target of an ir_call must be an ir_function_signature.
These were added while trying to debug Mesa bugzilla #34203.
Instead of using the current gl_fragment_program. These aren't necessarily
the same, for example when translate_program() is called by
i915ValidateFragmentProgram().
this doesn't bind to drivers yet, just enough to in theory make indirect
work against other servers.
I'm really not sure what the rules for adding extensions to the known_gl_extensions list as it looks to be missing a few. are these GL extensions that have GLX
protocol??
Signed-off-by: Dave Airlie <airlied@redhat.com>
Comments about the deleted stuff:
- openaren hang: likely caused by the vertex corruptions, fixed by Jakob.
- tiling: Y-tiling works with my hw-clear branch. X-tiling works as
merged to master a while ago (execbuf2 version).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ARB_instanced_arrays is a subset of D3D9.
ARB_draw_instanced is a subset of D3D10.
The point of this change is to allow D3D9-level drivers to enable
ARB_instanced_arrays without ARB_draw_instanced.
If an array redeclaration includes an initializer, the initializer
would previously be dropped on the floor. Instead, directly apply the
initializer to the correct ir_variable instance and append the
generated instructions.
Fixes bugzilla #34374 and piglit tests glsl-{vs,fs}-array-redeclaration.
NOTE: This is a candidate for stable release branches. 0292ffb8 and
8e6cb9fe are also necessary.
Removed sampler view support for USCALED/SSCALED, the texture unit
refuses to convert to non-normalized float. The enums are treated
like UNORM.
Removed duplicate format related headers.
The hw doesn't like it - demos/shadowtex is broken. The emitted shader
isn't totally empty though, the depth write fixup gets emitted instead.
Maybe that one is somewhat fishy, too?
Idea for this patch from Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is an awful hack and will hurt performance on Ironlake, but we're
at a loss as to what's going wrong otherwise. This is the only common
variable we've found that avoids the problem on 4 applications
(CelShading, gnome-shell, Pill Popper, and my GLSL demo), while other
variables we've tried appear to only be confounding. Neither the
specifications nor the hardware team have been able to provide any
enlightenment, despite much searching.
https://bugs.freedesktop.org/show_bug.cgi?id=29172
Tested by: Chris Lord <chris@linux.intel.com> (Pill Popper)
Tested by: Ryan Lortie <desrt@desrt.ca> (gnome-shell)
Because the format can be changed to UNORM in a surface.
This fixes:
state_tracker/st_atom_framebuffer.c:163:update_framebuffer_state:
Assertion `framebuffer->cbufs[i]->texture->bind & (1 << 1)' failed.
Computation of the delta of this array from the last had a silly little
bug and ignored any initial delta==0 causing grief in Nexuiz and
friends.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
There is a silicon bug which causes unpredictable behaviour if the
URB_FENCE command should cross a cache-line boundary. Pad before the
command to avoid such occurrences. As this command only applies to
gen4/5, do the fixup unconditionally as the specs do not actually state
for which chip it was fixed (and the cost is negligible)...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, the rule deleted by this commit was matched every single
time (being the longest match). If not skipping, it used REJECT to
continue on to the actual correct rule.
The flex manual advises against using REJECT where possible, as it is
one of the most expensive lexer features. So using it on every match
seems undesirable. Perhaps more importantly, it made it necessary for
the #if directive rules to contain a look-ahead pattern to make them
as long as the (now deleted) "skip the whole line" rule.
This patch introduces an exclusive start state, SKIP, to avoid REJECTs.
Each time the lexer is called, the code at the top of the rules section
will run, implicitly switching the state to the correct one.
Fixes piglit tests 16384-consecutive-chars.frag and
16385-consecutive-chars.frag.
The expected result has been out of sync with what glcpp produces for
some time; glcpp's actual result seems to be correct and is very close to
GCC's cpp. Updating this will make it easier to catch regressions in
upcoming commits.
PIPE_BIND_CONSTANT_BUFFER alone was OK for nv50/nvc0, but nv30 will need
to be able to set others on certain chipsets.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This commit is basically a copy-over of the fix
Chia-I Wu's commited to wayland:
http://cgit.freedesktop.org/wayland/wayland-demos/commit/?id=1b6c0ed95
"Workaround an xcb-dri2 bug.
xcb_dri2_connect_device_name generated by xcb-proto 1.6 is broken.
It only works when the length of the driver name is a multiple of 4."
This reverts commit 1f9a0a4e6e.
This caused trouble with Lightsmark w/ i965 driver and fbo/fbo-blit-d24s8
(see bug 34894). It's probably something simple but no time to debug now.
SNB has 64k urb space, we only use piece of them.
The more urb space we alloc,
the more concurrent vs threads we can run.
push the urb space usage to the limit.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
this fixes fbo-generatemipmap-formats rgtc and s3tc in NPOT mode
with softpipe.
r600g fails to even get level 0 correct so have to look into that
a bit further.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was always converting to 8-bit per channel unsigned formats,
which isn't suitable for RGTC signed formats, this special cases
those two formats and converts to floats for those.
Signed-off-by: Dave Airlie <airlied@redhat.com>
SNORM needs a bit of work in the state tracker in order for mipmap
generation to work I believe.
I'm also not sure that having unorm fetches for an snorm format is
sane.
With signed types we weren't hitting this test however the comment
stating this doesn't happen often doesn't apply when using signed
types since an all 0 block is quite common which isn't abs min or max.
this fixes the limits correctly again also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
if the values are all in the last dword, the high bits can be 0,
This fixes a valgrind warning I saw when playing with mipmaps.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Drivers can call this function as needed. It tells the VBO module to
always unmap the current glBegin/glEnd VBO when we flush. Otherwise
it's possible to be in a flushed state but still have the VBO mapped.
Among other benefits, parallel makes now work. Since many people have
parallel builds by default (via MAKEFLAGS environment variable), this
sames some irritation at release time...when there's usually not any
other irritation already.
In my last commit I introduced a build dependency upon a new libdrm.
Add the associated autoconf checks. As the headers are part of the core
libdrm, we need to bump that version and so may as well bump the chipset
specific versions simultaneously.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With relaxed relocation checking in the kernel, we can specify a
negative delta (i.e. pointing outside of the target bo) in order to fake
a range in a large buffer. We only then need to upload the elements used
and adjust the buffer offset such that they correspond with the indices
used in the DrawArrays.
(Depends on libdrm 0209428b3918c4336018da9293cdcbf7f8fedfb6)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... and take advantage of start_vertex_bias to trim to [min_index,
max_index] where possible (i.e. when we need to upload all arrays).
Fixes half_float_vertex(misc.fillmode.wireframe)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34595
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When doing copy swapbuffers using drm, throttle on outstanding copy operations.
Introduces a new environment variable, EGL_THROTTLE_FENCES that the
user can use to indicate the desired number of outstanding swapbuffers, or
disable throttling using EGL_THROTTLE_FENCES=0.
This can and perhaps should be extended to the pageflip case as well, since
with some hardware pageflips can be pipelined. In case the pageflip syncs, the
throttle operation will be a no-op anyway.
Update copyright notices.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Use the pageflip ioctl when available.
Otherwise, or when the backbuffer contents need to be preserved,
fall back to a copy operation.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The copy swap can be used when we need to preserve the contents of
the back buffer or when there is no way to do native page-flipping.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This makes it usable also for native helpers.
Also add inline functions to access the context and to
uninit the native display structure.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This is reliant on a drm patch that I posted on the list + a version bump.
These will appear in drm-next today.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the drm minor version is > 9 (i.e. whats in drm-next),
we enable s3tc + texture tiling by default now.
this changes R600_FORCE_TILING to R600_TILING which can
be set to false to disable tiling on working drm.
Signed-off-by: Dave Airlie <airlied@redhat.com>
nv50_resource is being called nv04_resource now temporarily, to avoid
a naming conflict with nouveau_resource from libdrm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Modified from original to remove chipset-specific code, and to be decoupled
from the mm present in said drivers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
By default the hardware rounds texcoords. However,
for point sampled textures, the expected behavior is
to truncate. When we have point sampled textures,
set the truncate bit in the sampler.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=25871
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
No idea why I didn't do it like this the first time, but share
the code like other portions of mesa do using _tmp.h suffix
and some #defines for the types.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The other draw stages like aaline and pstipple were already doing this.
If the driver used the aapoint stage but not the others it would crash
because of a null pipe->draw pointer.
It was only implemented in the swrast driver and probably not used by
any applications. A modern app would use a dependent/chained texture
lookup in the fragment shader.
when doing glCopyTex[Sub]Image() and checking the source buffer's
completeness.
We only need to determine FBO completeness when the status is indeterminate.
I removed the HiZ memory management, because the HiZ RAM is too small
and I also did it in hope that HiZ will be enabled more often.
This also sets aligned strides to HIZ_PITCH and ZMASK_PITCH.
This adds support for the RGTC unsigned and signed
texture storage and fetch methods.
the code is a port of the DXT5 alpha compression code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With an extremely dumb strategy. But it's the same i915c employs.
Also improve the hw_atom code slightly by statically specifying the
required batch space. For extremely variably stuff (shaders, constants)
it would probably be better to add a new parameter to the hw_atom->validate
function.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also contains the first few bits for hw state atoms.
v2: Implement suggestion by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Add the batch bo to the libdrm validation lost, for otherwise
libdrm won't take previously used buffers into account.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move it to i915_state_static.c This way i915_emit_state.c only emits
state and doesn't (re)calculate it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This came out of discussion at the office today, and we agreed that
solving this for indirect wasn't really interesting, though the
server-side change would be of a similar level of difficulty.
If another thread bound a context to the drawable then unbound it, the
driContextPriv would end up NULL.
With the previous two fixes, this fixes glx-multithread-makecurrent-2,
despite the issue not being about the multithreaded makecurrent.
The driver only has one reasonable place to look for its context to
flush anything, which is the current context. Don't bother it with
having to check.
This extension allows a client to bind one context in multiple threads
simultaneously. It is then up to the client to manage synchronization of
access to the GL, just as normal multithreaded GL from multiple contexts
requires synchronization management to shared objects.
The old value, BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE makes it sound like we're
doing a non-bias texture lookup. It has the same value as the new constant
BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE, so there should be no
functional changes.
There is an issue with gcc 4.6.0 that leads to segfault/assert with mesa
due to ureg_src size, reshuffling the structure member to better better
alignment work around the issue.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47893
7.9 + 7.10 candidate
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
so far only hw mipmap generation is testing on softpipe,
passes test added to piglit.
this requires another patch to mesa to let array textures mipmaps
even start to happen.
platform.system in SCons on Cygwin includes the OS version number.
Windows XP - CYGWIN_NT-5.1
Windows Vista - CYGWIN_NT-6.0
Windows 7 - CYGWIN_NT-6.1
Reduce all Cygwin platform variants to just 'cygwin' so anything
downstream can simply use 'cygwin' instead of the different full
platform names.
255.875 matches the hardware documentation. Presumably this was a typo.
NOTE: This is a candidate for the 7.10 branch, along with
commit 2bfc23fb86.
Reviewed-by: Eric Anholt <eric@anholt.net>
In the case where glBlitFramebuffer is being used to copy to a texture
without scaling it is faster if we can use the hardware to do a blit
rather than having to do a texture render. In most of the drivers
glCopyTexSubImage2D will use a blit so this patch makes it check for
when glBlitFramebuffer is doing a simple copy and then divert to
glCopyTexSubImage2D.
This was originally proposed as an extension to the common meta-ops.
However, it was rejected as using the BLT is only advantageous for Intel
hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33934
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Might be necessary if a block sneaks in somewhere, like a common
block for moves of phi sources after a loop break.
This is harmless and normally will be removed before emission.
In linear scan we can't allocate multiple values with different
live ranges at the same time to assign them consecutive regs.
Maybe we should just switch to graph coloring for all values ...
The svga_update_state() mechanism is inadequate as it will always end up
flushing the primitives before processing the SVGA_NEW_COMMAND_BUFFER
dirty state flag.
Fixes piglit/fbo-depth-sample-compare:
==14722== Invalid free() / delete / delete[]
==14722== at 0x4C240FD: free (vg_replace_malloc.c:366)
==14722== by 0x84FBBFD: intel_upload_unmap (intel_buffer_objects.c:695)
==14722== by 0x85205BC: brw_prepare_vertices (brw_draw_upload.c:457)
==14722== by 0x852F975: brw_validate_state (brw_state_upload.c:394)
==14722== by 0x851FA24: brw_draw_prims (brw_draw.c:365)
==14722== by 0x85F2221: vbo_exec_vtx_flush (vbo_exec_draw.c:389)
==14722== by 0x85EF443: vbo_exec_FlushVertices_internal (vbo_exec_api.c:543)
==14722== by 0x85EF49B: vbo_exec_FlushVertices (vbo_exec_api.c:973)
==14722== by 0x86D6A16: _mesa_set_enable (enable.c:351)
==14722== by 0x42CAD1: render_to_fbo (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42CEE3: piglit_display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42F508: display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== Address 0xc606310 is 0 bytes after a block of size 18,720 alloc'd
==14722== at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==14722== by 0x85202AB: copy_array_to_vbo_array (brw_draw_upload.c:256)
==14722== by 0x85205BC: brw_prepare_vertices (brw_draw_upload.c:457)
==14722== by 0x852F975: brw_validate_state (brw_state_upload.c:394)
==14722== by 0x851FA24: brw_draw_prims (brw_draw.c:365)
==14722== by 0x85F2221: vbo_exec_vtx_flush (vbo_exec_draw.c:389)
==14722== by 0x85EF443: vbo_exec_FlushVertices_internal (vbo_exec_api.c:543)
==14722== by 0x85EF49B: vbo_exec_FlushVertices (vbo_exec_api.c:973)
==14722== by 0x86D6A16: _mesa_set_enable (enable.c:351)
==14722== by 0x42CAD1: render_to_fbo (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42CEE3: piglit_display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42F508: display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34604
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This adds EXT_texture_array support to r600g, it passes the piglit
array-texture test but I suspect may not be complete.
It currently requires a kernel patch to fix the CS checker to allow
these, so you need to use R600_ARRAY_TEXTURE=true for now
to enable them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is because the HW doesn't always store a 1D array like a
2D texture, it more likely stores it like 2D texture (i.e.
alignments etc).
This means we upload each slice separately and let the driver
work out where to put it.
this might break nvc0 as I can't test it, I have only nv50 here.
Signed-off-by: Dave Airlie <airlied@redhat.com>
... and prefers a small batch whereas gen4+ prefer a large batch to
carry more state.
Tuning using openarena/padman indicate that a batch size of just 4096 is
best for those cases.
Bugzilla: https://bugs.freedesktop.org/process_bug.cgi
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The spec says that the offsets in the vertex-fetch instructions need to be byte-aligned and makes no specification with regard to the required alignment of the offset and stride in the vertex resource constant register.
However, testing indicates that all three values need to be DWORD aligned.
Ptr can be very well NULL, so when there are two arrays, with one having
offset 0 (and thus NULL Ptr), and the other having a non-zero offset,
the non-zero value is taken as minimum (because of !low_addr ? start ...).
On 32-bit systems, this somehow works. On 64-bit systems, it leads to crashes.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
255.875 matches the hardware documentation. Presumably this was a typo.
Found by inspection. Not known to fix any issues.
Reviewed-by: Eric Anholt <eric@anholt.net>
pixel_w is the final result; wpos_w is used on gen4 to compute it.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't safely use fixed size arrays since Gen6+ supports unlimited
nesting of control flow.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
The code that generates MATH instructions attempts to work around
the hardware ignoring source modifiers (abs and negate) by emitting
moves into temporaries. Unfortunately, this pass coalesced those
registers, restoring the original problem. Avoid doing that.
Fixes several OpenGL ES2 conformance failures on Sandybridge.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Single-operand math already had these workarounds, but POW (the only two
operand function) did not. It needs them too - otherwise we can hit
assertion failures in brw_eu_emit.c when code is actually generated.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_PointSize (VERT_RESULT_PSIZ) doesn't take up a message register,
as it's part of the header. Without this fix, writing to gl_PointSize
would cause the SF to read and use the wrong attributes, leading to all
kinds of random looking failure.
Reviewed-by: Eric Anholt <eric@anholt.net>
If two buffers had the same stride where one buffer is a user one and
the other is a vbo, it was considered to be one interleaved buffer,
resulting in incorrect rendering and crashes.
This patch makes sure that the interleaved buffer is either user or vbo,
not both.
Needs to track this ourself since because we get into a race condition with
the dri_util.c code on make current when rendering to the front buffer.
This is what happens:
Old context is rendering to the front buffer.
App calls MakeCurrent with a new context. dri_util.c sets
drawable->driContextPriv to the new context and then calls the driver make
current. st/dri make current flushes the old context, which calls back into
st/dri via the flush frontbuffer hook. st/dri calls dri loader flush
frontbuffer, which calls invalidate buffer on the drawable into st/dri.
This is where things gets wrong. st/dri grabs the context from the dri
drawable (which now points to the new context) and calls invalidate
framebuffer to the new context which has not yet set the new drawable as its
framebuffers since we have not called make current yet, it asserts.
When clearing a GL_LUMINANCE_ALPHA buffer, for example, we need to convert
the clear color (R,G,B,A) to (R,R,R,A). We were doing this for texture border
colors but not renderbuffers. Move the translation function to st_format.c
and share it.
This fixes the piglit fbo-clear-formats test.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
If finalizing a non-POW mipmapped texture with an odd-sized base texture
image we were allocating the wrong size of gallium texture (off by one).
Need to be more careful about computing the base texture image size.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=34463
Reducing the number of relocations has lots of nice knock-on effects,
not least including reducing batch buffer size, auxilliary array sizes
(vmalloced and copied into the kernel), processing of uncached
relocations etc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than waiting on the first batch after the last swapbuffers to be
retired, call into the kernel to wait upon the retirement of any request
less than 20ms old. This has the twofold advantage of (a) not blocking
any other clients from utilizing the device whilst we wait and (b) we
attain higher throughput without overloading the system.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the next vertex arrays are a (discontiguous) continuation of the
current arrays, such that the new vertices are simply offset from the
start of the current vertex buffer definitions we can reuse those
defintions and avoid the overhead of relocations and invalidations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using a temporary buffer for large discontiguous uploads into the common
buffer and a single buffered upload is faster than performing the
discontiguous copies through a mapping into the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Upload the non-vbo arrays into a single interleaved buffer object, and
so need to just emit a single vertex buffer relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the user passed in several arrays interleaved in the same vbo, only
emit a single vertex buffer and relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Track reuse of the vertex buffer objects and so minimise the number of
vertex buffers used by the hardware (and their relocations).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we now pack the indices into a common upload buffer, we can reuse a
single CMD_INDEX_BUFFER packet and translate each invocation with a
start vertex offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we use state relocations and we know that all the state belongs to
the same bo, we can drop the multiple references to the same bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was to avoid read back through the GTT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was to avoid read back through the GTT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In preparation for a greater change, use the color_calc_state_bo already
provisioned for this purpose.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than performing lots of little writes to update the common bo
upon each update, write those into a static buffer and flush that when
full (or at the end of the batch). Doing so gives a dramatic performance
improvement over and above using mmaped access.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than performing a blit to completely overwrite a busy bo, simply
discard it and create a new one with the fresh data.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dynamic arrays have the tendency to be small and so allocating a bo for
each one is overkill and we can exploit many efficiency gains by packing
them together.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dynamic draw buffers are used by clients for temporary arrays and for
uploading normal vertex arrays. By keeping the data in memory, we can
avoid reusing active buffer objects and reallocate them as they are
changed. This is important for Sandybridge which can not issue blits
within a batch and so ends up flushing the batch upon every update, that
is each batch only contains a single draw operation (if using dynamic
arrays or regular arrays from system memory).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fix a typo which meant that --enable-shared-glapi didn't actually cause a shared glapi to be built
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This behavior was last when moving the transfers to the contexts.
This fixes several piglit failures, which were reading the color renderbuffer
before the draw operations were emitted.
This is a temporary workaround. It fixes sauerbrauten with shaders enabled.
I guess we might be changing vertex attribs somewhere and not updating
the appropriate dirty flags, therefore we can't rely on them for now.
Or maybe we need to make this state dependent on some other flags too.
More info:
https://bugs.freedesktop.org/show_bug.cgi?id=34378
When we drop the in_swtnl_draw flag, we must force a rerun of
update_need_swtnl to reset the need_swtnl flag to its correct value outside
of a swtnl vbo draw.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
If we hit the pipe_get/put_tile() path for setting up the glCopyPixels
texture we were passing the wrong x/y position to pipe_get_tile().
The x/y position was already accounted for in the pipe_get_transfer()
call so we were effectively reading from 2*readX, 2*readY.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Addresses excessive TEMP allocation in vertex shaders where all CONSTs are
stored into TEMPS at the start, but copy propagation was failing due to
the presence of IFs.
We could do something about loops, but ifs are easy enough.
This enables the egl_dri2 driver to load swrast driver
for software rendering. It could be used when hardware
dri2 drivers are not available, such as in VM.
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
Some cards don't appear to work correctly with the UNORM type,
so switch to the integer type, however since gallium has no
integer types yet from what I can see we need to do a hack to
workaround it for the blitter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a workaround for a bug in libtxc_dxtn.
Fixes:
- piglit/GL_EXT_texture_compression_s3tc/fbo-generatemipmap-formats
Signed-off-by: Dave Airlie <airlied@redhat.com>
Arrays are zero based. If the highest element accessed is 6, the
array needs to have 7 elements.
Fixes piglit test glsl-fs-implicit-array-size-03 and bugzilla #34198.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Fixes regression: https://bugs.freedesktop.org/show_bug.cgi?id=34160
Commit e7c1f058d1 disabled constant-folding
when division-by-zero occured. This was a mistake, because the spec does
allow division by zero. (From section 5.9 of the GLSL 1.20 spec: Dividing
by zero does not cause an exception but does result in an unspecified
value.)
For floating-point division, the original pre-e7c1f05 behavior is
reinstated.
For integer division, constant-fold 1/0 to 0.
This reverts commit b3cf92aa91.
The reverted commit prevented constant-folding of reciprocal expressions
when the reciprocated expression was 0. However, since the spec allows
division by zero, constant-folding *is* permissible in this case.
From Section 5.9 of the GLSL 1.20 spec:
Dividing by zero does not cause an exception but does result in an
unspecified value.
Before populating the vertex buffer attribute pointer (VB->AttribPtr[]),
convert vertex data in GL_FIXED format to GL_FLOAT.
Fixes bug: http://bugs.freedesktop.org/show_bug.cgi?id=34047
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This is a multi-threading optimization which hides the kernel overhead
behind a thread. It improves performance in CPU-limited apps by 2-15%.
Of course you must have at least 2 cores for it to make any difference.
It can be disabled with:
export RADEON_THREAD=0
On r600, s3tc formats require a 1D tiled texture format,
so we have to do uploads using a blit, via the 64-bit and 128-bit formats
Based on the r600c code we use a 64 and 128-bit type to do the
blits.
Still requires R600_ENABLE_S3TC until the kernel fixes are in,
this has only been tested on evergreen where the kernel doesn't
yet get in the way.
the miptree setup and pitch storing didn't work so well for block
based things like compressed textures. The CB takes blocks, where
the texture sampler takes pixels, and transfers need bytes,
So now we store blocks/bytes and translate to pixels in the sampler.
This is necessary for s3tc to work properly.
If the underlying transfer had a stride wider for hw alignment reasons,
the mipmap generation would generate badly strided images.
this fixes a few problems I found while testing r600g with s3tc
Signed-off-by: Dave Airlie <airlied@redhat.com>
Because an app may do something like this:
while (!(ptr = bo_map(..., DONT_BLOCK))) {
/* Do some other work. */
}
And it would be looping endlessly if we didn't flush.
If EXT_draw_buffers2 is supported but ARB_draw_buffers_blend isn't
_mesa_BlendFuncSeparateEXT only sets up the blend equation and function for the
first render target. This patch makes sure that update_blend doesn't try to use
the data from other rendertargets in such cases.
Signed-off-by: Brian Paul <brianp@vmware.com>
The vertex arrays state should be set only when (_NEW_ARRAY | _NEW_PROGRAM)
is dirty. This assumes user buffer content is mutable, which will be
sorted out in the next commit. The following usage case should be much faster
now:
for (i = 0; i < 1000; i++) {
glDrawElements(...);
}
Or even:
for (i = 0; i < 1000; i++) {
glSomeStateChangeOtherThanArraysOrProgram(...);
glDrawElements(...);
}
The performance increase from this may be significant in some apps and
negligible in others. It is especially noticable in the Torcs game (r300g):
Before: 15.4 fps
After: 20 fps
Also less looping over attribs in st_draw_vbo yields slight speed-up
in apps with lots of glDraw* calls.
but really wtf? all these PCI IDs need to be ripped out of here, its totally
unscalable and the drivers already have this info so could export it some better way.
tested by Darxus on #wayland.
This an adds --enable-shared-dricore option to configure. When enabled,
DRI modules will link against a shared copy of the common mesa routines
rather than statically linking these.
This saves about 30MB on disc with a full complement of classic DRI
drivers.
v2: Only enable with a gcc-compatible compiler that handles rpath
Handle DRI_CFLAGS without filter-out magic
Build shared libraries with the full mklib voodoo
Fix typos
v3: Resolve conflicts with talloc removal patches
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Track variables, functions, and types during parsing. Use this
information in the lexer to return the currect "type" for identifiers.
Change the handling of structure constructors. They will now show up
in the AST as constructors (instead of plain function calls).
Fixes piglit tests constructor-18.vert, constructor-19.vert, and
constructor-20.vert. Also fixes bugzilla #29926.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This requires lexical disambiguation between variable and type
identifiers (as most C compilers do).
Signed-off-by: Keith Packard <keithp@keithp.com>
NOTE: This is a candidate for the 7.9 and 7.10 branches.
add support for the 32-bit types, also fixup the
export setting to handle types with channels > 11 bits properly
Signed-off-by: Dave Airlie <airlied@redhat.com>
Based on Dave's branch.
The majority of this commit is a cleanup, mainly renaming things.
There wasn't much code to import, just ioctl calls.
Also done:
- implemented unsynchronized bo_map (important optimization!)
- radeon_bo_is_referenced_by_cs is no longer a refcount hack
- dropped the libdrm_radeon dependency
I'm surprised that this has resulted in less code in the end.
Previously the SNE and SEQ instructions would calculate the partial
result to the destination register. This would cause problems if the
destination register was also one of the source registers.
Fixes piglit tests glsl-fs-any, glsl-fs-struct-equal,
glsl-fs-struct-notequal, glsl-fs-vec4-operator-equal,
glsl-fs-vec4-operator-notequal.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
If the formats don't match we need to update the surface with the new
format.
if we can render to SRGB formats, enable the extension
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a regression from commit 5cbff0932e.
The problem is *some* glDrawPixels fragment programs need to be deleted,
but not all. Use an explicit flag to indicate whether or not the program
needs to be deleted.
This should fix http://bugs.freedesktop.org/show_bug.cgi?id=34049
From section 5.9 of the GLSL 1.20 spec:
The operator modulus (%) is reserved for future use.
From section 5.8 of the GLSL 1.20 spec:
The assignments modulus into (%=), left shift by (<<=), right shift by
(>>=), inclusive or into ( |=), and exclusive or into ( ^=). These
operators are reserved for future use.
The GLSL ES 1.00 spec and GLSL 1.10 spec have similiar language.
Fixes bug:
https://bugs.freedesktop.org//show_bug.cgi?id=33916
Fixes Piglit tests:
spec/glsl-1.00/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.00/compiler/assignment-operators/modulus-assign-00.frag
spec/glsl-1.10/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.10/compiler/assignment-operators/modulus-assign-00.frag
spec/glsl-1.20/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.20/compiler/assignment-operators/modulus-assign-00.frag
Fixes a minor memory leak with the "engine" mesa demo.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Make sure we unreference the vertex buffer pointers in a local array.
This fixes huge vertex buffer / memory leaks in mesa demos "fire" and "engine".
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Relative addressing of constant buffers can't work properly through the
kcache, since you can only address within the currently locked kcache window.
Instead, this patch binds the constant buffer as a shader resource, and then
explicitly fetches the constant using a vertex fetch with fetch type
VTX_FETCH_NO_INDEX_OFFSET from the shader. There's probably still some room
for improvement, doing the fetch right before the instruction that needs the
value may not be quite optimal for example.
The r600_bc_alu_src structure is used in two different ways, as a vector and
for the individual channels of that same vector. This is somewhat fragile,
and probably confusing.
If the client hasn't done the initial wl_display_iterate() at the time
we initialize the display, we have to do that in platform_wayland.c.
Make sure we detect that correctly instead of dup()ing fd=0, and use
the sync callback to make sure we don't wait forever for authorization that
won't happen.
This code has originally matured in r300g and was ported to r600g several
times. It was obvious it's a code duplication.
See also comments in the header file.
The scheduler and the register allocator are not good enough yet to deal
with the effects of the register rename pass. This was causing a 50%
performance drop in Lightsmark. The pass can be re-enabled once the
scheduler and the register allocator are more mature. r300 and r400
still need this pass, because it prevents a lot of shaders from using
too many texture indirections.
NOTE: This is a candidate for the 7.10 branch.
This adds i965 support for GL_EXT_framebuffer_sRGB, it introduces a new
constant to say that the driver can support sRGB enabled FBOs since enabling
the extension doesn't mean the driver can actually support sRGB.
Also adds the suggested state flush in the core code suggested by Brian.
fix the ARB_fbo color encoding.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Querying index zero is not an error in OpenGL ES 2.0.
Querying an index larger than the value returned by
GL_MAX_VERTEX_ATTRIBS is an error in all APIs.
Fixes bugzilla #32375.
This patch cleans up many of the extra copies in GLSL IR introduced by
i965's scalarizing passes. It doesn't result in a statistically
significant performance difference on nexuiz high settings (n=3) or my
demo (n=10), due to brw_fs.cpp's register coalescing covering most of
those extra moves anyway. However, it does make the debug of wine's
GLSL shaders much more tractable, and reduces instruction count of
glsl-fs-convolution-2 from 376 to 288.
The type of u_get_transfer_vtbl of the usage argument in u_transfer.h is
unsigned and not enum pipe_transfer_usage. This patch changes the type
of usage to unsigned to match the prototype in the header file.
Standard library functions in C++ are in the std namespace. When using
C++-style header files for the standard library, some compilers, such as
Sun Studio, provide symbols only for the std namespace and not for the
global namespace.
This patch adds using statements for standard library functions. Another
option could have been to prepend standard library function calls with
'std::'.
This patch fixes several compilation errors with Sun Studio.
This just adds a flag to create the texture without doing any
flushing to it. Flushing occurs in the draw function. This avoids
unnecessary flushes when we end up rebinding a CB/DB/texture due
to the blitter just restoring state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before, the set_sampler_views() and restore_sampler_views() functions
used MAX2(old,new) to tell the driver how many samplers or sampler
views to set. This could result in cases such as:
pipe->set_fragment_sampler_views(pipe, 4, views={foo, bar, NULL, NULL})
Many/most gallium drivers would take this as-is and set
ctx->num_sampler_views=4 and ctx->sampler_views={foo, bar, NULL, NULL, ...}.
Later, loops over ctx->num_sampler_views would have to check for null
pointers. Worse, the number of sampler views and number of sampler CSOs
could get out of sync:
ctx->num_samplers = 2
ctx->samplers = {foo, bar, ...}
ctx->num_sampler_views = 4
ctx->sampler_views={Foo, Bar, NULL, NULL, ...}
So loops over the num_samplers could run into null sampler_views pointers
or vice versa.
This fixes a failed assertion in the SVGA driver when running the Mesa
engine demo in AA line mode (and possibly other cases).
It looks like all gallium drivers are careful to unreference views
and null-out sampler CSO pointers for the units beyond what's set
with the pipe::bind_x_sampler_states() and pipe::set_x_sampler_views()
functions.
I'll update the gallium docs to explain this as well.
This new interface could set up context for OpenGL,
OpenGL ES1 and OpenGL ES2. It will be used by egl_dri2
driver.
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
Leak was introduced when fixing strict aliasing violation in this code:
the reference counting was preserved, but the destructor call on zero
reference count was not.
Exactly one half would be the ideal, but this is a soft limit, and one
more byte over brings us to synchronous behavior.
Flushing when the referred GMR exceeds one third of the aperture gives us
statistically better performance.
Certain mingw32 cross compilers (e.g. RedHat's) defaults to use DLL gcc
runtime.
Given the main deliverable from this project are self-contained drivers,
which are loaded by any application, this dependency can cause havoc.
If we get a sw accessible buffer like the S8 texture we end up
doing depth tracking on it when there is no need since we won't
ever bind it to the hardware. This leads to a sw fallback in the
transfer destruction which leads to and endless recusion loop
of fail in transfer destroy.
Signed-off-by: Dave Airlie <airlied@redhat.com>
this adds a flag to keep track of whether the depth texture structure
is the flushed texture or not, so we can avoid doing flushes when
we do a hw rendering from one to the other.
it also renames flushed to dirty_db which tracks if the DB copy
has been dirtied by being bound to the hw.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This consolidates the code duplicated between the fragment sampler
and vertex sampler functions. Plus, it'll make adding support for
geometry shader samplers trivial.
Before we were looping to nr_samplers, which is the number of fragment
samplers, not vertex samplers.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
For example, this now raises an error:
#define XXX 1 / 0
Fixes bug: https://bugs.freedesktop.org//show_bug.cgi?id=33507
Fixes Piglit test: spec/glsl-1.10/preprocessor/modulus-by-zero.vert
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Avoid division-by-zero when constant-folding the following expression
types:
ir_unop_rsq
ir_binop_div
ir_binop_mod
Fixes bugs:
https://bugs.freedesktop.org//show_bug.cgi?id=33306https://bugs.freedesktop.org//show_bug.cgi?id=33508
Fixes Piglit tests:
glslparsertest/glsl2/div-by-zero-01.frag
glslparsertest/glsl2/div-by-zero-02.frag
glslparsertest/glsl2/div-by-zero-03.frag
glslparsertest/glsl2/modulus-zero-01.frag
glslparsertest/glsl2/modulus-zero-02.frag
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Do not constant-fold a reciprocal if any component of the reciprocated
expression is 0. For example, do not constant-fold `1 / vec4(0, 1, 2, 3)`.
Incorrect, previous behavior
----------------------------
Reciprocals were constant-folded even when some component of the
reciprocated expression was 0. The incorrectly applied arithmetic was:
1 / 0 := 0
For example,
1 / vec4(0, 1, 2, 3) = vec4(0, 1, 1/2, 1/3)
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This was my mistake when converting from talloc to ralloc. I was
confused because the other calls in the function are to asprintf_append
and the original code used str as the context rather than NULL.
Fixes bug #33823.
Previously a register would be marked as available if any component
was written. This caused shaders such as this:
0: TEX TEMP[0].xyz, INPUT[14].xyyy, texture[0], 2D;
1: MUL TEMP[1], UNIFORM[0], TEMP[0].xxxx;
2: MAD TEMP[2], UNIFORM[1], TEMP[0].yyyy, TEMP[1];
3: MAD TEMP[1], UNIFORM[2], TEMP[0].zzzz, TEMP[2];
4: ADD TEMP[0].xyz, TEMP[1].xyzx, UNIFORM[3].xyzx;
5: TEX TEMP[1].w, INPUT[14].xyyy, texture[0], 2D;
6: MOV TEMP[0].w, TEMP[1].wwww;
7: MOV OUTPUT[2], TEMP[0];
8: END
to produce incorrect code such as this:
BEGIN
DCL S[0]
DCL T_TEX0
R[0] = MOV T_TEX0.xyyy
U[0] = TEXLD S[0],R[0]
R[0].xyz = MOV U[0]
R[1] = MUL CONST[0], R[0].xxxx
R[2] = MAD CONST[1], R[0].yyyy, R[1]
R[1] = MAD CONST[2], R[0].zzzz, R[2]
R[0].xyz = ADD R[1].xyzx, CONST[3].xyzx
R[0] = MOV T_TEX0.xyyy
U[0] = TEXLD S[0],R[0]
R[1].w = MOV U[0]
R[0].w = MOV R[1].wwww
oC = MOV R[0]
END
Note that T_TEX0 is copied to R[0], but the xyz components of R[0] are
still expected to hold a calculated value.
Fixes piglit tests draw-elements-vs-inputs, fp-kill, and
glsl-fs-color-matrix. It also fixes Meego bugzilla #13005.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Also return it as the correct type. Previously the whole array would
be returned and each element would be expanded to a vec4.
Fixes piglit test getuniform-01 and bugzilla #29823.
Passing ralloc_vasprintf_append a 0-byte allocation doesn't work. If
passed a non-NULL argument, ralloc calls strlen to find the end of the
string. Since there's no terminating '\0', it runs off the end.
Fixes a crash introduced in 14880a510a.
If we see a MACRO bit on r600g its 2D tiled,
if don't see a MACRO bit and we do see a MICRO bit then its 1D tiled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the tile type for the buffer is 1 then its been bound to the
DB at some point, we need to decompress it, otherwise its only
been bound as texture/cb so don't do anything.
This fixes 5 piglit tests here on r600g.
this just adds the ioctl interface and sets the tile type
and array mode in the correct place.
This seems to bring eg 1D tiling to the same level, and issues
as on r600. No idea how to address 2D yet.
Previously we'd happily compile GLSL 1.30 shaders on any driver. We'd
also happily compile GLSL 1.10 and 1.20 shaders in an ES2 context.
This has been a long standing FINISHME in the compiler.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Rather than passing "True", pass a bitfield describing the particular
variant's features - either projection or offset.
This should make the code a bit more readable ("Proj" instead of "True")
and make it easier to support offsets in the future.
This annotation is for an "in" function parameter for which it is only legal
to pass constant expressions. The only known example of this, currently,
is the textureOffset functions.
This should never be used for globals.
Having these as actual integer values makes it difficult to implement
the texture*Offset built-in functions, since the offset is actually a
function parameter (which doesn't have a constant value).
The original rationale was that some hardware needs these offset baked
into the instruction opcode. However, at least i965 should be able to
support non-constant offsets. Others should be able to rely on inlining
and constant propagation.
Since the introduction of ir_var_system_value, system variables would be
printed as "temporary" and temporaries would result in out-of-bounds
array access, showing up as garbage in printed IR.
SVGA3D only supports SGN for vertex shaders, and this requires two additional
temporary registers for intermediate results.
For fragment shaders, lower to two CMPs and one ADD.
The extra length is the size of the request *minus* the size of the
VendorPrivate header, not the addition.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
xGLXChangeDrawableAttributesSGIXReq follows the GLXVendorPrivate header
with a drawable, number of attributes, and list of (type, value)
attribute pairs. Don't forget to put the number of attributes in there.
I don't think this can ever have worked.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Like on some r5xx, there are multiple DB backends on the r600,
we need to add up the query results from each of these to get the
final correct value.
So far I'm not 100% sure how to calculate the num_db, value
setting it to 4 should be harmless enough until we do.
This fixes occulsion_query piglit test on my rv740.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Although CUBE is a reduction inst, it writes to more than just PV.X
so we need to keep the dst channel.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This only works on r600/r700 so far, evergreen doesn't appear
to have the multiwrite enable bit in the color control, so we
may have to actually do a shader rewrite on EG hardware.
remove some duplicate code reg defines also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I know Jerome will probably rewrite the way depth textures work sometime
soon. For the time being this should at least make common depth texture usage
for shadowing work properly though.
When the paint is opaque (currently, solid color with alpha 1.0f), no
blending is needed for VG_BLEND_SRC_OVER. This eliminates the serious
performance hit introduced by 859106f196
for a common scenario.
Swizzles are now defined everywhere as a field with 12 bits that contains
4 channels worth of meaningful information. Any channel that is unused is
set to RC_SWIZZLE_UNUSED. This change is necessary because rgb instructions
and alpha instructions were initializing channels that would never be used
(channel 3 for rgb and channels 1-3 for alpha) with 0 (aka RC_SWIZZLE_X).
This made it impossible to use generic helper functions for swizzles,
because sometimes a channel value of 0 meant unused and other times it
meant RC_SWIZZLE_X.
All hacks that tried to guess how many channels were relevant have
also been removed.
1) Only translate the [min_index, max_index] range.
2) Upload translated vertices via the uploader.
3) Rename valid_vertex_buffer[] to real_vertex_buffer[]
Only upload the [min_index, max_index] range instead of [0, userbuf_size].
This an important optimization.
Framerate in Lightsmark:
Before: 22 fps
After: 75 fps
The same optimization is already in r300g.
I can't see a performance difference with this code, which means all
the driver-specific code removed in this commit was unnecessary.
Now we use u_upload_mgr in a slightly different way than we did before it got
dropped. I am not restoring the original code "as is" due to latest
u_upload_mgr changes that r300g performance benefits from.
This also fixes:
- piglit/fp-kil
When an app loads libEGL.so dynamically with RTLD_LOCAL, loading DRI
drivers would fail because of missing glapi symbols. This commit makes
egl_dri2 load libglapi.so with RTLD_GLOBAL to export glapi symbols for
future symbol resolutions.
The same trick can be found in GLX. However, egl_dri2 can only do so
when --enable-shared-glapi is given. Because, otherwise, both libGL.so
and libglapi.so define glapi symbols and egl_dri2 cannot tell which
library to load.
When the user sets EGL_DRIVER to egl_dri2 (or egl_glx), make sure the
built-in driver is used. The user might leave the outdated egl_dri2.so
(or egl_glx.so) on the filesystem and we do not want to load it.
makedepend would crash when a source includes a header indirectly, such
as
#define HEADER "some-header.h"
#include HEADER
Do not define HEADER (makedepend would detects this as an incomplete
include) and add the dependency manually in the Makefile.
This should hopefully fix bug #33374.
For 1D/2D texture arrays use the pipe_resource::array_size field.
In OpenGL 1D arrays texture use the height dimension as the array
size and 2D array textures use the depth dimension as the array size.
Gallium uses a special array_size field instead. When setting up
gallium textures or comparing Mesa textures to gallium textures we
need to be extra careful that we're comparing the right fields.
The new st_gl_texture_dims_to_pipe_dims() function maps OpenGL
texture dimensions to gallium texture dimensions and simplifies
this quite a bit.
This reverts commit d3df641f0a.
The original commit had sat unpushed on my machine for months. By the
time I found it again, I had forgotten that we had decided not to use
this change after all, (the relevant test was removed long ago).
The GLSL specification is vague here, (just says "as is standard for
C++"), though the C specifications seem quite clear that this should
be an error.
However, an existing piglit test (CorrectPreprocess11.frag) expects
this to be a warning, not an error, so we change this, and document in
README the deviation from the specification.
So that 'foo' can be found in: OPTION=prefixfoosuffix,foo
Also allow that debug options can be separated by a non-alphanumeric characters
instead of just commas.
This drops the memblock manager for ZMASK. Instead, only one zbuffer can be
compressed at a time. Note that this does not necessarily have to be slower.
When there is a large number of zbuffers, compression might be used more often
than it was before. It's also easier to debug.
How it works:
1) 'clear' turns the compression on.
2) If some other zbuffer is set or the currently-bound zbuffer is used
for texturing, the driver decompresses it and then turns the compression off.
Notes:
- The ZMASK clear has been refactored, so that only one packet3 is used to clear
ZMASK.
- The 8x8 compression mode is disabled. I couldn't make it work without issues.
- Also removed driver-specific stuff from u_blitter.
Driver status:
- RV530 and R580 appear to just work (finally).
- RV570 should work, but there may be an issue that we don't correctly
calculate the number of dwords to clear, resulting in a partially
uninitialized zbuffer.
- RS690 misrenders as if no ZMASK clear happened. No idea what's going on.
- RV350 may even hardlock. This issue was already present and this patch doesn't
fix it.
I think we are still missing some hardware info we need to make the zbuffer
compression work fully.
Note that there is also an issue with HiZ, resulting in a sort of blocky
zigzagged corruption around some objects.
From the AMD_conservative_depth spec:
If gl_FragDepth is redeclared in any fragment shader in a program, it
must be redeclared in all fragment shaders in that program that have
static assignments to gl_FragDepth. All redeclarations of gl_FragDepth in
all fragment shaders in a single program must have the same set of
qualifiers.
When AMD_conservative_depth is enabled:
* Let 'layout' be a token.
* Extend the production rule of layout_qualifier_id to process the tokens:
depth_any
depth_greater
depth_less
depth_unchanged
Let's assume there are two options with names such that one is a substring
of another. Previously, if we only specified the longer one as a debug option,
the shorter one would be considered specified as well (because of strstr).
This commit fixes it by checking that each option is surrounded by commas.
(a regexp would be nicer, but this is not a performance critical code)
Update the max_array_access of a global as functions that use that
global are pulled into the linked shader.
Fixes piglit test glsl-fs-implicit-array-size-01 and bugzilla #33219.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Previously only global arrays with implicit sizes would be patched.
This causes all arrays that are actually accessed to be sized.
Fixes piglit test glsl-fs-implicit-array-size-02.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
emit_adjusted_wpos() needs separate x,y translation values. If we
invert Y, we don't want to effect X.
Part of the fix for http://bugs.freedesktop.org/show_bug.cgi?id=26795
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The set of internalFormat parameters accepted by glRenderBufferStorage
depends on the EXT vs. ARB version of framebuffer_object. The later
added support for GL_ALPHA, GL_LUMINANCE, etc. formats. Note that
these formats might be legal but might not be supported. That should
be checked with glCheckFramebufferStatus().
This reverts commit 65c41d55a0.
There really are quite a few differences in the set of internal
formats allowed by glTexImage and glRenderbufferStorage.
Per the spec, all OpenVG handles are 32-bit. We can't just cast them
to/from integers on 64-bit systems.
Start fixing that mess by introducing a set of handle/pointer conversion
functions in handle.h. The next step is to implement a handle/pointer
hash table...
We were sending too long requests for GLXChangeDrawableAttributes,
GLXGetDrawableAttributes, GLXDestroyPixmap and GLXDestroyWindow.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
FLT_TO_INT is a vector instruction, despite what the (current) documentation
says. FLT_TO_INT_FLOOR and FLT_TO_INT_RPI aren't explicitly mentioned in the
documentation, but those are vector instructions too.
largely a merge of the previously discussed origin/gallium-resource-sampling
but updated.
the idea is to allow arbitrary binding of resources, the way opencl, new gl
versions and dx10+ require, i.e.
DCL RES[0], 2D, FLOAT
LOAD DST[0], SRC[0], RES[0]
SAMPLE DST[0], SRC[0], RES[0], SAMP[0]
Contrary what the name may suggest, LLVM's opaque types are used for
recursive types -- types whose definition refers itself -- so opaque
types correspond to pre-declaring a structure in C. E.g.:
struct node;
struct link {
....
struct node *next;
};
struct node {
struct link link;
}
Void pointers are also disallowed by LLVM. So the suggested way of creating
what's commonly referred as "opaque pointers" is using byte pointer (i.e.,
uint8_t * ).
Fixes the build when selecting driver=osmesa and building static libraries.
Otherwise, mklib tries to add the ‘-ltalloc’ object to the archive, which
obviously fails.
Clients which statically link to osmesa will need to link to libtalloc also,
as specified in the Libs.private of osmesa.pc.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=33360
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
I couldn't make it work.
GB_TILE_CONFIG.Z_EXTENDED, which enables per-pixel Z clamping, and
VAP_CLIP_CNTL.CLIP_DISABLE, which disables clipping, do help, but they
also add regressions like random graphics corruptions in some games.
But we have to cheat and peek at the GENERIC semantic indices the
state tracker uses for TEXn.
Only outputs from 0x300 to 0x37c can be replaced, and so we have to
know on shader compilation which ones to put there in order to keep
doing separate shader objects properly.
At some point I'll probably create a patch that makes gallium not
force us to discard the information about what is a TexCoord.
The _NEW_ACCUM flag was only set when changing the accumulation
buffer clear color and never used anywhere. Reclaim that dirty bit.
Clean up the definitions of the other dirty bit flags.
Only a few texture object parameters can effect texture completeness:
min level, max level, minification filter. Don't mark the texture
incomplete for other texture object state changes.
We are not required to do the linear->sRGB conversion if ARB_framebuffer_sRGB
is unsupported. However I think the conversion should work in hw except
for blending, which matches the D3D9 behavior.
The rvalue of the returned value can be NULL if the shader says
'return foo();' and foo() is a function that returns void.
Existing GLSL specs do *NOT* say that this is an error. The type of
the return value is void. If the return type of the function is also
void, then this should compile without error. I expect that future
versions of the GLSL spec will fix this (wink, wink, nudge, nudge).
Fixes piglit test glsl-1.10/compiler/expressions/return-01.vert and
bugzilla #33308.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Before, we were converting between linear/sRGB in glReadPixels,
glDrawPixels, glAccum, etc if the renderbuffer was an sRGB texture.
Those all need to operate on pixel values as-is without conversion.
Also, when setting up render-to-texture, if the texture is sRGB the
pipe_surface view must be linear RGB. This will change when we
support GL_ARB_framebuffer_sRGB.
This fixes http://bugs.freedesktop.org/show_bug.cgi?id=33353
When we read/write image tiles we need to use the format specified
in the pipe_surface, not the pipe_transfer format (which comes from
the underlying texture/resource format).
This comes up when rendering to sRGB surfaces (via OpenGL render to
texture). Ignoring the new GL_ARB/EXT_framebuffer_sRGB extension
for now, when we render to a sRGB surface we need to treat it like
a regular, linear colorspace RGB surface. Before, when we read/wrote
tiles to sRGB surfaces we were inadvertantly doing the color space
conversion.
The new function, pipe_get_tile_rgba_format(), no longer takes a
swizzle (we weren't actually using it anywhere). Rename it to
indicate that the format is passed explicitly.
GLES can be enabled by running scons with
$ scons gles=yes
When gles=yes is given, the build is changed in three ways. First,
libmesa.a will be built with FEATURE_ES1 and FEATURE_ES2. This makes
DRI drivers and libEGL support and advertise GLES support. Second, GLES
libraries will be created. They are libGLESv1_CM, libGLESv2, and
libglapi. Last, libGL or opengl32 will link to libglapi. This change
is required as _glapi_* will be declared as __declspec(dllimport) in
libmesa.a on windows. libmesa.a expects those symbols to be defined in
another DLL. Due to this change to GL, GLES support is marked
experimental.
Note that GLES requires libxml2-python to generate some of its sources.
The original allocations use regs->regs as the context, so talloc will
happily ignore the context given here. Change it to match to clarify
that it isn't changing.
Improves the cases when:
* an explicit assignment references the read-only variable
* an 'out' or 'inout' function parameter references the read-only variable
This adds the get/enable enums and internal gl_config storage
for this extension.
In theory this is all that is needed to enable this extension
from what I can see, since its not mandatory to implement the
features if you don't advertise the visuals or the fb configs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The loop to choose a pixel format for the window was incrementing
'i' after we succeeded in creating the window so if we chose format[0]
for graw_create_window_and_screen() we were putting format[1] in
the pipe_resource template for creating the render target.
This only worked because of the order of the elements in the formats[]
array.
The graw_xlib.c code now properly compares the requested gallium pixel
format against the visual's color layout.
Update all the graw demos to fix the off-by-one-i error.
ideally this should be set once in the beginning of CS but there's
no way to change values there while in the middle of rendering.
For now reemitting SQ setup seems to work probably due to
r700WaitForIdleClean after each render
currently does not to try to decrease values once increased
fixes hangs in glsl-vs-vec4-indexing-temp-src-in-nested-loop-combined
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined for my rv740
maybe more for other chips
When --enable-shared-glapi is specified, libGL will share libglapi with
OpenGL ES instead of defining its own copy of glapi. This makes sure an
app will get only one copy of glapi in its address space.
The new option is disabled by default. When enabled, libGL and libglapi
must be built from the same source tree and distributed together. This
requirement comes from the fact that the dispatch offsets used by these
libraries are re-assigned whenever GLAPI XMLs are changed.
For GLX, indirect rendering for has_different_protocol() functions is
tricky. A has_different_protocol() function is assigned only one
dispatch offset, yet each entry point needs a different protocol opcode.
It cannot be supported by the shared glapi. The fix to this is to make
glXGetProcAddress handle such functions specially before calling
_glapi_get_proc_address.
Note that these files are automatically generated/re-generated
src/glx/indirect.c
src/glx/indirect.h
src/mapi/glapi/glapi_mapi_tmp.h
Move _glapi_* symbols from libGLESv1_CM.so and libGLESv2.so to
libglapi.so. This makes sure an app will get only one copy of glapi in
its address space.
Note that with this change, libGLES* and libglapi must be built from the
same source tree and distributed together. This requirement comes from
the fact that the dispatch offsets used by these libraries are
re-assigned whenever GLAPI XMLs are changed.
In bridge mode, mapi no longer implements glapi.h. It becomes a user of
glapi.h. Imagine an app that uses both libGL.so and libGLESv2.so.
There will be two copies of glapi in the app's memory. It is possible
that _glapi_get_dispatch does not return what _glapi_set_dispatch set,
if they access different copies of the global variables. The solution
to this situation to build either one of the libraries as a bridge to
the other. Or build both libraries as bridges to another shared
glapi library.
When MAPI_MODE_GLAPI is defined, u_current_table is renamed to
_glapi_Dispatch or _glapi_tls_Dispatch. The ASM dispatchers should not
use hardcoded name.
The new implementation is based on mapi. No new script is needed. As
noted in sources.mk, the way to use it is to compile MAPI_GLAPI_SOURCES
with MAPI_MODE_GLAPI defined.
Fix GLAPIPrinter, ES1APIPrinter, and ES2APIPrinter to output files that
are ready for compilation. Since gl_and_es_API.xml is based on
gl_API.xml, the hidden and handcode attributes of entries have to be
overridden for ES1APIPrinter and ES2APIPrinter.
gl_and_es_API.xml defines OpenGL ES 1.1 and 2.0 API as well as OpenGL
API. It consists of gl_API.xml and the newly added es_EXT.xml,
ARB_get_program_binary.xml, OES_single_precision.xml, and
OES_fixed_point.xml.
This fixes a bunch of unnecessary barriers due to the scheduler not
knowing what that arbitrary register description refers to when trying
to reason about its dependencies.
The result is rescheduling in the convolution kernel shader in
Lightsmark, which results in avoiding register spilling and increasing
the performance of the first scene from 6-7 fps midway through the
panning to 11fps. The register spilling was a regression from Mesa
7.9 to Mesa 7.10.
Improves performance of my GLSL demo by 5.1% (+/- 1.4%, n=7). It also
reschedules the giant multiply tree at the end of
glsl-fs-convolution-1 so that we end up not spilling registers,
producing the expected level of performance.
Can happen during swrast fallbacks if a buffer is somehow bound as
a render target and a texture.
Fixes gnome-shell on nv20, and gets it mostly working on nv10.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
sw clears were being used and not getting the correct offsets in the span
code.
also not emitting correct offsets for CB draws to texture levels.
(I've no idea why I'm playing with r100).
This is a candidate for 7.9 and 7.10
According to R700 ISA we have only two channels for cfile constants.
This patch makes piglit tests "glsl1-constant array with constant
indexing" happy on RV710.
Fixes the following Piglit tests:
glslparsertest/shaders/array2.frag
glslparsertest/shaders/dataType6.frag
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This fixes a potential failure when a begin/end_query is the first
thing to happen after flushing the scene.
NOTE: This is a candidate for the 7.10 and 7.9 branches.
This revealed a bug in ra_get_spill_benefit where we only considered
the benefit of the first adjacency we were to remove, explaining some
of the ugly spilling I've seen in shaders. Because of the reduced
spilling, it reduces the runtime of glsl-fs-convolution-1 36.9% +/-
0.9% (n=5).
This was recommended in the original paper, but I figued "make it run"
before "make it fast". Now we make it fast. Reduces the runtime of
glsl-fs-convolution-1 by 12.7% +/- 0.6% (n=5).
Our use of the register allocator in i965 is somewhat unusual.
Whereas most architectures would have a smaller set of registers with
fewer register classes and reuse that across compilation, we have 1,
2, and 4-register classes (usually) and a variable number up to 128
registers per compile depending on how many setup parameters and push
constants are present. As a result, when compiling large numbers of
programs (as with glean texCombine going through ff_fragment_shader),
we spent much of our CPU time in computing the q[] array. By keeping
a separate list of what the conflicts are for a particular reg, we
reduce glean texCombine time 17.0% +/- 2.3% (n=5).
We don't expect this optimization to be useful for 915, which will
have a constant register set, but it would be useful if we were switch
to this register allocator for Mesa IR.
Fixes texrect-many regression with ff_fragment_shader -- as we added
refs to the subsequent texcoord scaling paramters, the array got
realloced to a new address while our params[] still pointed at the old
location.
The driver was saying that independend blend functions was not supported,
but it really was. The driver was using the per-target independend blend
factors but the state tracker was only setting the 0th one (per the
Gallium spec).
Fixes a piglit fbo-drawbuffers2-blend regression.
See https://bugs.freedesktop.org/show_bug.cgi?id=33215
The removed semantic check also exists in ast_type_specifier::hir(), which
is a more natural location for it.
The check verified that precision statements are applied only to types
float and int.
* Check that precision qualifiers only appear in language versions 1.00,
1.30, and later.
* Check that precision qualifiers do not apply to bools and structs.
Fixes the following Piglit tests:
* spec/glsl-1.30/precision-qualifiers/precision-bool-01.frag
* spec/glsl-1.30/precision-qualifiers/precision-struct-01.frag
* spec/glsl-1.30/precision-qualifiers/precision-struct-02.frag
Change default value to ast_precision_none, which denotes the absence of
a precision of a qualifier.
Previously, the default value was ast_precision_high. This made it
impossible to detect if a precision qualifier was present or not.
The check is performed only in GLSL versions >= 1.30.
From section 4.3.4 of the GLSL 1.30 spec:
"It is an error to use centroid in in a vertex shader."
Fixes Piglit test
spec/glsl-1.30/compiler/storage-qualifiers/vs-centroid-in-01.vert
The check is performed only in GLSL versions >= 1.30.
Fixes the following Piglit tests:
* spec/glsl-1.30/compiler/interpolation-qualifiers/fs-smooth-02.frag
* spec/glsl-1.30/compiler/interpolation-qualifiers/vs-smooth-01.vert
... and 'centroid varying'. The check is performed only in GLSL
versions >= 1.30.
From page 29 (page 35 of the PDF) of the GLSL 1.30 spec:
"interpolation qualifiers may only precede the qualifiers in, centroid
in, out, or centroid out in a declaration. They do not apply to the
deprecated storage qualifiers varying or centroid varying."
Fixes Piglit test
spec/glsl-1.30/compiler/interpolation-qualifiers/smooth-varying-01.frag.
If an interpolation qualifier is present, then the method returns that
qualifier's string representation. For example, if the noperspective bit
is set, then it returns "noperspective".
This implements the extension by choosing a different set of texture
fetch functions when the texture parameter changes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a literal slot isn't used it should be set
to 0 instead of an uninitialized value. Also the
channels for pre R700 trig functions were incorrect.
And most important literals were not counted against ndw,
resulting in an invalid force_add_cf detection.
We don't have all of the features of this extension hooked up yet, but
the consensus yesterday was that since those features are things that
we should also be supporting in our ES2 implementation, claiming ES2
here too doesn't make anything worse and will make incremental
improvement through piglit easier.
This code should never have been triggered, but I often did anyway
when I disabled optimization passes during debugging, then spent my
time debugging that this code doesn't work.
The comment of "this is just like teximages except for..." is a pretty
good clue that we're handling this wrong. By just using the teximage
code, we catch a bunch of cases we'd missed, like GL_RED and GL_RG.
This catches more opportunities than the prog_optimize.c code on
openarena's fixed function shaders turned to GLSL, mostly due to
looking at multiple source instructions for copy propagation
opportunities. It should also be much more CPU efficient than
prog_optimize.c's code.
The usage of macro V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_FLT_TO_INT_FLOOR was
introduced by commit 323ef3a1f0 but the
macro is undefined. Disable this case to fix the build for now.
Add a bit in struct gl_extensions for OES_standard_derivatives, and enable
the bit by default. Advertise the extension only if the bit is enabled.
Previously, OES_standard_derivatives was advertised in GLES2 contexts
if ARB_framebuffer_object was enabled.
The specs that add 'layout' require the use of 'in' or 'out'.
However, a number of implementations, including Mesa, shipped several
of these extensions allowing the use of 'varying' and 'attribute'.
For these extensions only a warning is emitted.
This differs from the behavior of Mesa 7.10. Mesa 7.10 would only
accept 'attribute' with 'layout(location)'. This behavior was clearly
wrong. Rather than carrying the broken behavior forward, we're just
doing the correct thing.
This is related to (piglit) bugzilla #31804.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
All of the extensions that add the 'layout' keyword also enable (and
required) the use of 'in' and 'out' with shader globals.
This is related to (piglit) bugzilla #31804.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
When use_spoken is true, istart (the first vertex of this segment) is
replaced by i0 (the spoken vertex of the fan). There are still icount
vertices.
Thanks to Brian Paul for spotting this.
Without this, X doesn't start with UMS on r300g.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Paulo Zanoni <pzanoni@mandriva.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The idea is to be able to match a driver using the following order
try egl_gallium with hw renderer
try egl_dri2
try egl_gallium with sw renderer
try egl_glx
given the module list
egl_gallium
egl_dri2
egl_glx
For that, UseFallback initialization option is added. The module list
is matched twice: with the option unset and with the option set. In the
first pass, egl_gallium skips its sw renderer and egl_glx rejects to
initialize since UseFallback is not set. In the second pass,
egl_gallium skips its hw renderer and egl_dri2 rejects to initialize
since UseFallback is set. The process stops at the first driver that
initializes the display.
Reorder/rename and document the fields that should be set by the driver during
initialization. Drop the major/minor arguments from drv->API.Initialize.
This makes it unnecessary to pass _mesa_glsl_parse_state around
everywhere, making at least the prototypes a lot easier to read.
It's also more C++-ish than a pile of static C functions.
All of these functions used to take s_list pointers so they wouldn't all
need SX_AS_LIST conversions and error checking. However, the new
pattern matcher conveniently does this for us in one centralized place.
So there's no need to insist on s_list. Switching to s_expression saves
a bit of code and is somewhat cleaner.
Previously, the IR reader was riddled with code that:
1. Checked for the right number of list elements (via a linked list walk)
2. Retrieved references to each component (via ->next->next pointers)
3. Downcasted as necessary to make sure that each sub-component was the
right type (i.e. symbol, int, list).
4. Checking that the tag (i.e. "declare") was correct.
This was all very ad-hoc and a bit ugly. Error checking had to be done
at both steps 1, 3, and 4. Most code didn't even check the tag, relying
on the caller to do so. Not all callers did.
The new pattern matching module performs the whole process in a single
straightforward function call, resulting in shorter, more readable code.
Unfortunately, MSVC does not support C99-style anonymous arrays, so the
pattern must be declared outside of the match call.
stObj->pt is null when a TFP texture is passed to st_finalize_texture,
and with the changes introduced in the above commit this resulted in a
new texture being created and the existing image being copied into it.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Fixes a PT_NOT_PRESENT error cause by:
- allocating in VRAM
- emitting GART relocs to 0x17bc/0x17c0, moving the buffer
- telling the bufmgr that the buffer should be in VRAM when we use it,
but not correcting the value sent to 0x17bc/0x17c0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes a failed assertion when a renderbuffer ID that was gen'd but not
previously bound was passed to glFramebufferRenderbuffer(). Generate
the same error that NVIDIA does.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This fixes a problem when glDrawBuffers(GL_NONE). The fragment program
was writing to color output[0] but OutputsWritten was 0. That led to a
failed assertion in the Mesa->TGSI translation code.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The extension string in GLES1 contexts always advertised
GL_OES_point_sprite. Now advertisement depends on ARB_point_sprite being
enabled.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Change all OES extension strings that depend on ARB_framebuffer_object to
instead depend on EXT_framebuffer_object.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Add GL_OES_stencil8 to ES2.
Remove the following:
GL_OES_compressed_paletted_texture : ES1
GL_OES_depth32 : ES1, ES2
GL_OES_stencil1 : ES1, ES2
GL_OES_stencil4 : ES1, ES2
Mesa advertised these extensions, but did not actually support them.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Place GL, GLES1, and GLES2 extensions in a unified extension table. This
allows one to enable, disable, and query the status of GLES1 and GLES2
extensions by name.
When tested on Intel Ironlake, this patch did not alter the extension
string [as given by glGetString(GL_EXTENSIONS)] for any API.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In particular, variables cannot be redeclared invariant after being
used.
Fixes piglit test invariant-05.vert and bugzilla #29164.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
With the change to not reset baselevel, this GL_LINEAR filtering was
resulting in generating mipmaps off of the base level instead of the
next higher detail level. Fixes fbo-generatemipmap-filtering.
Reported by: Neil Roberts <neil@linux.intel.com>
The ad-hoc placement of recalculation somewhere between when they got
invalidated and when they were next needed was confusing. This should
clarify what's going on here.
Update SConscripts to re-enable or add support for EGL on windows and
x11 platforms respectively. targets/egl-gdi is replaced by
targets/egl-static, where "-static" means pipe drivers and state
trackers are linked to statically by egl_gallium, and egl_gallium is a
built-in driver of libEGL. There is no more egl_gallium.dll on Windows.
This greatly improves codegen for programs with flow control by
allowing coalescing for all instructions at the top level, not just
ones that follow the last flow control in the program.
tgsi_helper_copy is used on several occasions to copy a temporary result
into the real destination register to emulate writemasks for OP3 and
reduction operations. According to R600 ISA that's unnecessary.
This patch fixes this use for MAD, CMP and DP4.
Fixes piglit tests glsl-1.20/compiler/qualifiers/in-01.vert and
glsl-1.20/compiler/qualifiers/out-01.vert and bugzilla #32910.
NOTE: This is a candidate for the 7.9 and 7.10 branches. This patch
also depends on the previous two commits.
When GCC encounters a division by zero in a preprocessor directive, it
generates an error. Since the GLSL spec says that the GLSL
preprocessor behaves like the C preprocessor, we should generate that
same error.
It's worth noting that I cannot find any text in the C99 spec that
says this should be an error. The only text that I can find is line 5
on page 82 (section 6.5.5 Multiplicative Opertors), which says,
"The result of the / operator is the quotient from the division of
the first operand by the second; the result of the % operator is
the remainder. In both operations, if the value of the second
operand is zero, the behavior is undefined."
Fixes 093-divide-by-zero.c test and bugzilla #32831.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
In _token_list_equal_ignoring_space(token_list_t*, token_list_t*), add
a guard that prevents dereferncing a null token list.
This fixes test src/glsl/glcpp/tests/092-redefine-macro-error-2.c and
Bugzilla #32695.
When attaching a small mipmap level to an FBO, the original gen4
didn't have the bits to support rendering to it. Instead of falling
back, just blit it to a new little miptree just for it, and let it get
revalidated into the stack later just like any other new teximage.
Bug #30365.
If we hit this path, we're level 1+ and the base level got allocated
as a single level instead of a full tree (so we don't match
intelObj->mt). This tries to recover from that so that we end up with
2 allocations and 1 validation blit (old -> new) instead of
allocations equal to number of levels and levels - 1 blits.
We don't need to worry about levels other than MaxLevel because we're
minifying -- the lower levels (higher detail) won't contribute to the
result. By changing BaseLevel, we forced hardware that doesn't
support BaseLevel != 0 to relayout the texture object.
This reverts commit 7ce6517f3a.
This reverts commit d60145d06d.
I was wrong about which generations supported baselevel adjustment --
it's just gen4, nothing earlier. This meant that i915 would have
never used the mag filter when baselevel != 0. Not a severe bug, but
not an intentional regression. I think we can fix the performance
issue another way.
Most _3DSTATE defines contain the command type, sub-type, opcode, and
sub-opcode (i.e. 0x7905). These, however, contain only the sub-opcode
(i.e. 0x05). Since they are inconsistent with the rest of the code and
nothing uses them, simply delete them.
The _3DOP and _3DCONTROL defines seemed similar, and were also unused.
From reading EXT_texture_sRGB and EXT_framebuffer_sRGB and interactions
with FBO I've found that swrast is converting the sRGB values to linear for
blending when an sRGB texture is bound as an FBO. According to the spec
and further explained in the framebuffer_sRGB spec this behaviour is not
required unless the GL_FRAMEBUFFER_SRGB is enabled and the Visual/config
exposes GL_FRAMEBUFFER_SRGB_CAPABLE_EXT.
This patch fixes swrast to use a separate Fetch call for FBOs bound to
SRGB and avoid the conversions.
v2: export _mesa_get_texture_dimensions as per Brian's comments.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With core mesa doing runtime API checks, GLES overlay is no longer
needed. Make --enable-gles-overlay equivalent to --enable-gles[12].
There may still be places where compile-time checks are done. They
could be fixed case by case.
This is a step forward for compatibility with really old GLX. But the
real reason for making this change now is so that we can make egl_glx a
built-in driver without having to link to libGL.
In preparation for making egl_dri2 built-in. It also handles
symbol lookup error: /usr/local/lib/egl/egl_dri2.so: undefined symbol:
_glapi_get_proc_address
more gracefully.
If you want to enable noop set GALLIUM_NOOP=1 as an env variable.
You need first to enable noop wrapping for your driver see change
to src/gallium/targets/dri-r600/ in this commit as an example.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Add release function for texture_from_pixmap extension.
Some platform need to release texture image for texture_from_pixmap
extension, add this interface for those platforms.
https://bugs.freedesktop.org/show_bug.cgi?id=32912
The fix is to call update_derived_state before user buffer uploads.
I've also moved some code around.
Unfortunately, there are still some ZMASK-related bugs which cause
misrendering, i.e. flushing doesn't always work and glean/fbo fails.
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implemention.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but it
accounts any newly-referenced domains.
cs_validate is then called, which just checks:
used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes CS and tries do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
There really shouldn't be any difference between the two for us.
Fixes a bug where Z16 renderbuffers would be untiled on gen6, likely
leading to hangs.
By relying on just intel_span_supports_format, some formats that
aren't supported pre-gen4 were not reporting FBO incomplete. And we
also complained in stderr when it happened on i915 because draw_region
gets called before framebuffer completeness validation.
ARB_fbo no longer disallows mismatched width/height on attachments
(shouldn't be any problem), mixed format color attachments (we only
support 1), and L/A/LA/I color attachments (we already reject them on
965 too). It requires Gen'ed names (driver doesn't care), and adds
FramebufferTextureLayer (we don't do texture arrays). So it looks
like we're already in the position we need to be for this extension.
Bug #27468, #32381.
In general, we have to negate in immediate values we pass in because
the src1 negate field in the register description is in the bits3 slot
that the 32-bit value is loaded into, so it's ignored by the hardware.
However, the src0 negate field is in bits1, so after we'd negated the
immediate value loaded in, it would also get negated through the
register description. This broke this VP instruction in the position
calculation in civ4:
MAD TEMP[1], TEMP[1], CONST[256].zzzz, CONST[256].-y-y-y-y;
Bug #30156
This only uploads the [min_index, max_index] range instead of [0, userbuf size],
which greatly speeds up user buffer uploads.
This is also a prerequisite for atomizing vertex arrays in st/mesa.
Fix the error in uniform row calculating, it may alloc one line
more which may cause out of range on memory usage, sometimes program
aborted when free the memory.
NOTE: This is a candidate for 7.9 and 7.10 branches.
Signed-off-by: Brian Paul <brianp@vmware.com>
This provides an upload facility for the constant buffers since Marek's
constants in user buffers changes.
gears at least work on my evergreen now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should make it easier to cross-reference the code and hardware
documentation, as well as clear up any confusion on whether constants
like CMD_3D_WM_STATE mean WM_STATE (pre-gen6) or 3DSTATE_WM (gen6+).
This does not rename any pre-gen6 defines.
The draw module set new state that didn't require swtnl which caused need_swtnl to
be unset. This caused the call from to svga_update_state(svga, SVGA_STATE_SWTNL_DRAW)
from the vbuf backend to overwrite the vdecls we setup there to be overwritten with
the real buffers vdecls.
Previously the 'STDGL invariant(all)' pragma added in GLSL 1.20 was
simply ignored by the compiler. This adds support for setting all
variable invariant.
In GLSL 1.10 and GLSL ES 1.00 the pragma is ignored, per the specs,
but a warning is generated.
Fixes piglit test glsl-invariant-pragma and bugzilla #31925.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
GLSL 1.10 and 1.20 allow any sort of sampler array indexing.
Restrictions were added in GLSL 1.30. Commit f0f2ec4d added support
for the 1.30 restrictions, but it broke some valid 1.10/1.20 shaders.
This changes the error to a warning in GLSL 1.10, GLSL 1.20, and GLSL
ES 1.00.
There are some spurious whitespace changes in this commit. I changed
the layout (and wording) of the error message so that all three cases
would be similar. The 1.10/1.20 and 1.30 text is the same. The only
difference is that one is an error, and the other is a warning. The
GLSL ES 1.00 wording is similar but not quite the same.
Fixes piglit test
spec/glsl-1.10/compiler/constant-expressions/sampler-array-index-02.frag
and bugzilla #32374.
It's no longer needed because the upload buffer remains mapped while the CS
is being filled (openarena, ut2004 and others that this code was for do not
use VBOs by default).
The overhead of resource_create, transfer_inline_write, and resource_destroy
to upload constant data is very visible with some apps in sysprof, and
as such should be eliminated.
My approach uses a user buffer to pass a pointer to a driver. This gives
the driver the freedom it needs to take the fast path, which may differ
for each driver.
This commit addresses the same issue as Jakob's one that suballocates out
of a big constant buffer, but it also eliminates the copy to the buffer.
- Added a parameter to specify a minimum offset that should be returned.
r300g needs this to better implement user buffer uploads. This weird
requirement comes from the fact that the Radeon DRM doesn't support negative
offsets.
- Added a parameter to notify a driver that the upload flush occured.
A driver may skip buffer validation if there was no flush, resulting
in a better performance.
- Added a new upload function that returns a pointer to the upload buffer
directly, so that the buffer can be filled e.g. by the translate module.
The map/unmap overhead can be significant even though there is no waiting on busy
buffers. There is simply a huge number of uploads.
This is a performance optimization for Torcs, a car racing game.
Directly include mtypes.h if a file uses a gl_context struct. This
allows future removal of headers that are not strictly necessary but
indirectly include mtypes.h for a file.
BaseLevel/MaxLevel are mostly used for two things: clamping texture
access for FBO rendering, and limiting the used mipmap levels when
incrementally loading textures. By restricting our mipmap trees to
just the current BaseLevel/MaxLevel, we caused reallocation thrashing
in the common case, for a theoretical win if someone really did want
just levels 2..4 or whatever of their texture object.
Bug #30366
This has always been ugly about our texture code -- object base/max
level vs intel object first/last level vs image level vs miptree
first/last level. We now get rid of intelObj->first_level which is
just tObj->BaseLevel, and make intelObj->_MaxLevel clearly based off
of tObj->_MaxLevel instead of duplicating its code (incorrectly, as
image->MaxLog2 only considers width/height and not depth!)
It was quite a mess by trying to do NULL renderbuffers and real
renderbuffers in the same function. This clarifies the common case of
real renderbuffers.
This makes
fbo-generatemipmap-formats GL_EXT_texture_sRGB-s3tc
match
fbo-generatemipmap-formats GL_EXT_texture_compression_s3tc
and swrast in bad DXT1_RGBA alpha=0 handling, but it means we won't
unpack and repack someone's textures into uncompressed SARGB8 format.
It's just LUMINANCE, not LUMINANCE_ALPHA. Fixes
fbo-generatemipmap-formats GL_EXT_texture_sRGB-s3tc assertion failure
when it tries to pack the L8 channels into LUMINANCE_ALPHA and wonders
why it's trying to do that.
From the r600 ISA:
Each ALU clause can lock up to four sets of constants
into the constant cache. Each set (one cache line) is
16 128-bit constants. These are split into two groups.
Each group can be from a different constant buffer
(out of 16 buffers). Each group of two constants consists
of either [Line] and [Line+1] or [line + loop_ctr]
and [line + loop_ctr +1].
For supporting more than 64 constants, we need to
break the code into multiple ALU clauses based
on what sets of constants are needed in that clause.
Note: This is a candidate for the 7.10 branch.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
rc_inst_can_use_presub() wasn't checking for too many RGB sources in
Alpha instructions or too many Alpha sources in RGB instructions.
Note: This is a candidate for the 7.10 branch.
Perform this check in ast_declarator_list::hir().
From section 4.3.6 of the GLSL 1.30 spec:
"If a vertex output is a signed or unsigned integer or integer
vector, then it must be qualified with the interpolation
qualifier
flat."
Allow redeclaration of the following built-in variables with an
interpolation qualifier in language versions >= 1.30:
* gl_FrontColor
* gl_BackColor
* gl_FrontSecondaryColor
* gl_BackSecondaryColor
* gl_Color
* gl_SecondaryColor
See section 4.3.7 of the GLSL 1.30 spec.
We were looking at the current draw buffer instead to see whether the
depth/stencil combination matched. So you'd get told your framebuffer
was complete, until you bound it and went to draw and we decided that
it was incomplete.
The _ColorDrawBuffers is a piece of computed state that gets for the
current draw/read buffers at _mesa_update_state time. However, this
function actually gets used for non-current draw/read buffers when
checking if an FBO is complete from the driver's perspective. So,
instead of trying to just look at the attachment points that are
currently referenced by glDrawBuffers, look at all attachment points
to see if they're driver-supported formats. This appears to actually
be more in line with the intent of the spec, too.
Fixes a segfault in my upcoming fbo-clear-formats piglit test, and
hopefully bug #30278
Until we know how hw converts quads to polygon in beginning of
3D pipeline, for now unconditionally use last vertex convention.
Fix glean/clipFlat case.
When figuring out whether a renderbuffer should be used to set the
visual bits of an FBO, we were missing important baseformats like
GL_RED, GL_RG, and GL_LUMINANCE.
st/egl should be enabled with --enable-openvg even the driver is xlib or
osmesa. Also, GLX_DIRECT_RENDERING should not be defined because libdrm
is not checked.
I think was used long ago, when we actually read the builtins into the
shader's instruction stream directly, rather than creating a separate
shader and linking the two. It doesn't seem to serve any purpose now.
The initial values in the grctx are 0x0000 anyway, and re-binding them
all to 0x0000 destroys some init done by the nouveau drm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This reverts commit de6fd527a5.
Revert this workaround as it seems the real trouble is caused by
lineloop, which doesn't require GS convert on sandybridge actually.
Gen4 and Gen5 hardware can have a maximum supported nesting depth of 16.
Previously, shaders with control flow nested 17 levels deep would
cause a driver assertion or segmentation fault.
Gen6 (Sandybridge) hardware no longer has this restriction.
Fixes fd.o bug #31967.
This adds a new optional max_depth parameter (defaulting to 0) to
lower_if_to_cond_assign, and makes the pass only flatten if-statements
nested deeper than that.
By default, all if-statements will be flattened, just like before.
This patch also renames do_if_to_cond_assign to lower_if_to_cond_assign,
to match the new naming conventions.
This gets my vote for most pointless extension of all time, I'm guessing
some driver could possibly optimise for this instead of counting it might
just get a true/false, but I'm not really sure.
need this to eventually advertise 3.3 despite its total uselessness.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rename MAPI_GLAPI_SOURCES to MAPI_UTIL_SOURCES. Rename macro
MAPI_GLAPI_CURRENT to MAPI_MODE_UTIL. Update the comments to make it
clear that mapi may be used in two ways and how.
These mistakenly computed 't' instead of t * t * (3.0 - 2.0 * t).
Also, properly vectorize the smoothstep(float, float, vec) variants.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
No need to register function prototypes in the module now that we call
the C function pointer directly -- less LLVM objects lying around.
Limited testing with lp_test_format.
Even though a bound texture stays bound when calling set_fragment_sampler_views,
it must be assigned a new cache region depending on the occupancy of other
texture units.
This fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=28800
Thanks to Álmos <aaalmosss@gmail.com> for finding the bug in the code.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
We need to keep using the pipe_get_tile_swizzle() even though there's
no swizzling because we need to explicitly pass in the surface format.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=32459
The only mismatch between the two is that we have to clear the
destination's alpha to 1.0. Fixes WOW performance on my Ironlake,
from a few frames a second to almost playable.
Before, we were going off of a couple of known (hopeful) matches
between internalFormats and the cpp of the read buffer. Instead, we
can now just look at the gl_format of the two to see if they match.
We should avoid bad blits that might have been possible before, but
also allow different internalFormats to work without having to
enumerate each one.
The blit that follows appears in the command stream so it's serialized
with previous rendering. Any queued vertices in the tnl layer were
already flushed up in mesa/main/.
Create a constant int pointer to the C function, then cast it to the
function's type. This avoids using trampoline code which seem to be
inadvertantly freed by LLVM in some situations (which leads to segfaults).
The root issue and work-around were found by José.
NOTE: This is a candidate for the 7.10 branch
Michel Hermier reported libdrm segfault (and kernel crash) on nv40 using
gallium :
http://www.mail-archive.com/nouveau@lists.freedesktop.org/msg06563.html
It turns out these were caused by some missing WAIT_RING (or wrong
computation of the WAIT_RING sizes). Unlike all other libdrm_nouveau users,
nvfx gallium tried to use a mininum calls of WAIT_RING, one WAIT_RING could
apply to many methods for different code paths and spread across several
functions. This made it too tricky to find out what the missing or wrong
WAIT_RING was.
By restoring BEGIN_RING, we force one WAIT_RING per method, and it's much
easier to check if the free size required in the pushbuffer is correct. As
curro said, "let's keep it simple for the maintainers until the big
bottlenecks are gone"
Benchmarked on nv35 with openarena, nexuiz and ut2004 and no performance
regression.
The core of this patch was made with Coccinelle, with minor manual fixes
made on top.
Tested-by: Michel Hermier <hermier@frugalware.org>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
This is the hack for input interactivity of frontbuffer rendering
(like we do for backbuffer at intelDRI2Flush()) by waiting for the n-2
frame to complete before starting a new one. However, for an
application doing multiple contexts or regular rebinding of a single
context, this would end up lockstepping the CPU to the GPU because
every unbind was considered the end of a frame.
Improves WOW performance on my Ironlake by 48.8% (+/- 2.3%, n=5)
Given a dispatch slot, entry_get_public returns the address of the
corresponding public entry point. There may be more than one of them.
But since they are all equivalent, it is fine to return any one of them.
With entry_get_public, the address of any public entry point can be
calculated at runtime when an assembly dispatcher is used. There is no
need to have a mapping table in such case. This omits the unnecessary
relocations from the binary.
Hidden entries are just like normal entries except that they are not
exported. Since it is not always possible to hide them, and two hidden
aliases can share the same entry, the name of hidden aliases are mangled
to '_dispatch_stub_<slot>'.
Split out function name generation from _c_decl to _c_function, and use
it everywhere. Add an optional 'export' argument to _cdecl. It is
prepended to the returned string.
Really no idea why I didn't see this before, but these values were opposite
the register spec.
this seems to fix rv530 HiZ on my laptop, will reenable in next commit.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For GL fragColor semantics we need to tell the pipe drivers that the fragment
shader color result is to be replicated to all bound color buffers, this
adds the basic TGSI + documentation.
v2: fix missing comma pointed out by Tilman on mesa-dev.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If 'start' is odd, render the first triangle with indices embedded
in the command stream, which adds 3 to 'start' and makes it even.
Then continue with the fast path.
This is a bandaid on the problem that if some formats were not renderable
(like luminance_alpha), st/mesa fell back to some RGBA format, so basically
some non-renderable formats were actually not used at all. This is only
a problem with hardware drivers, softpipe can render to anything.
Instead, require only RGB8/RGBA8 to be renderable.
Radeon GPUs can do this. R600 can even do render-to-texture.
Packing and extracting aren't implemented, but we shouldn't hit them (I think).
Tested with swrast, softpipe, and r300g.
Just like everywhere else, I never trust my constant uploads to
correctly put constants in the right places, even though that's so
rarely where the issue is.
It's mostly like gen4 message descriptor setup, except that the sizes
of type/control changed to be like gen5. Fixes 21 piglit cases on
gm45, including the regressions in bug #32311 from increased VS
constant buffer usage.
Fixes this GCC warning.
lp_bld_const.h: In function 'lp_build_const_int_pointer':
lp_bld_const.h:137: warning: cast from pointer to integer of different size
Determine header present for fb write by msg length is not right
for SIMD16 dispatch, and if there're more output attributes, header
present is not easy to tell from msg length. This explicitly adds
new param for fb write to say header present or not.
Fixes many cases' hang and failure in GL conformance test.
Set window_bit only when the visual id is greater than zero. Correct
visual types. Skip slow configs as they are not relevant. Finally, do
not return duplicated configs.
All single-buffered configs were ignored before to make sure
EGL_RENDER_BUFFER is settable for window surfaces. It is better to
allow single-buffered configs and set EGL_WINDOW_BIT only for
double-buffered ones. This way there can be single-buffered pixmaps.
The SNB alt-mode math does the denorm and inf reduction even for a
"raw MOV" like we do for g0 message header setup, where we are moving
values that aren't actually floats. Just use UD type, where raw MOVs
really are raw MOVs.
Fixes glxgears since c52adfc2e1, but no
piglit tests had regressed(!)
Can't get away from referencing upload buffer as after flush a vertex buffer
using the upload buffer might still be active. Likely need to simplify the
pipe_refence a bit so we don't waste so much cpu time in it.
candidates for 7.10 branch
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Instead of when we read texture tiles. Now swizzling happens after
the shadow depth compare step. This fixes the piglit glsl-fs-shadow2d*
tests (except for proj+bias because of a GLSL bug).
We need to swizzle after the shadow comparison so that the GL_DEPTH_MODE
functionality is handled properly.
This fixes all the piglit glsl-fs-shadow2d*.shader_test cases, except
for glsl-fs-shadow2dproj-bias.shader_test which fails because of a
bug in the GLSL compiler (fd.o 32395).
Note the support for non float vertex draw likely regressed need to
find what we want to do there.
candidates for 7.10 branches
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is still awful, but my ability to care about reworking the old
backend so we can just get a temporary value into a POW is awfully low
since the new backend does this all sensibly.
Fixes:
fp1-LIT test 1
fp1-LIT test 3 (case x < 0)
fp1-POW test (exponentiation)
fp-lit-mask
commit 4f106f44a32eaddb6cf3fea6ba5ee9787bff609a
Author: Brian Paul <brianp@vmware.com>
Date: Mon Dec 13 14:06:08 2010 -0700
st/mesa: reorganize vertex program translation code
Now it looks like the fragment and geometry program code.
Also remove the serial number fields from programs. It was used to
determine when new translations were needed. Now the variant key is
used for that. And the st_program_string_notify() callback removes all
variants when the program's code is changed.
commit e12d6791c5e4bff60bb2e6c04414b1b4d1325f3e
Author: Brian Paul <brianp@vmware.com>
Date: Mon Dec 13 13:38:12 2010 -0700
st/mesa: implement geometry shader varients
Only needed in order to support per-context gallium shaders.
commit c5751c673644808ab069259a852f24c4c0e92b9d
Author: Brian Paul <brianp@vmware.com>
Date: Sun Dec 12 15:28:57 2010 -0700
st/mesa: restore glDraw/CopyPixels using new fragment program variants
Clean up the logic for fragment programs for glDraw/CopyPixels. We now
generate fragment program variants for glDraw/CopyPixels as needed which
do texture sampling, pixel scale/bias, pixelmap lookups, etc.
commit 7b0bb99bab6547f503a0176b5c0aef1482b02c97
Author: Brian Paul <brianp@vmware.com>
Date: Fri Dec 10 17:03:23 2010 -0700
st/mesa: checkpoint: implement fragment program variants
The fragment programs variants are per-context, as the vertex programs.
NOTE: glDrawPixels is totally broken at this point.
commit 2cc926183f957f8abac18d71276dd5bbd1f27be2
Author: Brian Paul <brianp@vmware.com>
Date: Fri Dec 10 14:59:32 2010 -0700
st/mesa: make vertex shader variants per-context
Gallium shaders are per-context but OpenGL shaders aren't. So we need
to make a different variant for each context.
During context tear-down we need to walk over all shaders/programs and
free all variants for the context being destroyed.
The new GLSL compiler doesn't support geom shaders yet so disable the
GL_ARB_geometry_shader4 extension. Undo this when geom shaders work again.
NOTE: This is a candidate for the 7.10 branch.
This is the same as what the array dereference handler does.
Fixes piglit test glsl-link-struct-array (bugzilla #31648).
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The hardware supports zero stride just fine. This is a port
of 2af8a19831 from r300g.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The RS690 memory controller prefers things to be on a different
boundary than the discrete GPUs, we had an attempt to fix this,
but it still failed, this consolidates the stride calculation
into one place and removes the really special case check.
This fixes gnome-shell and 16 piglit tests on my rs690 system.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When the driver is the last reference to libEGL.so, unloading it will
cause libEGL.so to be unmapped and give problems. Disable the unloading
for now. Still have to figure out the right timing to unload drivers.
The hardware apparently does support a zero stride, so let's use it.
This fixes missing objects in ETQW, but might also fix a ton of other
similar-looking bugs.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
If a source operand has a non-native swizzle (e.g. the KIL instruction
cannot have a swizzle other than .xyzw), the lowering pass uses one or more
MOV instructions to move the operand to an intermediate temporary with
native swizzles.
This commit fixes that the presubtract information was lost during
the lowering.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
This fixes broken rendering of trees in ETQW. The trees still disappear
for an unknown reason when they are close.
Broken since:
2ff9d4474b
r300/compiler: make lowering passes possibly use up to two less temps
NOTE: This is a candidate for the 7.10 branch.
do_assignment may apply implicit conversions to coerce the base type
of initializer to the base type of the variable being declared. Fixes
piglit test glsl-implicit-conversion-02 (bugzilla #32287). This
probably also fixes bugzilla #32273.
NOTE: This is a candidate for the 7.9 branch and the 7.10 branch.
There are exceptions to the table for depth texturing or rendering to
not-quite-supported formats thanks to the non-orthogonal component
selection for surface formats, but it's still a lot simpler.
Fixes piglit valgrind glsl-array-bounds-04 failure (FDO bug 29946).
NOTE:
This is a candidate for the 7.10 branch.
This is a candidate for the 7.9 branch.
The current state is allowed to be undefined past DrawElements et al.
Consequently omit that copying at least in the display list code.
This pays us some percents performance.
Signed-off-by: Brian Paul <brianp@vmware.com>
assert(current_save_state < MAX_META_OPS_DEPTH) did not compile.
Rename current_save_state to SaveStackDepth to be more consistent with
the style of the other fields.
VS places color attributes together so that SF unit can fetch the right
attribute according to object orientation. This fixes light issue in
mesa demo geartrain, projtex.
_mesa_meta_CopyPixels results in nested meta operations on Sandybridge.
Previoulsy the second meta operation overrides all states saved by the
first meta function.
When the application is not linked to any libGL*.so, loading st_GL.so
would give
/usr/local/lib/egl/st_GL.so: undefined symbol: _glapi_tls_Context
In that case, load libGL.so and try again. This works because
util_dl_open loads with RTLD_GLOBAL.
Fix "clear" OpenGL ES 1.1 demo.
Fixes piglit glx-shader-sharing crash.
When shaders are shared by multiple contexts, the shader's draw context
pointer may point to a previously destroyed context. Dereferencing the
context pointer will lead to a crash.
In this case, simply removing the flushing code avoids the crash (the
exec and sse shader paths don't flush here either).
There's a deeper issue here, however, that needs examination. Shaders
should not keep pointers to contexts since contexts might get destroyed
at any time.
NOTE: This is a candidate for the 7.10 branch (after this has been
tested for a while).
Currently we only unroll loops with conditional breaks at the end, which is
the form that lower_jumps generates.
However, if breaks are not lowered, they tend to appear at the beginning, so
add support for a conditional break anywhere.
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Found this bug by code inspection. Based off the comments just before
this code, the intent is to find whether the break exists in the "then"
branch or the "else" branch. However, the code actually looked at the
last instruction in the "then" branch twice.
If you used a constant array index to access the matrix, we'd flag a
bunch of wrong inputs/outputs as being used because the index was
multiplied by matrix columns and the actual used index was left out.
Fixes glsl-mat-attribute.
Fixes this GCC warning.
brw_fs.cpp: In function 'brw_reg brw_reg_from_fs_reg(fs_reg*)':
brw_fs.cpp:3255: warning: 'brw_reg' may be used uninitialized in this function
Allow important performance increase by doing hw specific implementation
of the upload manager helper. Drop the range flushing that is not hit with
this code (and wasn't with previous neither). Performance improvement are
mostly visible on slow CPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
r600g is up to a point where all small CPU cycle matter and pb* turn
high on profile. It's mostly because pb try to be generic and thus
trigger unecessary check for r600g driver. To avoid having too much
abstraction & too much depth in the call embedded everythings into
r600_bo. Make code simpler & faster. The performance win highly depend
on the CPU & application considered being more important on slower CPU
and marginal/unoticeable on faster one.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This eases the gen6 implementation, which can only handle up to 32
registers of constants, while likely not penalizing real apps using
reladdr since all of those I've seen also end up hitting the pull
constant buffer. On gen6, the constant map means that simple NV VPs
fit under the 32-reg limit and now succeed. Fixes around 10 testcases.
Raise error if a sampler array is indexed with a non-constant expression.
From section 4.1.7 of the GLSL 1.30 spec:
"Samplers aggregated into arrays within a shader (using square
brackets [ ]) can only be indexed with integral constant
expressions [...]."
System values are shader inputs which don't necessarily change from
vertex to vertex or fragment to fragment. gl_InstanceID and
gl_FrontFacing are examples.
Attribute 0 has no special meaning in GLES2. Add VertexAttrib4f_nopos
for that purpose and make _es_VertexAttrib* call the new function.
Rename _vbo_* to _es_* to avoid confusion. These functions are only
used by GLES, and now some of them (_es_VertexAttrib*) even behave
differently than vbo_VertexAttrib*.
CMP may now use two less temps, other non-native instructions may end up
using one less temp, except for SIN/COS/SCS, which I am leaving unchanged
for now.
This may reduce register pressure inside loops, because the register
allocator doesn't do a very good job there.
That's what I get for not running piglit before pushing.
Don't try to patch types of unsized arrays when linking fails.
Don't try to patch types of unsized arrays that are shared between
shader stages.
Fixes piglit test case glsl-vec-array (bugzilla #31908).
NOTE: This bug does not affect 7.9, but I think this patch is a
candiate for the 7.9 branch anyway.
Types of declared variables and their initializer must match excatly
except for unsized arrays. Previously the type inherritance for
unsized arrays happened implicitly in the emitted assignment.
However, this assignment is never emitted for uniforms. Now that type
is explicitly copied unconditionally.
Fixes piglit test array-compare-04.vert (bugzilla #32035) and
glsl-array-uniform-length (bugzilla #31985).
NOTE: This is a candidate for the 7.9 branch.
With the change of extended math from having the arguments moved into
mrfs and handed off through message passing to being directly hooked
up to the EU, it looks like the piece for doing source modifiers
(negate and abs) was left out.
Fixes:
fog-modes
glean/fp1-ARB_fog_exp test
glean/fp1-ARB_fog_exp2 test
glean/fp1-Computed fog exp test
glean/fp1-Computed fog exp2 test
ext_fog_coord-modes
Previously the code only handled scalars and vectors. This new code
is modeled somewhat after similar code in ir_to_mesa.
Reviewed-by: Eric Anholt <eric@anholt.net>
R6XX GPU doesn't like to have two partial flush writting
back to memory in row without a prior flush of the pipeline.
Add PS_PARTIAL_FLUSH to flush all work between the CP and
the ES, GS, VS, PS shaders.
Thanks a lot to Alban Browaeys (prahal on irc) for investigating
this issue.
Signed-off-by: Alban Browaeys <prahal@yahoo.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
gen6 builtin RSQ apparently clamps negative values to 0 instead of
returning the RSQ of the absolute value like ARB_fragment_program
desires and pre-gen6 apparently does.
Fixes:
glean/fp1-RSQ test 2 (reciprocal square root of negative value)
glean/vp1-RSQ test 2 (reciprocal square root of negative value)
Fixes this GCC warning.
r200_maos_arrays.c: In function 'r200EmitArrays':
r200_maos_arrays.c:113: warning: 'emitsize' may be used uninitialized in this function
It's not always possible to preprocess the content of 3D_LOAD_VBPNTR
in a command buffer, because the offset to all vertex buffers (which
the packet depends on) is derived from the "start" parameter of draw_arrays
and the "indexBias" parameter of draw_elements, but we can at least lazily
make a command buffer for the case when offset == 0, which should occur
most of the time.
The original lazy built-in importing patch did not add the newly created
function to the symbol table, nor actually emit it into the IR stream.
Adding it to the symbol table is non-trivial since importing occurs when
generating some ir_call in a nested scope. A new add_global_function
method, backed by new symbol_table code in the previous patch, handles
this.
Fixes bug #32030.
Avoid rebuilding constant shader state at each draw call,
factor out spi update that might change at each draw call.
Best would be to update spi only when revealent states
change (likely only flat shading & sprite point).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Vertex elements change are less frequent than draw call, those to
avoid rebuilding fetch shader to often build the fetch shader along
vertex elements. This also allow to move vertex buffer setup out
of draw path and make update to it less frequent.
Shader update can still be improved to only update SPI regs (based
on some rasterizer state like flat shading or point sprite ...).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
More than 1023 temporaries were being used for a Cinebench shader before
doing temporary optimization, causing the index value to wrap around to
-1024.
In st_finalize_texture() we were looking at the st_texture_object::
lastLevel field instead of the pipe_resource::last_level field to
determine which resource to store the mipmap in.
Then, in st_generate_mipmap() we need to call st_finalize_texture() to
make sure the destination resource is properly allocated.
These changes fix the broken piglit fbo-generatemipmap-formats test.
Use --cppflags instead of --cflags so that we get the -I and -D flags
we want, but not compiler options like -O3.
A similar change should probably be made for autoconf.
When X or Y or Z is close to W the outcome of the floating point clip
test comparision may be different between the C and x86 asm paths.
That's OK; don't report an error.
See fd.o bug 32093
It was only used for gen6 fragment programs (not GLSL shaders) at this
point, and it was clearly unsuited to the task -- missing opcodes,
corrupted texturing, and assertion failures hit various applications
of all sorts. It was easier to patch up the non-glsl for remaining
gen6 changes than to make brw_wm_glsl.c complete.
Bug #30530
Since the 8-wide first-quarter and 16-wide first-half have the same
bit encoding, we now need to track "do you want instruction
compression" in the compile state.
The FS backend is fine with register level granularity. But for the
brw_wm_emit.c backend, it expects pairs of regs to be used for the
constants, because the whole world is pairs of regs. If an odd number
got used, we went looking for interpolation in the wrong place.
In the SF and brw_fs.cpp fixes to set up interpolation sanely on gen6,
the setup for 16-wide interpolation was left behind. This brings
relative sanity to that path too.
Payload reg setup on gen6 depends more on the dispatch width as well
as the uses_depth, computes_depth, and other flags. That's something
we want to decide at compile time, not at cache lookup. As a bonus,
the fragment shader program cache lookup should be cheaper now that
there's less to compute for the hash key.
The preprocessor magic in mapi was nothing but obfuscation. Rewrite
mapi_abi.py to generate real C code.
This commit removes the hack added in
43121f2086.
Need to check the required primitive type for GS on Sandybridge,
and when GS is disabled, the new state has to be issued too, instead
of only updating URB state with no GS entry, that caused hang on Sandybridge.
This fixes hang issue during conformance suite testing.
This fixes endless vertex shader recompilations in find_translated_vp
if the shader contains an edge flag output.
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by Brian Paul <brianp@vmware.com>
Finished up by Marek Olšák.
We can set the constant space to use a different area per-call to the shader,
we can avoid flushing the PVS as often as we do by spreading out the constants
across the whole constant space.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
It appears to be a constant buffer index (in case there are more constant
buffers explicitly used by a shader), i.e. something that Gallium currently
does not use. We treated it incorrectly as the offset to a constant buffer.
Sometimes I'm on the train and want to just read what's generated
under INTEL_DEBUG=vs,wm for some code on another generation. Or, for
the next gen enablement we'll want to dump aub files before we have
the actual hardware. This will let us do that.
rgb_src_factor and rgb_dst_factor should be PIPE_BLENDFACTOR_ONE for
VG_BLEND_SRC_IN and VG_BLEND_DST_IN respectively. VG_BLEND_SRC_OVER can
be supported only when the fb has no alpha channel. VG_BLEND_DST_OVER
and VG_BLEND_ADDITIVE have to be supported with a shader.
Note that Porter-Duff blending rules assume premultiplied alpha.
Convert color values to and back from premultiplied form for blending.
Finally the rendering result of the blend demo looks much closer to that
of the reference implementation.
Drawing an image in VG_DRAW_IMAGE_STENCIL mode produces per-channel
alpha for use in blending. Add a new shader stage to produce and save
it in TEMP[1].
For other modes that do not need per-channel alpha, the stage does
MOV TEMP[1], TEMP[0].wwww
Add a helper function, blend_generic, that supports all blend modes and
per-channel alpha. Make other blend generators a wrapper to it.
Both the old and new code expects premultiplied colors, yet the input is
non-premultiplied. Per-channel alpha is also not used for stencil
image. They still need to be fixed.
In find_value() check if we've hit the 0th/invalid entry before checking
if the pname matches.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=31987
NOTE: This is a candidate for the 7.9 branch.
If querying the default/window-system FBO's attachment type, return
GL_FRAMEBUFFER_DEFAULT (per the GL_ARB_framebuffer_object spec).
See http://bugs.freedesktop.org/show_bug.cgi?id=31947
NOTE: This is a candidate for the 7.9 branch.
Return 0 instead of generating an error.
See http://bugs.freedesktop.org/show_bug.cgi?id=30993
Note that piglit fbo-getframebufferattachmentparameter-01 still does
not pass. But Mesa behaves the same as the NVIDIA driver in this case.
Perhaps the test is incorrect.
NOTE: This is a candidate for the 7.9 branch.
It was done in path-to-polygon conversion. That meant that the
results were invalidated when the transformation was modified, and CPU
had to recreate the vertex buffer with new vertices. It could be a
performance hit for apps that animate.
gl_FragCoord.y needs to be flipped upside down if a FBO is bound.
This fixes:
- piglit/fbo-fragcoord
- https://bugs.freedesktop.org/show_bug.cgi?id=29420
Here I add a new program state STATE_FB_WPOS_Y_TRANSFORM, which is set based
on whether a FBO is bound. The state contains a pair of transformations.
It can be either (XY=identity, ZW=transformY) if a FBO is bound,
or (XY=transformY, ZW=identity) otherwise, where identity = (1, 0),
transformY = (-1, height-1).
A classic driver (or st/mesa) may, based on some other state, choose whether
to use XY or ZW, thus negate the conditional "if (is a FBO bound) ...".
The reason for this is that a Gallium driver is allowed to only support WPOS
relative to either the lower left or the upper left corner, so we must flip
the Y axis accordingly again. (the "invert" parameter in emit_wpos_inversion)
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Fixes nasty bug where some parts of the code didn't define WIN32_THREADS
and were using the integer mutex implementation, causing even confusion
to the debuggers.
And there is little interest of other thread implemenation on Win32
besides Win32 threads.
This allows 16K x 16K 2D textures, for example, but we don't want to
allow that for 3D textures. The new gl_constants::MaxTextureMBytes
field is used to prevent allocating too large of texture image.
This allows a 16K x 32 x 32 3D texture, for example, but prevents 16K^3.
Drivers can override this limit. The default is currently 1GB.
Apps should use the proxy texture mechanism to determine the actual
max texture size.
resources have a array_size parameter now.
get_tex_surface and tex_surface_destroy have been renamed to create_surface
and surface_destroy and moved to context, similar to sampler views (and
create_surface now uses a template just like create_sampler_view). Surfaces
now really should only be used for rendering. In particular they shouldn't be
used as some kind of 2d abstraction for sharing a texture. offset/layout fields
don't make sense any longer and have been removed, width/height should go too.
surfaces and sampler views now specify a layer range (for texture resources),
layer is either array slice, depth slice or cube face.
pipe_subresource is gone array slices (or cube faces) are now treated the same
as depth slices in transfers etc. (that is, they use the z coord of the
respective functions).
Squashed commit of the following:
commit a45bd509014743d21a532194d7b658a1aeb00cb7
Merge: 1aeca28 32e1e59
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Dec 2 04:32:06 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/i915/i915_resource_texture.c
src/gallium/drivers/i915/i915_state_emit.c
src/gallium/drivers/i915/i915_surface.c
commit 1aeca287a827f29206078fa1204715a477072c08
Merge: 912f042 6f7c8c3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Dec 2 00:37:11 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/state_trackers/vega/api_filters.c
src/gallium/state_trackers/vega/api_images.c
src/gallium/state_trackers/vega/mask.c
src/gallium/state_trackers/vega/paint.c
src/gallium/state_trackers/vega/renderer.c
src/gallium/state_trackers/vega/st_inlines.h
src/gallium/state_trackers/vega/vg_context.c
src/gallium/state_trackers/vega/vg_manager.c
commit 912f042e1d439de17b36be9a740358c876fcd144
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Dec 1 03:01:55 2010 +0100
gallium: even more compile fixes after merge
commit 6fc95a58866d2a291def333608ba9c10c3f07e82
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Dec 1 00:22:26 2010 +0100
gallium: some fixes after merge
commit a8d5ffaeb5397ffaa12fb422e4e7efdf0494c3e2
Merge: f7a202f 2da02e7
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 30 23:41:26 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/i915/i915_state_emit.c
src/gallium/state_trackers/vega/api_images.c
src/gallium/state_trackers/vega/vg_context.c
commit f7a202fde2aea2ec78ef58830f945a5e214e56ab
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Nov 24 19:19:32 2010 +0100
gallium: even more fixes/cleanups after merge
commit 6895a7f969ed7f9fa8ceb788810df8dbcf04c4c9
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Nov 24 03:07:36 2010 +0100
gallium: more compile fixes after merge
commit af0501a5103b9756bc4d79167bd81051ad6e8670
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 23 19:24:45 2010 +0100
gallium: lots of compile fixes after merge
commit 0332003c2feb60f2a20e9a40368180c4ecd33e6b
Merge: 26c6346 b6b91fa
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 23 17:02:26 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/gallivm/lp_bld_sample.c
src/gallium/auxiliary/util/u_blit.c
src/gallium/auxiliary/util/u_blitter.c
src/gallium/auxiliary/util/u_inlines.h
src/gallium/auxiliary/util/u_surface.c
src/gallium/auxiliary/util/u_surfaces.c
src/gallium/docs/source/context.rst
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/nv50/nv50_state_validate.c
src/gallium/drivers/nvfx/nv04_surface_2d.c
src/gallium/drivers/nvfx/nv04_surface_2d.h
src/gallium/drivers/nvfx/nvfx_buffer.c
src/gallium/drivers/nvfx/nvfx_miptree.c
src/gallium/drivers/nvfx/nvfx_resource.c
src/gallium/drivers/nvfx/nvfx_resource.h
src/gallium/drivers/nvfx/nvfx_state_fb.c
src/gallium/drivers/nvfx/nvfx_surface.c
src/gallium/drivers/nvfx/nvfx_transfer.c
src/gallium/drivers/r300/r300_state_derived.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r600/r600_blit.c
src/gallium/drivers/r600/r600_buffer.c
src/gallium/drivers/r600/r600_context.h
src/gallium/drivers/r600/r600_screen.c
src/gallium/drivers/r600/r600_screen.h
src/gallium/drivers/r600/r600_state.c
src/gallium/drivers/r600/r600_texture.c
src/gallium/include/pipe/p_defines.h
src/gallium/state_trackers/egl/common/egl_g3d_api.c
src/gallium/state_trackers/glx/xlib/xm_st.c
src/gallium/targets/libgl-gdi/gdi_softpipe_winsys.c
src/gallium/targets/libgl-gdi/libgl_gdi.c
src/gallium/tests/graw/tri.c
src/mesa/state_tracker/st_cb_blit.c
src/mesa/state_tracker/st_cb_readpixels.c
commit 26c6346b385929fba94775f33838d0cceaaf1127
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Aug 2 19:37:21 2010 +0200
fix more merge breakage
commit b30d87c6025eefe7f6979ffa8e369bbe755d5c1d
Merge: 9461bf3 1f1928d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Aug 2 19:15:38 2010 +0200
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/llvmpipe/lp_rast_priv.h
src/gallium/drivers/r300/r300_blit.c
src/gallium/drivers/r300/r300_screen_buffer.c
src/gallium/drivers/r300/r300_state_derived.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_texture.h
src/gallium/drivers/r300/r300_transfer.c
src/gallium/drivers/r600/r600_screen.c
src/gallium/drivers/r600/r600_state.c
src/gallium/drivers/r600/r600_texture.c
src/gallium/drivers/r600/r600_texture.h
src/gallium/state_trackers/dri/common/dri1_helper.c
src/gallium/state_trackers/dri/sw/drisw.c
src/gallium/state_trackers/xorg/xorg_exa.c
commit 9461bf3cfb647d2301364ae29fc3084fff52862a
Merge: 17492d7 0eaccb3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 15 20:13:45 2010 +0200
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/util/u_blitter.c
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/llvmpipe/lp_surface.c
src/gallium/drivers/r300/r300_render.c
src/gallium/drivers/r300/r300_state.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_transfer.c
src/gallium/tests/trivial/quad-tex.c
commit 17492d705e7b7f607b71db045c3bf344cb6842b3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Jun 18 10:58:08 2010 +0100
gallium: rename element_offset/width fields in views to first/last_element
This is much more consistent with the other fields used there
(first/last level, first/last layer).
Actually thinking about removing the ugly union/structs again and
rename first/last_layer to something even more generic which could also
be used for buffers (like first/last_member) without inducing headaches.
commit 1b717a289299f942de834dcccafbab91361e20ab
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 14:46:09 2010 +0100
gallium: remove PIPE_SURFACE_LAYOUT_LINEAR definition
This was only used by the layout field of pipe_surface, but this
driver internal stuff is gone so there's no need for this driver independent
layout definition neither.
commit 10cb644b31b3ef47e6c7b55e514ad24bb891fac4
Merge: 5691db9 c85971d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 12:20:41 2010 +0100
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/docs/source/glossary.rst
src/gallium/tests/graw/fs-test.c
src/gallium/tests/graw/gs-test.c
commit 5691db960ca3d525ce7d6c32d9c7a28f5e907f3b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 11:29:03 2010 +0100
st/wgl: fix interface changes bugs
commit 2303ec32143d363b46e59e4b7c91b0ebd34a16b2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 19:42:32 2010 +0100
gallium: adapt code to interface changes...
commit dcae4f586f0d0885b72674a355e5d56d47afe77d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 19:42:05 2010 +0100
gallium: separate depth0 and array_size in the resource itself.
These fields are still mutually exclusive (since no 3d array textures exist)
but it ultimately seemed to error-prone to adapt all code accept the new
meaning of depth0 (drivers stick that into hardware regs, calculate mipmap
sizes etc.). And it isn't really cleaner anyway.
So, array textures will have depth0 of 1, but instead use array_size,
3D textures will continue to use depth0 (and have array_size of 1). Cube
maps also will use array_size to indicate their 6 faces, but since all drivers
should just be fine by inferring this themselves from the fact it's a cube map
as they always used to nothing should break.
commit 621737a638d187d208712250fc19a91978fdea6b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 17:47:38 2010 +0100
gallium: adapt code to interface changes
There are still usages of pipe_surface where pipe_resource should be used,
which should eventually be fixed.
commit 2d17f5efe166b2c3d51957c76294165ab30b8ae2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 17:46:14 2010 +0100
gallium: more interface changes
In particular to enable usage of buffers in views, and ability to use a
different pipe_format in pipe_surface.
Get rid of layout and offset parameter in pipe_surface - the former was
not used in any (public) code anyway, and the latter should either be computed
on-demand or driver can use subclass of pipe_surface.
Also make create_surface() use a template to be more consistent with
other functions.
commit 71f885ee16aa5cf2742c44bfaf0dc5b8734b9901
Merge: 3232d11 8ad410d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 14:19:51 2010 +0100
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/util/u_box.h
src/gallium/drivers/nv50/nv50_surface.c
src/gallium/drivers/nvfx/nvfx_surface.c
src/gallium/drivers/r300/r300_blit.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_transfer.c
src/gallium/drivers/r600/r600_blit.c
src/gallium/drivers/r600/r600_screen.h
src/gallium/include/pipe/p_state.h
commit 3232d11fe3ebf7686286013c357b404714853984
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 11:40:04 2010 +0100
mesa/st: adapt to interface changes
still need to fix pipe_surface sharing
(as that is now per-context).
Also broken is depth0 handling - half the code assumes
this is also used for array textures (and hence by extension
of that cube maps would have depth 6), half the code does not...
commit f433b7f7f552720e5eade0b4078db94590ee85e1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 11:35:52 2010 +0100
gallium: fix a couple of bugs in interface chnage fixes
commit 818366b28ea18f514dc791646248ce6f08d9bbcf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:42:11 2010 +0200
targets: adapt to interface changes
Yes even that needs adjustments...
commit 66c511ab1682c9918e0200902039247793acb41e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:41:13 2010 +0200
tests: adapt to interface changes
Everything needs to be fixed :-(.
commit 6b494635d9dbdaa7605bc87b1ebf682b138c5808
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:39:50 2010 +0200
st: adapt non-rendering state trackers to interface changes
might not be quite right in all places, but they really don't want
to use pipe_surface.
commit 00c4289a35d86e4fe85919ec32aa9f5ffe69d16d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:38:48 2010 +0200
winsys: adapt to interface changes
commit 39d858554dc9ed5dbc795626fec3ef9deae552a0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:26:54 2010 +0200
st/python: adapt to interface changes
don't think that will work, sorry.
commit 6e9336bc49b32139cec4e683857d0958000e15e3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:26:07 2010 +0200
st/vega: adapt to interface changes
commit e07f2ae9aaf8842757d5d50865f76f8276245e11
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:25:56 2010 +0200
st/xorg: adapt to interface changes
commit 05531c10a74a4358103e30d3b38a5eceb25c947f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:53 2010 +0200
nv50: adapt to interface changes
commit 97704f388d7042121c6d496ba8c003afa3ea2bf3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:45 2010 +0200
nvfx: adapt to interface changes
commit a8a9c93d703af6e8f5c12e1cea9ec665add1abe0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:01 2010 +0200
i965g: adapt to interface changes
commit 0dde209589872d20cc34ed0b237e3ed7ae0e2de3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:22:38 2010 +0200
i915g: adapt to interface changes
commit 5cac9beede69d12f5807ee1a247a4c864652799e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:20:58 2010 +0200
svga: adapt to interface changes
resource_copy_region still looking fishy.
Was not very suited to unified zslice/face approach...
commit 08b5a6af4b963a3e4c75fc336bf6c0772dce5150
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:20:01 2010 +0200
rbug: adapt to interface changes
Not sure if that won't need changes elsewhere?
commit c9fd24b1f586bcef2e0a6e76b68e40fca3408964
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:31 2010 +0200
trace: adapt to interface changes
commit ed84e010afc5635a1a47390b32247a266f65b8d1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:21 2010 +0200
failover: adapt to interface changes
commit a1d4b4a293da933276908e3393435ec4b43cf201
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:12 2010 +0200
identity: adapt to interface changes
commit a8dd73e2c56c7d95ffcf174408f38f4f35fd2f4c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:55 2010 +0200
softpipe: adapt to interface changes
commit a886085893e461e8473978e8206ec2312b7077ff
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:44 2010 +0200
llvmpipe: adapt to interface changes
commit 70523f6d567d8b7cfda682157556370fd3c43460
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:14 2010 +0200
r600g: adapt to interface changes
commit 3f4bc72bd80994865eb9f6b8dfd11e2b97060d19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:05 2010 +0200
r300g: adapt to interface changes
commit 5d353b55ee14db0ac0515b5a3cf9389430832c19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:17:37 2010 +0200
cell: adapt to interface changes
not even compile tested
commit cf5d03601322c2dcb12d7a9c2f1745e2b2a35eb4
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:14:59 2010 +0200
util: adapt to interface changes
amazing how much code changes just due to some subtle interface changes?
commit dc98d713c6937c0e177fc2caf23020402cc7ea7b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:12:40 2010 +0200
gallium: more interface fail, docs
this also changes flush_frontbuffer to use a pipe_resource instead of
a pipe_surface - pipe_surface is not meant to be (or at least no longer)
an abstraction for standalone 2d images which get passed around.
(This has also implications for the non-rendering state-trackers.)
commit 08436d27ddd59857c22827c609b692aa0c407b7b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 10 17:42:52 2010 +0200
gallium: fix array texture interface changes bugs, docs
commit 4a4d927609b62b4d7fb9dffa35158afe282f277b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 3 22:02:44 2010 +0200
gallium: interface changes for array textures and related cleanups
This patch introduces array textures to gallium (note they are not immediately
usable without the associated changes to the shader side).
Also, this abandons pipe_subresource in favor of using level and layer
parameters since the distinction between several faces (which was part of
pipe_subresource for cube textures) and several z slices (which were not part
of pipe_subresource but instead part of pipe_box where appropriate for 3d
textures) is gone at the resource level.
Textures, be it array, cube, or 3d, now use a "unified" set of parameters,
there is no distinction between array members, cube faces, or 3d zslices.
This is unlike d3d10, whose subresource index includes layer information for
array textures, but which considers all z slices of a 3d texture to be part
of the same subresource.
In contrast to d3d10, OpenGL though reuses old 2d and 3d function entry points
for 1d and 2d array textures, respectively, which also implies that for instance
it is possible to specify all layers of a 2d array texture at once (note that
this is not possible for cube maps, which use the 2d entry points, although
it is possible for cube map arrays, which aren't supported yet in gallium).
This should possibly make drivers a bit simpler, and also get rid of mutually
exclusive parameters in some functions (as z and face were exclusive), one
potential downside would be that 3d array textures could not easily be supported
without reverting this, but those are nowhere to be seen.
Also along with adjusting to new parameters, rename get_tex_surface /
tex_surface_destroy to create_surface / surface_destroy and move them from
screen to context, which reflects much better what those do (they are analogous
to create_sampler_view / sampler_view_destroy).
PIPE_CAP_ARRAY_TEXTURES is used to indicate if a driver supports all of this
functionality (that is, both sampling from array texture as well as use a range
of layers as a render target, with selecting the layer from the geometry shader).
Malloc likes to reuse old address as soon as possible this would cause the
new vbo buffer to get the same address as the old. So make sure we set it to
NULL when we allocate a new one. This fixes ipers which will fill up a couple
of VBO buffers per frame.
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Byte offsets simply don't work with tiled render targets when using
tiling bits. Luckily we can cox the hw into doing the right thing
with the DRAWING_RECT command by disabling the drawing rect offset
for the depth buffer.
Minor fixes by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Tiling is rather fragile in general and results in pure blackness when
unlucky. Hence add a new option to disable tiling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
libdrm-intel can refuse to tile buffers for various reasons. For
potentially tiled buffers the stride is therefore only known after
the iws->buffer_create_tiled call. Unconditionally rounding up to whatever
tiling requires wastes space, so rework the code to not use tex->stride
in the layout code.
Luckily only the mimap/face offset calculation uses it which can easily
be solved by storing an (x, y) coordinate pair. Furthermore this will
be usefull later for properly supporting rendering into the different
levels of tiled mipmap textures.
v2: switch to nblocks(x|y): More in line with gallium and better
suited for rendering into mipmap textures.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
This is needed to properly implement tiling flags. And the gem
implemention fo buffer_from_handle already calls get_tiling, so
it's for free.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
This way relaxed fencing is handled by libdrm. And buffers _can't_
ever change their tiling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Different kernels have different restrictions for tiled buffers.
Hence use the libdrm abstraction to calculate the necessary
stride and height alignment requirements.
Not yet used.
v2: Incorporate review comments from Jakob Bornecrantz
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
It's unnecessary. The kernel gem ignores it totally and we can't
run on the old userspace fake bo manager due to lack of dri2.
Also drop the redundant name string from the sw winsys as suggested
by Jakob Bornecrantz
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
By not doing so, the uniform contents of
glsl-uniform-non-uniform-array-compare.shader_test was getting thrown
out since nobody was recorded as dereferencing the array.
The new lower_discard and opt_discard_simplification passes should
handle all the necessary transformations, so lower_jumps doesn't need to
support it.
Also, lower_jumps incorrectly handled conditional discards - it would
unconditionally truncate all code after the discard. Rather than fixing
the bug, simply remove the code.
NOTE: This is a candidate for the 7.9 branch.
This should allow lower_if_to_cond_assign to work in the presence of
discards, fixing bug #31690 and likely #31983.
NOTE: This is a candidate for the 7.9 branch.
There were a few places where we were using the wrong opcodes
on evergreen. arl still needs to be fixed on evergreen; see
r600g for reference.
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
As the blend texture, a drawing surface mask is used when masking is
enabled. It should be created as needed.
s/alpha_mask/surface_mask/ to follow OpenVG 1.1 naming.
Depending on whether vgDrawPath(mode), vgDrawImage, or vgDrawGlyph[s] is
called, different paint-to-user and user-to-surface matrices should be
used to derive the sample points for the paint.
This fixes "paint" demo.
Divide bits of VegaShaderType into 6 groups: paint, image, mask, fill,
premultiply, and bw. Each group represents a stage. At most one shader
from each group will be selected when constructing the final fragment
shader.
Optional features such as auth-hinting are not implemented. There is no
anti-aliasing, and no effort is done to keep the glyph origin integral.
So the text quality is poor.
With this commit, the pipe states are entirely managed by the renderer.
The rest of the code interfaces with the renderer instead of
manipulating the states directly.
vg_manager_validate_framebuffer should mark the fb dirty and have
vg_validate_state call cso_set_framebuffer. Rename VIEWPORT_DIRTY to
FRAMEBUFFER_DIRTY.
The states are designated for polygon filling. Polygon filling is a
two-pass process utilizing the stencil buffer. polygon_fill and
polygon_array_fill functions are updated to make use of the state.
This state provides the ability to clear rectangles of the framebuffer
to the specified color, honoring scissoring. vegaClear is updated to
make use of the state.
This state provides glDrawTex-like function. It can be used for
vgSetPixels.
Rather than modifying every user of the renderer, this commit instead
modifies renderer_copy_surface to use DRAWTEX or COPY state internally
depending on whether the destination is the framebuffer.
Renderer states are high-level states to perform specific tasks. The
renderer is initially in INIT state. In that state, the renderer is
used for OpenVG pipeline.
This commit adds a new COPY state to the renderer. The state is used
for copying between two pipe resources using textured drawing. It can
be used for vgCopyImage, for example.
Rather than modifying every user of the renderer, this commit instead
modifies renderer_copy_texture to use the COPY state internally.
This branch defines a gallivm_state structure which contains the
LLVMBuilderRef, LLVMContextRef, etc. All data structures built with
this object can be periodically freed during a "garbage collection"
operation.
The gallivm_state object has to be passed to most of the builder
functions where LLVMBuilderRef used to be used.
Conflicts:
src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
src/gallium/drivers/llvmpipe/lp_state_setup.c
I made the texwrap test be more thorough and realized that this driver code
had not been quite right. This commit fixes the border color for depth
textures, compressed textures, and 16-bits-per-channel textures
with up to 2 channels (R16, RG16).
NOTE: This is a candidate for the 7.9 branch.
Previously, IR for a linked shader was allocated directly out of the
gl_shader object - meaning all of it lived as long as the shader.
Now, IR is allocated out of a temporary context, and any -live- IR is
reparented/stolen to (effectively) the gl_shader. Any remaining IR can
be freed.
NOTE: This is a candidate for the 7.9 branch.
This makes a very simple 1.30 shader go from 196k of memory to 9k.
NOTE: This -may- be a candidate for the 7.9 branch, as the benefit is
substantial. However, it's not a simple change, so it may be wiser to
wait for 7.10.
We were trying to emit a single ir_expression to compare the whole
thing. The backends (ir_to_mesa.cpp and brw_fs.cpp so far) expected
ir_binop_any_nequal or ir_binop_all_equal to apply to at most a vector
(with matrices broken down by the lowering pass). Break them down to
a bunch of ORed or ANDed any_nequals/all_equals.
Fixes:
glsl-array-compare
glsl-array-compare-02
glsl-fs-struct-equal
glsl-fs-struct-notequal
Bug #31909
This doesn't cover all expressions or all operand types, but it will
complain if you overreach and it allows for much greater slack on the
programmer's part.
This code was for the old GLcore build of the software rasteriser. The
X server switched to a DRI driver for software indirect GLX long ago.
Signed-off-by: Adam Jackson <ajax@redhat.com>
There are also some u_simple_shaders changes.
On r300, the TGSI_SEMANTIC_COLOR varying is a fixed-point number clamped
to the range [0,1] and limited to 12 bits of precision. Therefore we can't
use it for passing through a clear color in order to clear high precision
texture formats.
This also makes u_blitter use only one vertex shader instead of two.
Since the viewport is not updated on RandR12 mode switches anymore,
clipping to viewport may incorrectly clip away the video.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This may help paint the colorkey before overlay updates in some
situations where the app paints the color key (mainly xine).
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The compiler seriously needs a cleanup as far as the arrangement of functions
is concerned. It's hard to know whether some function was implemented or not
because there are so many places to search in and it can be anywhere and
named anyhow.
If nvfx_framebuffer prepare and validate were called successively with
fb->zsbuf not NULL and then NULL, nvfx->hw_zeta would contain garbage and
this would cause failures in nvfx_framebuffer_relocate/OUT_RELOC(hw_zeta).
This was triggered by piglit/texwrap 2D GL_DEPTH_COMPONENT24 and caused
first a 'write to user buffer!!' error in libdrm and then worse things.
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Currently, the standalone compiler tries to do function inlining before
linking shaders (including linking against the built-in functions).
This resulted in the built-in function _prototypes_ being inlined rather
than the actual function definition.
This is only known to fix a bug in the standalone compiler; most
programs should be unaffected. Still, it seems like a good idea.
NOTE: This is a candidate for the 7.9 branch.
Fixes this GCC warning with linux-x86 build.
radeon_pair_regalloc.c: In function ‘compute_live_intervals’:
radeon_pair_regalloc.c:222: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning with linux-x86 build.
radeon_pair_regalloc.c: In function ‘compute_live_intervals’:
radeon_pair_regalloc.c:221: warning: ISO C90 forbids mixed declarations and code
The alpha mask is addressed with unnormalized coordinates in the
fragment shader. It should have the same orientation as the surface
does.
This fixes "mask" OpenVG demo.
Previously, the get table listed all three as having custom locations,
yet find_custom_value did not have cases to handle them.
MAX_VARYING_VECTORS does not need a custom location since MaxVaryings is
already stored as float[4] (or vec4). MaxUniformComponents is stored as
the number of floats, however, so a custom implementation that divides
by 4 is necessary.
Fixes bugs.freedesktop.org #31495.
This fixes incorrect behaviour when the stencil clear value exceeds
the size of the stencil buffer, for example, when set with:
glClearStencil (~1); /* Set a bit pattern of 111...11111110 */
glClear (GL_STENCIL_BUFFER_BIT);
The clear value needs to be masked by the value 2^m - 1, where m is the
number of bits in the stencil buffer. Previously, we passed the value
masked with 0x7fffffff to _mesa_StencilFuncSeparate which then clamps,
NOT masks the value to the range 0 to 2^m - 1.
The result would be clearing the stencil buffer to 0xff, rather than 0xfe.
Signed-off-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Make sure regions are properly updated and that the colorkey painting is
flushed before we update the HW overlay.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This is needed to properly sync with host side rendering. For example,
make sure we flush colorkey painting before updating the overlay.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
When a context was made current to another surface, the old code did
this
dri2_dpy->core->bindContext(cctx, ddraw, rdraw);
dri2_dpy->core->unbindContext(old_cctx);
and there will be no current context due to the second line.
unbindContext should be called only when bindContext is not. This fixes
a regression since d19afc57. Thanks to Neil Roberts for noticing the
issue and creating a test case.
(Marek: added the initializion of "vec" in the default statement)
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This fixes piglit/glsl-vs-main-return and glsl-fs-main-return for the drivers
which don't support RET (i915g, r300g, r600g, svga).
ir_to_mesa does not currently generate subroutines, but it's a matter of time
till it's added. It would then break all the drivers which don't implement
them, so this CAP makes sense.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
When the result of the alpha instruction is being replicated to the RGB
destination register, we do not need to use alpha's destination register.
This fixes an invalid "Too many hardware temporaries used" error in
the case where a transcendent operation writes to a temporary register
greater than max_temp_regs.
NOTE: This is a candidate for the 7.9 branch.
This fixes an invalid "Too many hardware temporaries used" error in the
case where a source reads from a temporary register with an index greater
than max_temp_regs and then the source is marked as unused before the
register allocation pass.
NOTE: This is a candidate for the 7.9 branch.
Reads of registers that where not written to within the same block were
not being tracked. So in a situations like this:
0: IF
1: ADD t0, t1, t2
2: MOV t2, t1
Instruction 2 didn't know that instruction 1 read from t2, so
in some cases instruction 2 was being scheduled before instruction 1.
NOTE: This is a candidate for the 7.9 branch.
Gallium drivers pass all piglit tests for the two (there are 12 tests
for separate_shader_objects and 5 tests for explicit_attrib_location),
and I was told the extensions don't need any driver-specific code.
I made them dependent on PIPE_CAP_GLSL.
Signed-off-by: Brian Paul <brianp@vmware.com>
The drm winsys only ever handles one gem memory manager. Rip out
the unnecessary complication.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Not using the gtt is considered harmful for performance. And for
partial uploads there's always drm_intel_bo_subdata.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
It looks like this was meant to facilitate unfenced access to textures/
color/renderbuffers. It's totally incomplete and fundamentally broken
on a few levels:
- broken: The kernel needs to about every tiled bo to fix up bit17
swizzling on swap-in.
- unflexible: fenced/unfenced relocs from execbuffer2 do the same, much
simpler.
- unneeded: with relaxed fencing tiled gem bos are as memory-efficient
as this trick.
Hence kill it.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Fix a crash when the subrectangle is not inside the fb. Fix wrong
pipe transfer when sx > 0 or sy + height != fb->height.
This fixes "readpixels" demo.
These two samplers use non-normalized texture coordinates. wrap_r
cannot be PIPE_TEX_WRAP_REPEAT (the default).
This fixes
sp_tex_sample.c:1790:get_linear_unorm_wrap: Assertion `0' failed
assertion failure.
Fix OpenVG "filter" demo
Program received signal SIGSEGV, Segmentation fault.
0xb7153dc9 in str_match_no_case (pcur=0xbfffe564, str=0x0) at
tgsi/tgsi_text.c:86
86 while (*str != '\0' && *str == uprcase( *cur )) {
This is quite common for multitexture sampling, and not only cuts down
on the second and later set of MOVs, but typically also allows
compute-to-MRF on the first set.
No statistically siginficant performance difference in nexuiz (n=3),
but it reduces instruction count in one of its shaders and seems like
a good idea.
We were skipping it if the instruction producing the value we were
going to compute-to-mrf used its result reg as a source reg. This
meant that the typical "write interpolated color to fragment color" or
"texture from interpolated texcoord" shader didn't compute-to-MRF.
Just don't check for the interference cases until after we've checked
if this is the instruction we wanted to compute-to-MRF.
Improves nexuiz high-settings performance on my laptop 0.48% +- 0.08%
(n=3).
On pre-gen6, this turns 4 instructions into 1. We could still do
better by folding the saturate into the instruction generating the
value if nobody else uses it, but that should be a separate pass.
Hardware pretty commonly has saturate modifiers on instructions, and
this can be used in codegen to produce those, without everyone else
needing to understand clamping other than min and max.
This should save on the overhead of tree-walking and provide a
convenient place to add more instruction lowering in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The vector operator collects 2, 3, or 4 scalar components into a
vector. Doing this has several advantages. First, it will make
ud-chain tracking for components of vectors much easier. Second, a
later optimization pass could collect scalars into vectors to allow
generation of SWZ instructions (or similar as operands to other
instructions on R200 and i915). It also enables an easy way to
generate IR for SWZ instructions in the ARB_vertex_program assembler.
The operate just like ir_unop_sin and ir_unop_cos except that they
expect their inputs to be limited to the range [-pi, pi]. Several
GPUs require this limited range for their sine and cosine
instructions, so having these as operations (along with a to-be-written
lowering pass) helps this architectures.
These new operations also matche the semantics of the
GL_ARB_fragment_program SCS instruction. Having these as operations
helps in generating GLSL IR directly from assembly fragment programs.
Use fetch shader instead of having fetch instruction in the vertex
shader. Allow to restrict shader update to a smaller part when
vertex buffer input layout changes.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Occlusion query on evergreen need the event index field to be
set otherwise we endup locking up the GPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Condiation moves with a condition of (a < 0), (a > 0), (a <= 0), or (a
>= 0) can be generated with "a" directly as an operand of the CMP
instruction. This doesn't help much now, but it will help with
assembly shaders that use the CMP instruction.
This should prevent the field going unset in the future. See bug
http://bugs.freedesktop.org/show_bug.cgi?id=31544 for background.
Also remove unneeded calls to clear_teximage_fields().
Finally, call _mesa_set_fetch_functions() from the
_mesa_init_teximage_fields() function so callers have one less
thing to worry about.
Fix this GCC warning.
ir.cpp: In static member function
'static unsigned int ir_expression::get_num_operands(ir_expression_operation)':
ir.cpp:199: warning: control reaches end of non-void function
If an instruction writes reg but nothing later uses it, then we don't
need to bother doing it. Before, we were just killing code that was
never read after it was ever written.
This removes many interpolation instructions for attributes with only
a few comopnents used. Improves nexuiz high-settings performance .46%
+/- .12% (n=3) on my Ironlake.
It's annoying to use test suites under a Mesa debug build because
pretty output is cluttered with stderr's continuous reports that
you're still using the debug driver.
The new usage message lists possible command line options. (Newcomers to Mesa
currently have to trawl through the source to find the command line options,
and we should save them from that trouble.)
Example Output
--------------
usage: ./glsl_compiler [options] <file.vert | file.geom | file.frag>
Possible options are:
--glsl-es
--dump-ast
--dump-hir
--dump-lir
--link
This adds sentinel values to the ir_expression_operation enum type:
ir_last_unop, ir_last_binop, and ir_last_opcode. They are set to the
previous one so they don't trigger "unhandled case in switch statement"
warnings, but should never be handled directly.
This allows us to remove the huge array of 1s and 2s in
ir_expression::get_num_operands().
We are not aware of any GPU that actually implements the cross product
as a single instruction. Hence, there's no need for it to be an opcode.
Future commits will remove it entirely.
This is really supposed to be defined only if the driver supports highp
in the fragment shader - but all of our current ES2 implementations do.
So, just define it. In the future, we'll need to add a flag to
gl_context and only define the macro if the flag is set.
"Fixes" freedesktop.org bug #31673.
ir_binop_less, ir_binop_greater, ir_binop_lequal, and ir_binop_gequal
are defined to work on vectors as well as scalars, as long as the two
operands have the same type.
This is evident from both ir_validate.cpp and our use of these opcodes
in the GLSL lessThan, greaterThan, lessThanEqual, greaterThanEqual
built-in functions.
Found by code inspection. Not known to fix any bugs. Presumably, our
tests for the built-in comparison functions must pass because C.E.
handling is done on the ir_call of "greaterThan" rather than the inlined
opcode. The C.E. handling of the built-in function calls is correct.
NOTE: This is a candidate for the 7.9 branch.
Default group bytes to 512 on evergreen. Don't query
tiling config yet for evergreen, the current info returned is not
adequate for evergreen (no way to get bank info).
RET was interpreted as END, which was wrong. Instead, if a shader contains RET
in the main function, it will fail to compile with an error message
from now on.
The hack is from early days.
When drawing GL_DEPTH_COMPONENT the usual fragment pipeline steps apply
so don't override the depth state.
When drawing GL_STENCIL_INDEX (or GL_DEPTH_STENCIL) the fragment pipeline
does not apply (only the stencil and Z writemasks apply) so disable writes
to the color buffers.
Fixes some regressions from commit ef8bb7ada9
This consolidates the TOKEN_OR_IDENTIFIER and RESERVED_WORD macros into
a single KEYWORD macro.
The old TOKEN_OR_IDENTIFIER macros handled the case of a word going from
an identifier to a keyword; the RESERVED_WORD macro handled a word going
from a reserved word to a language keyword. However, neither could
properly handle samplerBuffer (for example), which is an identifier in
1.10 and 1.20, a reserved word in 1.30, and a keyword in 1.40 and on.
Furthermore, the existing macros didn't properly handle reserved words
in GLSL ES 1.00. The best they could do was return a token (rather than
an identifier), resulting in an obtuse parser error, rather than a
user-friendly "you used a reserved word" error message.
Functions are not first class objects in GLSL, so there is never a value
of function type. No code actually used this except for one function
which asserted it shouldn't occur. One comment mentioned it, but was
incorrect. So we may as well remove it entirely.
This driver is a fake swdri driver that perform no operations
beside allocation gallium structure and buffer for upper layer
usage.
It's purpose is to help profiling core mesa/gallium without
having pipe driver overhead hidding hot spot of core code.
scons file are likely inadequate i am unfamiliar with this
build system.
To use it simply rename is to swrast_dri.so and properly set
LIBGL_DRIVERS_PATH env variable.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
src/mesa/drivers/dri/*/*/*.[chS] is a superset of
src/mesa/drivers/dri/*/server/*.[ch] and
src/mesa/drivers/dri/common/xmlpool/*.[ch].
include/GL/internal/glcore.h is already in MAIN_FILES, no need for it in
DRI_FILES too. src/glx/Makefile was listed twice.
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
This showed up as cairo-gl gradients being inverted on everyone but
Intel, where I'd apparently tweaked the transformation to work around
the bug. Fixes piglit fbo-fragcoord.
Required because ATI and NVIDIA DX9 GPUs do not support indirect addressing
of temps, inputs, outputs, and consts (FS-only) or the hw support is so
limited that we cannot use it.
This should make r300g and possibly nvfx more feature complete.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Since this was talloced off of NULL instead of the compile state, it
was a real leak over the course of the program. Noticed with
valgrind --leak-check=full --show-reachable=yes. We should really
change these passes to generally get the compile context as an argument
so simple mistakes like this stop mattering.
This fixes a regression (failed assertion) from commit
c552f273f5 which was hit if glDeleteBuffers()
was called on a buffer that was never bound.
NOTE: this is a candidate for the 7.9 branch.
Currently r600_resource_copy_region() will turn these copies into
transfers + memcpys, so to avoid recursion we must not turn those
transfers back into blits.
Previously queries of MAX_SAMPLES were only allowed with
ARB_framebuffer_object, but EXT_framebuffer_multisample also enables
this query. This seems to only effect the i915. All other drivers
support both extensions or neither extension.
This patch is based on a patch that Kenneth sent along with the report.
NOTE: this is a candidate for the 7.9 branch.
Reported-by: Kenneth Waters <kwaters@chromium.org>
For driver performance analysis it usefull to be able to
disable as much as possible the GPU interaction so that
one can profile the userspace only.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
IF statements were getting flattened while they were broken. With
Zhenyu's last fix for ENDIF's type, everything appears to have lined
up to actually work.
This regresses two tests:
glsl1-! (not) operator (1, fail)
glsl1-! (not) operator (1, pass)
but fixes tests that couldn't work before because the IFs couldn't be
flattened:
glsl-fs-discard-01
occlusion-query-discard
(and, naturally, this should be a performance improvement for apps
that actually use IF statements to avoid executing a bunch of code).
Sometimes we swizzled in a different channel it looked like, and
sometimes we swizzled in zero. Or something.
Having looked at the output of another code generator for this chip,
this is approximately what they do, too: use align1 math on
temporaries, and then move the results into place.
Fixes:
glean/vp1-EX2 test
glean/vp1-EXP test
glean/vp1-LG2 test
glean/vp1-RCP test (reciprocal)
glean/vp1-RSQ test 1 (reciprocal square root)
shaders/glsl-cos
shaders/glsl-sin
shaders/glsl-vs-masked-cos
shaders/vpfp-generic/vp-exp-alias
Instead of messing with the callers simply copy our inputs into a
alloca array at the beginning of the function and then use it.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
DD_POINT_SIZE was broken for quite some time, and the only driver (r200) relying
on this no longer needs it.
Both DD_POINT_SIZE and DD_LINE_WIDTH have no users left outside of debugging
output, hence instead of fixing DD_POINT_SIZE setting just drop both of them -
there was a plan to remove tricaps flags entirely at some point.
DD_POINT_SIZE got never set for some time now (as it was set only in ifdefed
out code), which caused the r200 driver to use the point primitive mistakenly
in some cases which can only do size 1 instead of point sprite. Since the
logic to use point instead of point sprite prim is flaky at best anyway (can't
work correctly for per-vertex point size), just drop this and always emit point
sprites (except for AA points) - reasons why the driver tried to use points for
size 1.0 are unknown though it is possible they are faster or more conformant.
Note that we can't emit point sprites without point sprite cntl as that might
result in undefined point sizes, hence need drm version check (which was
unnecessary before as it should always have selected points). An
alternative would be to rely on the RE point size clamp controls which could
clamp the size to 1.0 min/max even if the SE point size is undefined, but currently
always use 0 for min clamp. (As a side note, this also means the driver does
not honor the gl spec which mandates points, but not point sprites, with zero size
to be rendered as size 1.)
This should fix recent reports of https://bugs.freedesktop.org/show_bug.cgi?id=702.
This is a candidate for the mesa 7.9 branch.
Fixes assertion failure with texture swizzling in the GLSL path when
it's triggered (such as gen6 FF or ARB_fp shadow comparisons).
Fixes:
texdepth
texSwizzle
fp1-DST test
fp1-LIT test 3
Also fixup code comment to reflect that the GPU requires DWORD
alignment, but in this case does not actually pass the value "in
DWORDs" as I previously stated.
This reverts commit 76360d6abc. On
second thought, it turned out that sync objects also used the
wait_rendering API like this, and would need the same treatment, and
so wait_rendering itself is fixed in libdrm now.
We were asking for a wait to GTT read (all GPU rendering to it
complete), instead of asking for all GPU reading from it to be
complete. Prevents swapbuffers-based apps from running away with
rendering, and produces a better input experience.
For some reason I though we needed the _DISCARD flag to avoid
readbacks, which isn't true at all. Now write operations should
pipeline properly, gives a good speedup to demos/tunnel.
When linked with certain builds of libstdc++, it appears like powf is resolved
by a symbol in that library. Other builds of libstdc++ doesn't contain that
symbol resulting in a linker / loader error. Optionally
resolve that symbol and replace it with calls to logf and expf.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Fixes this GCC warning.
graw.h:93: warning: 'struct pipe_surface' declared inside parameter list
graw.h:93: warning: its scope is only this definition or declaration,
which is probably not what you want
A call to radeon_prepare_render() at the beginning of draw
operations was placed too deep in the call chain,
inside r300RunRenderPrimitive(), instead of
r300DrawPrims() where it belongs. This leads to
emission of stale target color renderbuffer into the cs if
bufferswaps via page-flipping are used, and thereby causes
massive rendering corruption due to unsynchronized
rendering into the active frontbuffer.
This patch fixes such problems for use with the
upcoming radeon page-flipping patches.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
The width of the 2D blits used to copy the data is defined as a 16-bit
signed integer, but the pitch must be DWORD aligned. Limit to an integral
number of DWORDs, (1 << 15 - 4) rather than (1 << 15 -1).
Fixes corruption to data uploaded with glBufferSubData.
Signed-off-by: Peter Clifton <pcjc2@cam.ac.uk>
Allows applications to dump surfaces to file without
referencing gallium/auxiliary entry points statically.
Existing test apps have been modified such that
they save the contents of the fronbuffer only
when the `-o' option's specified.
These ES1 features were only tested for in the vertex array code.
Checking the ctx->API field at runtime is cleaner than the #ifdef
stuff and supports choosing the API at runtime.
Flushing the vertices after having already updated the state doesn't
do any good. Fixes useshaderprogram-flushverts-1. As a side effect,
by moving it to the right place we end up skipping no-op state changes
for traditional glUseProgram.
Rebuilding the vertex format from scratch every time we see a new
vertex attribute is rather costly, new attributes can be appended at
the end avoiding a copy to current and then back again, and the full
attr pointer recalculation.
In the not so likely case of an already existing attribute having its
size increased the old behavior is preserved, this could be optimized
more, not sure if it's worth it.
It's a modest improvement in FlightGear (that game punishes the VBO
module pretty hard in general, framerate goes from some 46 FPS to 50
FPS with the nouveau classic driver).
Signed-off-by: Brian Paul <brianp@vmware.com>
Silences this GCC warning.
brw_wm_fp.c: In function 'brw_wm_pass_fp':
brw_wm_fp.c:966: warning: 'last_inst' may be used uninitialized in this function
brw_wm_fp.c:966: note: 'last_inst' was declared here
Silences this GCC warning.
brw_wm_fp.c: In function 'precalc_tex':
brw_wm_fp.c:666: warning: 'tmpcoord.Index' may be used uninitialized in this function
Fixes this GCC warning with linux-x86 build.
radeon_dataflow.c: In function 'get_readers_normal_read_callback':
radeon_dataflow.c:472: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning with linux-x86 build.
radeon_pair_schedule.c: In function 'merge_presub_sources':
radeon_pair_schedule.c:312: warning: ISO C90 forbids mixed declarations and code
Commit 8dfafbf086 forgot to update r300g.
There is a buf == NULL check, but buf is used before for var init.
Tested-by: Guillermo S. Romero <gsromero@infernal-iceberg.com>
Fixes this GCC warning.
nouveau_vbo_t.c: In function 'nv10_vbo_render_prims':
nouveau_render_t.c:161: warning: 'max_out' may be used uninitialized in this function
nouveau_render_t.c:161: note: 'max_out' was declared here
We really only want to print spaces -between- elements, not after each
element. This cleans up error messages from IR reader, making them
(mildly) easier to read.
In particular, calling the abs function is silly, since there's already
an expression opcode for that. Also, assigning to temporaries then
assigning those to the final location is rather redundant.
Trivial change that avoids a segmentation fault when the blitter state
happens to be bound when the context is destroyed.
The free calls should probably removed altogether in the future -- the
responsibility to destroy the state atoms lies with whoever created it,
and the safest thing for the pipe driver is to not touch any bound state
in its destructor.
In testing on Ironlake, the histogram of clocks/pixel results for the
system memcpy and magic unaligned memcpy show no noticeable difference
(and no statistically significant difference with the 5510 samples
taken, though the stddev is large due to what looks like the cache
effects from the different texture sizes used).
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
Discard fractional bits from linewidth. This matches the nvidia
closed drivers, my reading of the OpenGL SI and current llvmpipe
behaviour.
It looks a lot nicer & avoids ugliness where lines alternate between n
and n+1 pixels in width along their length.
Also fix up r600g to match.
These were previously being left in the default (D3D) mode. This mean
that triangles were drawn slightly incorrectly, but also because this
state is relied on by the u_blitter code, all blits were half a pixel
off.
Generalize the existing tiled_buffer path in texture transfers for use
in some non-tiled up and downloads.
Use a staging buffer, which the winsys will restrict to GTT memory.
GTT buffers have the major advantage when they are mapped, they are
cachable, which is a very nice property for downloads, usually the CPU
will want to do look at the data it downloaded.
This opens the question of what interface the winsys layer should
really have for talking about these concepts.
For now I'm using the existing gallium resource usage concept, but
there is no reason not use terms closer to what the hardware
understands - eg. the domains themselves.
The callback presents the given attachment to the native engine. It
allows the swap behavior and interval to be controlled. It will replace
native_surface::flush_frontbuffer and native_surface::swap_buffers
shortly.
Silences warning such as:
main/texobj.c:442:40: warning: ISO C99 requires rest arguments to be used
main/texobj.c:498:58: warning: ISO C99 requires rest arguments to be used
This ensures that we increase bo->map_count when radeon_bo_map_internal()
returns successfully, which in turn makes sure we don't decrement
bo->map_count below zero later.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
The call to draw_bind_fragment_shader() was using the old fragment
shader. This bug would have really only effected the draw module's
use of the fragment shader in the wide point stage.
Some C++ header files were included in an extern "C" block. When building with
Clang, this caused the build to fail due to namespace errors. (GCC did not
report any errors.)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Important as more constant buffers per shader start to get used.
Fix up r600 (tested) and nv50 (untested) to cope with this. Drivers
previously didn't see unbinds of constant buffers often or ever, so
this isn't always dealt with cleanly.
For r600 just return and keep the reference. Will try to do better in
a followup change.
Don't trim triangle bounding box to scissor/draw-region until after
the logic for emitting tri_16. Don't generate tri_16 commands for
triangles with untrimmed bounding boxes outside the current tile.
This is important as the tri-16 itself can extend past tile bounds and
we don't want to add code to it to check against tile bounds (slow) or
restrict it to locations within a tile (pessimistic).
This effectively redoes 1741ddb747 in a
way that allows contexts of different APIs to coexist.
First, the changes to the remap table are reverted. The remap table
(driDispatchRemapTable) is always initialized in the same way regardless
of the context API.
es_generator.py is updated to use a local remap table, whose sole
purpose is to help initialize its dispatch table. The local remap table
and the global one are always different, as they use different
glapidispatch.h. But the dispatch tables initialized by both remap
tables are always compatible with glapi (libGL.so).
Finally, the semantics of one_time_init are changed to per-api one-time
initialization.
Core mesa should query glapi for the positions of the functions in
_glapi_table when multiple APIs are supported. It does not know which
glapitable.h glapi used.
Use scons target and dependency system instead of ad-hoc options.
Now is simply a matter of naming what to build. For example:
scons libgl-xlib
scons libgl-gdi
scons graw-progs
scons llvmpipe
and so on. And there is also the possibility of scepcified subdirs, e.g.
scons src/gallium/drivers
If nothing is specified then everything will be build.
There might be some rough corners over the next days. Please bare with me.
Without this, update_textures may not pick up the new pipe_resource.
It is actually update_textures that should check
stObj->sampler_view->texture != stObj->pt, but let's follow st_TexImage
and others for now.
Make autoconf decide the client APIs enabled first. Then when OpenGL
and OpenGL ES are disabled, there is no need to build src/mesa/; when
OpenGL is disabled, no $mesa_driver should be built. Finally, add
--enable-openvg to enable OpenVG.
With these changes, an OpenVG only build can be configured with
$ ./configure --disable-opengl --enable-openvg
src/mesa, src/glsl, and src/glx will be skipped, which saves a great
deal of compilation time.
And an OpenGL ES only build can be configured with
$ ./configure --disable-opengl --enable-gles-overlay
Fixes this GCC warning.
state_tracker/st_program.c: In function 'st_print_shaders':
state_tracker/st_program.c:735: warning: 'sh' may be used uninitialized in this function
This fixes some insanity that would otherwise be required for GLSL
1.30 bit ops or gen6 integer uniform operations in general, at the
cost of upload-time pain. Given that we only have that pain because
mesa's mangling our integer uniforms to be floats, this something that
should be fixed outside of the shader codegen.
First, it changes autoconf to use a "python2" binary when available,
rather than plain "python" (which is ambiguous). Secondly, it changes
the Makefiles to use $(PYTHON) $(PYTHON_FLAGS) rather than calling
python directly.
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Matthew William Cox <matt@mattcox.ca>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes this GCC warning.
r300_state_derived.c: In function 'r300_update_derived_state':
r300_state_derived.c:593: warning: 'r' may be used uninitialized in this function
r300_state_derived.c:593: note: 'r' was declared here
radeon_bo_destroy() will want to read the list field. Without this patch,
we'd end up evaluating the list pointers before they have been properly
set up when we destroyed the newly created bo if it cannot be mapped.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
With 07b85457d9, glapitable.h is included
by core mesa only to know the size of _glapi_table. It is not necessary
as the same info is given by _gloffset_COUNT.
This change makes _glapi_table opaque to core mesa. All operations on
it are supposed to go through one of the SET/GET/CALL macros.
Move defines in glapioffsets.h to glapidispatch.h. Rename
_gloffset_FIRST_DYNAMIC to _gloffset_COUNT, which is equal to the number
of entries in _glapi_table.
Consistently use SET_by_offset, GET_by_offset, CALL_by_offset, and
_gloffset_* to recursively define all SET/GET/CALL macros.
glapioffsets.h exists for the same reason as glapidispatch.h does. It
is of no use to glapi. This commit also drops the use of glapioffsets.h
in glx as glx is considered an extension to glapi when it comes to
defining public GL entries.
glapidispatch.h exists so that core mesa (libmesa.a) can be built for
DRI drivers or for non-DRI drivers as a compile time decision (whether
IN_DRI_DRIVER is defined). It is of no use to glapi. This commit also
drops the use of glapidispatch.h in glx and libgl-xlib as they are
considered extensions to glapi when it comes to defining public GL
entries.
This lets us simplify and consolidate some state checking code.
This implements the GL_INVALID_OPERATION check for all drawing commands
required by GL_EXT_texture_integer.
It's a little more painful than before because we don't have the handy
mask register any more, and have to make do with cooking up a value
out of the flag register.
This is apparently required, as the thread will be initiated while it
still has dependencies, and this is what waits for those to be
resolved before writing color.
Function ast_declarator_list::hir(), when processing keywords added by
extension ARB_fragment_coord_conventions, made the mistake of checking only if
the extension was __supported by the driver__. The correct behavior is to check
if the extensi0n is __enabled in the parse state__.
NOTE: this is a candidate for the 7.9 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Ensure -L$(TOP)/$(LIB_DIR) (the staging dir for build products), appears
in the link line before any -L in $LDFLAGS, so that we link driver we are
building with libEGL we have just built, and not an installed version
[olv: make a similar change to targets/egl]
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This call sequence
eglMakeCurrent(dpy, surf, surf, ctx1);
eglMakeCurrent(dpy, surf, surf, ctx2);
should be valid if ctx1 and ctx2 have the same client API and are not
current in another thread.
Internally a mode belongs to a screen. But functions like
eglGetModeAttribMESA treat a mode as a display resource: a mode can be
looked up without a screen. Considering how KMS works, it is better to
stick to the current implementation.
To properly support looking up a mode without a screen, this commit
assigns each mode (of all screens) a unique ID.
The opaque nature of EGLImage implies that extensions almost always
define their own attributes. Move attributes in _EGLImage to
_EGLImageAttribs and add a helper function to parse attribute lists.
Fixes glsl-fs-convolution-2, which was blowing up due to the array
access insanity getting at the uniform values within the loop. Each
temporary was considered live across the whole loop.
One, it was allocating increments of 1kb, but per thread scratch space
is a power of two. Two, the new FS wasn't getting total_scratch set
at all, so everyone thought they had 1kb and writes beyond 1kb would
go stomping on a neighbor thread.
With this plus the previous register spilling for the new FS,
glsl-fs-convolution-1 passes.
g0 is used by others, and is expected to be left exactly as it was
dispatched to us. So manually move g0 into our message reg when
spilling/unspilling and update the offset in the MRF. Fixes failures
in texture sampling after having spilled a register.
It's amazing this code worked. Basically, we would get lucky in
register allocation and the tests using frontfacing would happen to
allocate gl_FrontFacing storage and the instructions generating
gl_FrontFacing but pointing at another register to the same hardware
register. Noticed during register spilling debug, when suddenly they
didn't get allocatd the same storage.
Since this is just generated by python, it's questionable whether this
should continue to live in the repository - Mesa already has other
things generated from python as part of the build process.
This works around MSVC's 65535 byte limit, unfortunately at the expense
of any semblance of readability and much larger file size. Hopefully I
can implement a better solution later, but for now this fixes the build.
Fixes this GCC warning.
gallivm/lp_bld_tgsi_aos.c: In function 'lp_build_tgsi_aos':
gallivm/lp_bld_tgsi_aos.c:516: warning: 'dst0' may be used uninitialized in this function
gallivm/lp_bld_tgsi_aos.c:516: note: 'dst0' was declared here
Fixes these GCC warnings.
gallivm/lp_bld_sample_aos.c: In function 'lp_build_sample_image_nearest':
gallivm/lp_bld_sample_aos.c:271: warning: 't_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:271: warning: 'r_ipart' may be used uninitialized in this function
Fixes these GCC warnings.
gallivm/lp_bld_sample_aos.c: In function 'lp_build_sample_image_linear':
gallivm/lp_bld_sample_aos.c:439: warning: 'r_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart_hi' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart_lo' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart_hi' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart_lo' may be used uninitialized in this function
At the moment you need kernel patches to have texture tiling work
with the kernel CS checker, so once they are upstream and the drm version
is bumped we can make this enable flip the other way most likely.
Since the hw transitions from 2D->1D sampling below the 2D macrotile
size we need to keep track of the array mode per level so we can
render to it using the CB.
Silences this GCC warning.
ast_to_hir.cpp: In function 'void apply_type_qualifier_to_variable(const
ast_type_qualifier*, ir_variable*, _mesa_glsl_parse_state*, YYLTYPE*)'
ast_to_hir.cpp:1768: warning: enumeration value 'ir_shader' not handled
in switch
We were working around an LLVM 2.5 bug but we're using LLVM 2.6 or later now.
This basically reverts commit baddcbc522.
This fixes the piglit bug/tri-tex-crash.c failure.
Silences these GCC warnings.
r600_shader.c: In function 'tgsi_exp':
r600_shader.c:2339: warning: 'r600_src[0].rel' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].abs' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].neg' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].chan' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].sel' is used uninitialized in this function
Previously some shader input or outputs that hadn't received location
assignments could slip through. This could happen when a shader
contained user-defined varyings and was used with either
fixed-function or assembly shaders.
See the piglit tests glsl-[fv]s-user-varying-ff and
sso-user-varying-0[12].
NOTE: this is a candidate for the 7.9 branch.
This function type checks the operands of and returns the result type of
bit-logic operations. It replaces the type checking performed in the
following cases of ast_expression::hir() :
- ast_bit_and
- ast_bit_or
- ast_bit_xor
This function type checks the operands of and returns the result type of
bit-shift operations. It replaces the type checking performed in the following
cases of ast_expression::hir() :
- ast_lshift
- ast_rshift
This fixes hangs in some Z-writes-in-shaders tests, though other
pieces don't come out correctly.
Bug #30392: hang in fbo-fblit-d24s8. (still fails with bad color drawn
to some targets)
rc_get_readers_normal() supplies a list of readers for a given
instruction. This function is now being used by the copy propagate
optimization and will eventually be used by most other optimization
passes as well.
Existing code relies on IR being generated (possibly with error type)
rather than returning NULL. So, don't break - go ahead and generate the
operation. As long as an error is flagged, things will work out.
Fixes fd.o bug #30914.
Thanks to Alex Deucher for pointing out the FLT to int conversion is necessary
and writing an initial patch, this brings about 20 piglits, and I think this
is the last piece to make evergreen and r600 equal in terms of features.
We need to keep track of three different fragment shaders: Z-only, stencil-
only, and Z+stencil. Before, we were only keeping track of the first one
we encountered.
We often use reg_null as the destination when setting up the flag
regs. However, on gen6 there aren't general implicit conversions to
destination types from src types, so the comparison to produce the
flag regs would be done on the integer result interpreted as a float.
Hilarity ensued.
Fixes 20 piglit cases.
In ir_validate::visit_leave(), the cases for
- ir_binop_bit_and
- ir_binop_bit_xor
- ir_binop_bit_or
were incorrect. It was incorrectly asserted that both operands must be the
same type, when in fact one may be scalar and the other a vector. It was also
incorrectly asserted that the resultant type was the type of the left operand,
which in fact does not hold when the left operand is a scalar and the right
operand is a vector.
Implement by adding the following cases to ast_expression::hir():
- ast_lshift
- ast_rshift
Also, implement ir validation for the new operators by adding the following
cases to ir_validate::visit_leave():
- ir_binop_lshift
- ir_binop_rshift
On evergreen, interpolation has moved into the fragment shader,
with the interpolation parmaters being passed via GPRs and LDS entries.
This works out the number of interps required and reserves GPR/LDS
storage for them, it also correctly routes face/position values which
aren't interpolated from the vertex shader.
Also if we noticed nothing is to be interpolated we always setup perspective
interpolation for one value otherwise the GPU appears to lockup.
This fixes about 15 piglit tests on evergreen.
Previously _LinkedShaders was a compact array of the linked shaders
for each shader stage. Now it is arranged such that each slot,
indexed by the MESA_SHADER_* defines, refers to a specific shader
stage. As a result, some slots will be NULL. This makes things a
little more complex in the linker, but it simplifies things in other
places.
As a side effect _NumLinkedShaders is removed.
NOTE: This may be a candidate for the 7.9 branch. If there are other
patches that get backported to 7.9 that use _LinkedShader, this patch
should be cherry picked also.
This implements round() via the ir_unop_round_even opcode, rather than
adding a new opcode. We may wish to add one in the future, since it
might enable a small performance increase on some hardware, but for now,
this should suffice.
Tested with demos/pixeltest - line rasterization doesn't seem to be
set up for GL conventions yet, but at least width is respected now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It is now to the point where we have no regressing piglit tests. It
also fixes Yo Frankie! and Humus DynamicBranching, probably due to the
piglit bias tests that work that didn't on the Mesa IR backend.
As a downside, performance takes about a 5-10% performance hit at the
moment (e.g. nexuiz 19.8fps -> 18.8fps), which I plan to resolve by
reintroducing 16-wide fragment shaders where possible. It is a win,
though, for fragment shaders using flow control.
Simply using RNDU, RNDZ, or RNDE does not produce the desired result.
Rather, the RND* instructions place a value in the destination register
that may be 1 less than the correct answer. They can also set per-channel
"increment bits" in a flag register, which, if set, mean dest needs to
be incremented by 1. A second instruction - a predicated add -
completes the job.
Notably, RNDD always produces the correct answer in a single
instruction.
Fixes piglit test glsl-fs-trunc.
This cuts usually 2 out of 3 instructions for flag reg generation (if
statements, conditional assignment) by producing the conditional mod
in the expression representing the boolean value.
Fixes glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined (register
allocation no longer fails for the conditional generation
proliferation)
This will be a place to peephole comparisions directly to the flag
regs, and for now avoids using MOV with conditional mod on gen6, which
is now illegal.
GLES1 and GLES2 install their own exec pointers and don't need the
Save table. Also, the SET_* macros use different indices for the different
APIs so the offsets used in vtxfmt.c are actually wrong for the ES APIs.
Just always check for FLUSH_UPDATE_CURRENT and call Driver.BeginVertices
when necessary. By using the unlikely() macros, this ends up as
a 10% performance improvement (for isosurf, anyway) over the old,
complicated function pointer swapping.
Don't use r0 for FF_SYNC dest reg on Sandybridge, which would
smash FFID field in GS payload, that cause later URB write fail.
Also not use r0 in any URB write requiring allocate.
Actually validate that the implementation supports the particular
shader target as well. Previously if a driver only supported vertex
shaders, for example, glCreateShaderObjectARB would gladly create a
fragment shader.
NOTE: this is a candidate for the 7.9 branch.
This really amounts to just using the return value from
link_function_calls. All the work was being done, but the result was
being ignored.
Fixes piglit test link-unresolved-funciton.
NOTE: this is a candidate for the 7.9 branch.
Together with the previous commit, this generalize the benefits of
d2cf757f44 to all depth formats, in
particular:
- simpler float -> 24unorm conversion
- avoid unsigned comparisons (not directly supported on SSE) by aligning
to the least significant bit
- avoid unecessary/repeated mask ANDing
Verified with trivial/tri-z that the exact same assembly is produced for
X8Z24.
Z32_FLOAT uses <4 x float> as intermediate/destination type,
instead of <4 x i32>.
The necessary bitcasts got removed with commit
5b7eb868fd
Also use depth/stencil type and build contexts consistently, and
make the depth pointer argument a ordinary <i8 *>, to catch this
sort of issues in the future (and also to pave way for Z16 and
Z32_FLOAT_S8_X24 support).
__GLcontextModes is always only used as an implementation internal struct
at this point and we shouldn't install glcore.h anymore. Anything that
needs __GLcontextModes should just include the struct in its headers files
directly.
This is relying on lp_build_pack2 using the sse2 pack intrinsics which
handle clamping.
(Alternatively could have make it use lp_build_packs2 but it might
not even produce more efficient code than not using the fastpath
in the first place.)
Fixes this GCC warning.
r300_state.c: In function 'r300InvalidateState':
r300_state.c:2247: warning: 'hw_format' may be used uninitialized in this function
r300_state.c:2247: note: 'hw_format' was declared here
This adds proper support for the GL_ARB_shader_stencil_export extension
to the GLSL compiler. Thanks to Ian for pointing out where I need to add things.
If the pipe driver has shader stencil export we can accelerate DrawPixels
using it. It tries to pick an S8 texture and works its way to X24S8 and S8X24
if that isn't supported.
We need a texture to put the drawpixels stuff into, an S8 texture is less
memory/bandwidth than the 32-bit X24S8, but we might not be able to render
directly to an S8, so this lets us specify we won't be rendering to this
texture.
this improves mesa texstore for 8/24 so it can create S24X8/X24S8 variants
by keeping the depth bits static.
it also adds a texstore for S8 so we can write out an S8 texture to use
in the sampler for accel draw pixels to save memory bw.
The logic seems sound here, I've worked it out a few times on paper, though
it would be good to have some review.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There was a check to only do the rebase if we didn't have everything
in VBOs, but nexuiz apparently hands us a mix of VBOs and arrays,
resulting in blocking on the GPU to do a rebase.
Improves nexuiz 800x600, high-settings performance on my Ironlake 41%
(+/- 1.3%), from 14.0fps to 19.7fps.
The format selection of the CopyTexSubImage is pretty bogus still, but
this at least avoids software fallbacks in nexuiz, bringing
performance from 7.5fps to 12.8fps on my machine.
This assertion was added in commit f1c1ee11, but it did not notice
that the array is accessed with 'size-1' instead of 'size'. As a
result, the assertion was off by one. This caused failures in at
least glsl-orangebook-ch06-bump.
If an GLSL shader is used that does not provide all stages and
assembly shaders are provided for the missing stages, validate the
assembly shaders.
Fixes bugzilla #30787 and piglit tests glsl-invalid-asm0[12].
NOTE: this is a candidate for the 7.9 branch.
There was actually a large quantity of scalar code in these functions
previously. This tries to move more into intrinsics.
Introduce an sse2 mm_mullo_epi32 replacement to avoid sse4 dependency
in the new rasterization code.
It's now much more correct for gen6 than the old backend, with just 2
regressions I've found (one of which is common with pre-gen6 and will
be fixed by an array splitting IR pass).
This does leave the old Mesa IR backend getting used still when we
don't have GLSL IR, but the plan is to get GLSL IR input to the driver
for the ARB programs and fixed function by the next release.
Pre-gen6, you could mix int and float just fine. Now, you get goofy
results.
Fixes:
glsl-arb-fragment-coord-conventions
glsl-fs-fragcoord
glsl-fs-if-greater
glsl-fs-if-greater-equal
glsl-fs-if-less
glsl-fs-if-less-equal
This is a hw requirement in math args. This also is inefficient, as
we're calculating the same result 8 times, but then we've been doing
that on pre-gen6 as well. If we're doing math on uniforms, though,
we'd probably be better served by having some sort of mechanism for
precalculating those results into another uniform value to use.
Fixes 7 piglit math tests.
Add ability to set the GLSL version used by the GLcontext by setting the
environment variable INTEL_GLSL_VERSION. For example,
env INTEL_GLSL_VERSION=130 prog args
If the environment variable is missing, the GLSL versions defaults to 120.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By calling radeon_draw_buffers (which sets the necessary flags
in radeon->NewGLState) and revalidating if NewGLState is non-zero
in r200TclPrimitive. This fixes an assert in libdrm (the color-/
depthbuffer was changed but not yet validated) and and stops the
kernel cs checker from complaining about them (when they're too
small).
Thanks to Mario Kleiner for the hint to call radeon_draw_buffer
(instead of my half-broken hack).
v2: Also fix the swtcl r200 path.
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This didn't produce a statistically significant performance difference
in my demo (n=4) or nexuiz (n=3), but it still seems like a good idea
and is recommended by the HW team.
Having the single opcode write then read the reg meant that single
instruction opcodes had to consider their source regs to interfere
with their dest regs.
SSE support for 32bit and 16bit unsigned arithmetic is not complete, and
can easily result in inefficient code.
In most cases signed/unsigned doesn't make a difference, such as for
integer texture coordinates.
So remove uint_coord_type and uint_coord_bld to avoid inefficient
operations to sneak in the future.
We need to move the texture sampler resources out of the range of the vertex attribs.
We could probably improve this using an allocator but this is the simple answer for now.
makes mesa-demos/src/glsl/vert-tex work.
We can't patch true-block at end-if time, as there is no guarantee that
the block at the beginning of the true stanza is the same at the end of
the true stanza -- other control flow elements may have been emitted half
way the true stanza.
Although this bug surfaced recently with the commit to skip mip filtering
when lod is an integer the bug was always there, although probably it
was avoided until now: e.g., cubemap selection nests if-then-else on the
else stanza, which does not suffer from the same problem.
We've been using these in the linear path for a while now. Based on
Chris's SSSE3 code, but using only sse2 opcodes. Speed seems to be
identical, but code is simpler & removes dependency on SSE3.
Should be easier to extend to other rgba8 formats.
Specifically, can do early-depth-test even when alpahtest or
kill-pixel are active, providing we defer the actual z write until the
final mask is avaialable.
Improves demos/fire.c especially in the case where you get close to
the trees.
The current interpolation schemes causes precision loss.
Changing the operation order helps, but does not completely avoid the
problem.
The only short term solution is to clamp z to 1.0.
This is unfortunate, but probably unavoidable until interpolation is
improved.
Operate simultanouesly on <width, height, depth> vector as much as possible,
instead of doing the operations on vectors with broadcasted scalars.
Also do the 24.8 fixed point scalar with integer shift of the texture size,
for unnormalized coordinates.
AoS path only for now -- the same thing can be done for SoA.
Fixes these GCC warnings.
brw_wm_fp.c: In function 'search_or_add_const4f':
brw_wm_fp.c:92: warning: 'reg.Index2' is used uninitialized in this function
brw_wm_fp.c:84: note: 'reg.Index2' was declared here
brw_wm_fp.c:92: warning: 'reg.RelAddr2' is used uninitialized in this function
brw_wm_fp.c:84: note: 'reg.RelAddr2' was declared here
Clamp against 0 instead of -0.5, which simplifies things.
The former version would have resulted in both int coords being zero
(in case of coord being smaller than 0) and some "unused" weight value,
whereas now the int coords will be 0 and 1, but weight will be 0, hence the
lerp should produce the same value.
Still not happy about differences between normalized and non-normalized...
Haven't looked at what code this exactly generates but URem can't be fast.
Instead of using two URem only use one and replace the second one with
select/add (this is what the corresponding aos code already does).
Rearrange order of operations a bit to make some clamps easier.
All calculations should be equivalent.
Note there seems to be some inconsistency in the clamp to edge case
wrt normalized/non-normalized coords, could potentially simplify this too.
Sometimes coords are clamped to positive numbers before doing conversion
to int, or clamped to 0 afterwards, in this case can use itrunc
instead of ifloor which is easier. This is only the case for nearest
calculations unfortunately, except linear MIRROR_CLAMP_TO_EDGE which
for the same reason can use a unsigned float build context so the
ifloor_fract helper can reduce this to itrunc in the ifloor helper itself.
sse2 supports round to nearest directly (or rather, assuming default nearest
rounding mode in MXCSR). Use intrinsic to use this rather than round (sse41)
or bit manipulation whenever possible.
The constness of the function parameter gets inlined with the rest of
the function. However, there is also an assignment to the parameter.
If this occurs inside a loop the loop analysis code will get confused
by the assignment to a read-only variable.
Fixes bugzilla #30552.
NOTE: this is a candidate for the 7.9 branch.
Improves performance of my GLSL demo 14.3% (+/- 4%, n=4) by
eliminating the moves used in ir_assignment and ir_swizzle handling.
Still 16.5% to go to catch up to the Mesa IR backend, presumably
because instructions are almost perfectly mis-scheduled now.
We were trying to remap a fully-filled array down to only handing the
WM the components it uses. This is called attribute swizzling, and if
you don't enable it you just get 1:1 mappings of inputs to outputs.
This almost fixes glsl-routing, except for the highest gl_TexCoord[]
indices.
We would compute a new buffer, but never point the hardware at the new
buffer. This partially fixes glsl-routing, as now it get the updated
uniform for which attribute to draw.
Avoid multiplying fixed-point values. Calculate triangle area in
floating point use that for culling.
Lift area calculations up a level as we are already doing this in the
triangle_both() case.
Would like to share the calculated area with attribute interpolation,
but the way the code is structured makes this difficult.
interp data is stored in gpr0 so first interp overwrote it
and subsequent ones got wrong values
reserve register 0 so it's not used for attribs.
alternative is to interpolate attrib0 last (reverse, as r600c does)
We sensibly only provide it if the FS asks for it. We could actually
skip WPOS unless the FS needed WPOS.zw, but that's something for
later.
Fixes: glsl-texture2d and probably many others.
- Handle PIPE_FORMAT_Z32_FLOAT packing correctly.
- In the integer version z shouldn't be passed as as double.
- Make it clear that the integer versions should only be used for masks.
- Make integer type sizes explicit (uint32_t for now, although
uint64_t will be necessary later to encode f32_s8_x24).
The jump delta is now in the part of the instruction where the
destination fields used to be, and the src args are ignored (or not,
for the new non-predicated IF that we don't use yet).
Once a fragment is generated with LP_INTERP_PERSPECTIVE set for an input,
it will do a divide by w for that input. Therefore it's not OK to treat LP_INTERP_PERSPECTIVE as
LP_INTERP_LINEAR or vice-versa, even if the attribute is known to not
vary.
A better strategy would be to take the primitive in consideration when
generating the fragment shader key, and therefore avoid the per-fragment
perspective divide.
Got a speed up by tracking the dirty blocks in a seperate list instead of looping through all blocks. This version should work with block that get their dirty state disabled again and I added a dirty check during the flush as some blocks were already dirty.
Flush read cache before writting register. Track flushing inside
of a same cs and avoid reflushing same bo if not necessary. Allmost
properly force flush if bo rendered too and then use as a texture
in same cs (missing pipeline flush dunno if it's needed or not).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When we go to do a lot of bos in one draw like constant bufs we need
to avoid bouncing off the busy ioctl, this mitigates by backing off
on busy bos for a short amount of times.
These texture formats (like R16G16B16A16_UNORM) were untested until now
because st/mesa doesn't use them. I am testing this with a hacked st/mesa
here.
I am trying to be exhaustive, but still I might have missed tons of other
changes to Gallium.
(cherry picked from commit 968a9ec76e)
Conflicts:
docs/relnotes-7.9.html
The glsl core should be handling most dead code issues for us, but we
generate some things in codegen that may not get used, like the 1/w
value or pixel deltas. It seems a lot easier this way than trying to
work out up front whether we're going to use those values or not.
this code was memcmp'ing two structs, but refcounting one of them afterwards,
so any subsequent memcmp was never going to work.
again this stops unnecessary uploads of vertex program,
The brw_wm_surface_state.c handling of GL_DEPTH_TEXTURE_MODE doesn't
apply to shadow compares, which always return an intensity value. The
texture swizzles can do the job for us.
Fixes:
glsl1-shadow2D(): 1
glsl1-shadow2D(): 3
The hw swizzles have been obtained by a brute force approach,
and only C0 and C2 are stored in UV88, the other channels are
ignored.
R16G16 is going to be a lot trickier.
Luckily, one of them would result in failing out register allocation
when the other bugs were encountered. Applies to
glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined, which still
fails register allocation, but now legitimately.
Fixes several problems:
The half-float, float, and integer internal formats depend on
ARB_texture_rg and other extensions.
RG_INTEGER is not a valid internal format.
Generic compressed formats depend on ARB_texture_rg, not
EXT_texture_compression_rgtc.
Use GL_RED instead of GL_R.
We should fix the SF to actually give us just the data we need, but
this fixes regressions in the new FS until then.
Fixes:
glsl-kwin-blur
glsl-routing
Trying to track the insanity of the different argument layouts for
normal/shadow crossed with normal/lod/bias one generation at a time is
enough.
Fixes: glsl1-texture2D() with bias.
(first test passing in this code that doesn't pass without it!)
We should end up with the same code, but anyone else with this issue
could share the handling (which I got wrong for shadow comparisons in
the driver before).
I've screwed this up enough times that I don't think it's worth it.
This time, it was that I was doing it once per top-level body
instruction instead of just once at the end of the loop body.
The interference is totally bogus (maximal), so this is equivalent to
our trivial register assignment before. As in, passes the same set of
piglit tests.
Now, virtual GRFs are consecutive integers, rather than offsetting the
next one by the size. We need the size information to still be around
for real register allocation, anyway.
Fixes this GCC warning on linux-x86 build.
r3xx_vertprog.c: In function ‘ei_if’:
r3xx_vertprog.c:396: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r500_fragprog_emit.c: In function ‘emit_paired’:
r500_fragprog_emit.c:237: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘emit_tex’:
r500_fragprog_emit.c:367: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘emit_flowcontrol’:
r500_fragprog_emit.c:415: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘r500BuildFragmentProgramHwCode’:
r500_fragprog_emit.c:633: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r500_fragprog.c: In function ‘r500_transform_IF’:
r500_fragprog.c:45: warning: ISO C90 forbids mixed declarations and code
r500_fragprog.c: In function ‘r500FragmentProgramDump’:
r500_fragprog.c:256: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r300_fragprog_emit.c: In function ‘emit_alu’:
r300_fragprog_emit.c:143: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c:156: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c: In function ‘finish_node’:
r300_fragprog_emit.c:271: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c: In function ‘emit_tex’:
r300_fragprog_emit.c:344: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r300_fragprog_swizzle.c: In function ‘r300_swizzle_is_native’:
r300_fragprog_swizzle.c:120: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_swizzle.c: In function ‘r300_swizzle_split’:
r300_fragprog_swizzle.c:159: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_rename_regs.c: In function ‘rc_rename_regs’:
radeon_rename_regs.c:112: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_remove_constants.c: In function ‘rc_remove_unused_constants’:
radeon_remove_constants.c💯 warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warning on linux-x86 build.
radeon_optimize.c: In function ‘constant_folding’:
radeon_optimize.c:419: warning: ISO C90 forbids mixed declarations and code
radeon_optimize.c:425: warning: ISO C90 forbids mixed declarations and code
radeon_optimize.c:432: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_dataflow_deadcode.c: In function ‘push_branch’:
radeon_dataflow_deadcode.c:112: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c: In function ‘update_instruction’:
radeon_dataflow_deadcode.c:183: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c: In function ‘rc_dataflow_deadcode’:
radeon_dataflow_deadcode.c:352: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c:379: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_pair_regalloc.c: In function ‘rc_pair_regalloc_inputs_only’:
radeon_pair_regalloc.c:330: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_pair_schedule.c: In function ‘emit_all_tex’:
radeon_pair_schedule.c:244: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘destructive_merge_instructions’:
radeon_pair_schedule.c:291: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c:438: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘scan_read’:
radeon_pair_schedule.c:619: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘scan_write’:
radeon_pair_schedule.c:645: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘schedule_block’:
radeon_pair_schedule.c:673: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘rc_pair_schedule’:
radeon_pair_schedule.c:730: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_pair_translate.c: In function ‘set_pair_instruction’:
radeon_pair_translate.c:153: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c:170: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c: In function ‘rc_pair_translate’:
radeon_pair_translate.c:336: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c:341: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_program_alu.c: In function ‘r300_transform_trig_simple’:
radeon_program_alu.c:882: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c:932: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c: In function ‘radeonTransformTrigScale’:
radeon_program_alu.c:996: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c: In function ‘r300_transform_trig_scale_vertex’:
radeon_program_alu.c:1033: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_emulate_loops.c: In function ‘rc_emulate_loops’:
radeon_emulate_loops.c:517: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings with linux-x86 build.
radeon_emulate_branches.c: In function ‘handle_if’:
radeon_emulate_branches.c:65: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c:71: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘handle_else’:
radeon_emulate_branches.c:94: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘handle_endif’:
radeon_emulate_branches.c:201: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘fix_output_writes’:
radeon_emulate_branches.c:267: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c:284: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘rc_emulate_branches’:
radeon_emulate_branches.c:307: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning.
math/m_debug_xform.c: In function '_math_test_all_transform_functions':
math/m_debug_xform.c:320: warning: format not a string literal and no format arguments
Fixes this GCC warning.
math/m_debug_norm.c: In function '_math_test_all_normal_transform_functions':
math/m_debug_norm.c:365: warning: format not a string literal and no format arguments
Fixes this GCC warning.
math/m_debug_clip.c: In function '_math_test_all_cliptest_functions':
math/m_debug_clip.c:363: warning: format not a string literal and no format arguments
Instead of creating group of register use a hash table
to lookup into which block each register belongs. This
simplify code a bit.
Signed-off-by: Jerome Glisse <jglisse@redhat.com
Fixes glean pbo crash.
It would be possible to avoid crashing without decoupling, but given
that state trackers give no guarantee that number of views is consistent,
that would likely cause too many state updates (or miss some).
Corrections in store_clip to store clip coordinates in AoS form.
Viewport & cliptest flag options based on variant key.
Put back draw_pt_post_vs and now 2 paths based on whether clipping
occurs or not.
Cliptesting now done at the end of vs in draw_llvm instead of
draw_pt_post_vs.
Added viewport mapping transformation and further cliptesting to
vertex shader in draw_llvm.c
Alternative path where vertex header setup, clip coordinates store,
cliptesting and viewport mapping are done earlier in the vertex
shader.
Still need to hook this up properly according to the return value of
"draw_llvm_shader" function.
I'm still not pleased with how builtin uniforms are handled, but as
long as we're relying on the prog_statevar stuff this seems about as
good as it'll get.
Otherwise, we'll often end up with gl_TexCoord being 0 length, for
example. With ir_to_mesa, things ended up working out anyway, as long
as multiple implicitly-sized arrays weren't involved.
Fixes these tests using gl_FragData or just gl_FragDepth:
glsl1-Preprocessor test (extension test 1)
glsl1-Preprocessor test (extension test 2)
glsl-bug-22603
We don't set the type on the array virtual reg as a whole, so here's
the right place.
Fixes:
glsl1-GLSL 1.20 arrays
glsl1-temp array with constant indexing, fragment shader
glsl1-temp array with swizzled variable indexing
We need to re-run channel expressions afterwards as it generates new
vector expressions, and we need to successfully support conditional
assignment (brw_CMP takes 2 operands, not 1).
While much of this we will want to support natively, this should make
the task of reaching the Mesa IR backend's quality easier.
Fixes:
glsl-fs-main-return.
New design seems to be on parity according to piglit,
make it default to get more exposure and see if there
is any show stopper in the coming days.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Sandybridge's PS constant buffer payload size is decided from
push const buffer command, incorrect size would cause wrong data
in payload for position and vertex attributes. This fixes coefficients
for tex2d/tex3d.
The driver actually creates a 3D texture aligned to POT and does all
the magic with texture coordinates in the fragment shader. It first
emulates REPEAT and MIRRORED wrap modes in the fragment shader to get
the coordinates into the range [0, 1]. (already done for 2D NPOT)
Then it scales them to get the coordinates of the NPOT subtexture.
NPOT textures are now less of a lie and we can at least display
something meaningful even for the 3D ones.
Supported wrap modes:
- REPEAT
- MIRRORED_REPEAT
- CLAMP_TO_EDGE (NEAREST filtering only)
- MIRROR_CLAMP_TO_EDGE (NEAREST filtering only)
- The behavior of other CLAMP modes is undefined on borders, but they usually
give results very close to CLAMP_TO_EDGE with mirroring working perfectly.
This fixes:
- piglit/fbo-3d
- piglit/tex3d-npot
Taking the W component from coords directly ignores swizzling. Instead,
take the component which is mapped to W in the TEX instruction parameter.
The same for Z.
NOTE: This is a candidate for the 7.9 branch.
Some random stuff I had here.
1) Fixed some misleading comments.
2) Removed fake_npot, since it's redundant.
3) lower_texture_rect -> scale_texcoords
4) Reordered and reindented some TEX transform code.
It's trying to get an int smeared across all channels, not trying to
get a 1:1 mapping of a subset of a vector's channels. This usually
ended up not mattering with ir_to_mesa, since it just smears floats
into every chan of a vec4.
Fixes:
glsl1-temp array with swizzled variable indexing
The pipe_sampler_view's swizzle terms also apply to the texture border
color. Simply move the apply_sampler_swizzle() call after we fetch
the border color.
Fixes many piglit texwrap failures.
Changes in v2:
- Invalidate last_tile_addr on any change, fixing regressions
- Correct coding style
Currently softpipe ends up allocating more than 200 MB of memory
for each context due to the tile caches.
Even worse, this memory is all explicitly cleared, which means that the
kernel must actually back it with physical RAM right away.
This change allocates tile memory on demand.
Signed-off-by: Brian Paul <brianp@vmware.com>
Build packet header once and allow to add fake register support so
we can handle things like indexed set of register (evergreen sampler
border registers for instance.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Commit 80ee3a440c added a PROGS_DEPS
definition, but no uses, even though it seems clearly intended
to be a set of additional dependencies for $(PROGS).
Correct this.
Currently makedepend is used by the Mesa Makefile-based build system,
but not required.
Unfortunately, not having it makes dependency resolution non-existent,
which is a source of subtle bugs, and is a rarely tested
configuration, since all Mesa developers likely have it installed.
Furthermore some idioms require dependency resolution to work at all,
such as making headers depend on generated files.
Fixes these GCC warnings.
r600_shader.c: In function 'tgsi_tex':
r600_shader.c:1611: warning: 'src2_chan' may be used uninitialized in this function
r600_shader.c:1611: warning: 'src_chan' may be used uninitialized in this function
When occlusion query are running we want to have accurate
fragment count thus disable any early culling optimization
GPU has.
Based on work from Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Fixes this GCC warning.
radeon_state.c: In function 'radeon_state_fini':
radeon_state.c:140: warning: 'return' with a value, in function returning void
Move GB_ENABLE to derived rs state, and find sprite coord for the correct
generic and enable the tex coord for that generic.
Signed-off-by: Dave Airlie <airlied@redhat.com>
1. We can't turn an instruction into a presubtract operation if it
writes to one of the registers it reads from.
2. If we turn an instruction into a presubtract operation, we can't
remove that intruction unless all readers can use the presubtract
operation.
This fixes fdo bug 30337.
This is a candidate for the 7.9 branch.
The variables are used only in currently disabled code.
Fixes this GCC warning.
r600_context.c: In function 'r600_flush':
r600_context.c:76: warning: unused variable 'dname'
r600_context.c:75: warning: unused variable 'dc'
this is more a proof to show vector shifts on x86 with per-element shift count
are evil. Since we can avoid the shift with a single compare/select, use that
instead. Replaces more than 20 instructions (and slow ones at that) with about 3,
and cuts compiled shader size with mesa's yuvsqure demo by over 10%
(no performance measurements done - but selection is blazing fast).
Might want to revisit that for future cpus - unfortunately AVX won't have vector
shifts neither, but AMD's XOP will, but even in that case using selection here
is probably not slower.
While it's true that llvm can and will indeed replace this with bit
arithmetic (since block height/width is POT), it does so (llvm 2.7) by element
and hence extracts/shifts/reinserts each element individually.
This costs about 16 instructions (and extract is not really fast) vs. 1...
Fixes this GCC warning.
r600_hw_states.c: In function 'r600_translate_fill':
r600_state_inlines.h:136: warning: control reaches end of non-void function
The variables are only used in currently disabled code.
Fixes this GCC warning.
r600_state2.c: In function 'r600_flush2':
r600_state2.c:613: warning: unused variable 'dname'
r600_state2.c:612: warning: unused variable 'dc'
Fix this GCC warning.
intel_pixel_bitmap.c: In function 'intelBitmap':
intel_pixel_bitmap.c:343: warning: implicit declaration of function '_mesa_meta_Bitmap'
Silence this GCC warning.
r300_state_derived.c: In function 'r300_update_derived_state':
r300_state_derived.c:578: warning: 'r' may be used uninitialized in this function
r300_state_derived.c:578: note: 'r' was declared here
Up to 2010-09-19:
r600g: fix tiling support for ddx supplied buffers
9b146eae25
user buffer seems to be broken... new to fix that.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Before, changing any of these sampler values triggered generation
of new JIT code. Added a new flag for the special case of
min_lod == max_lod which is hit during auto mipmap generation.
TX_BORDER_COLOR should be formatted according to the texture format.
Also the interaction with ARB_texture_swizzle should be fixed too.
NOTE: This is a candidate for the 7.9 branch.
This commit fixes an infinite loop in foreach_s if remove_from_list is used
more than once on the same item with other list operations in between.
NOTE: This is a candidate for the 7.9 branch because the commit
"r300g: fixup long-lived BO maps being incorrectly unmapped when flushing"
depends on it.
This is already covered by radeon_mipmap_tree.c, and my convolution
cleanups broke in the presence of this code. Thanks to Marek Olšák
for tracking down the relevant miptree code for me.
Make libr300compiler.a a PHONY target so that this library will always be
built. This fixes the problem of libr300compiler.a not being updated
when r300g is being built and r300c is not.
This is a candidate for the Mesa 7.9 branch.
Many of the EXT_ extensions in the subset have significant code
overhead with no users. It is not a required part of GL -- though
text describing the extension is part of the core spec since 1.2, it
is always conditional on the ARB_imaging extension.
Some pathological triangles cause a theoritically impossible number of
clipped vertices.
The clipper will still assert, but at least release builds will not
crash, while this problem is further investigated.
use the blitter + custom stage to avoid doing a whole lot of state
setup by hand. This makes life a lot easier for doing this on evergreen
it also keeps all the state setup in one place.
We setup a custom context state at the start with a flag to denote
its for the flush, when it gets generated we generate the correct state
for the flush and no longer have to do it all by hand.
this should also make adding texture *to* depth easier.
It turns out that most people new to this IR are surprised when an
assignment to (say) 3 components on the LHS takes 4 components on the
RHS. It also makes for quite strange IR output:
(assign (constant bool (1)) (x) (var_ref color) (swiz x (var_ref v) ))
(assign (constant bool (1)) (y) (var_ref color) (swiz yy (var_ref v) ))
(assign (constant bool (1)) (z) (var_ref color) (swiz zzz (var_ref v) ))
But even worse, even we get it wrong, as shown by this line of our
current step(float, vec4):
(assign (constant bool (1)) (w)
(var_ref t)
(expression float b2f (expression bool >=
(swiz w (var_ref x))(var_ref edge))))
where we try to assign a float to the writemasked-out x channel and
don't supply anything for the actual w channel we're writing. Drivers
right now just get lucky since ir_to_mesa spams the float value across
all the source channels of a vec4.
Instead, the RHS will now have a number of components equal to the
number of components actually being written. Hopefully this confuses
everyone less, and it also makes codegen for a scalar target simpler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can't expect to have a context when this is called, and we don't need one
so just require a __DRIscreen instead.
Reported by Yu Dai <yu.dai@intel.com>
Shader rebuild should be more clever, we should store along each
shader all the value that change shader program rather than using
flags in context (ie change sequence like : change vs buffer, draw,
change vs buffer, switch shader will trigger useless shader rebuild).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Normally the Mesa state tracker uses TXP instructions for texturing.
But if a fragment shader uses texture2D() that's a TEX instruction.
In that case we were incorrectly computing the texcoord coefficients
in the point sprite setup code. Some new comments in the code explain
things.
Fix commit e7087175f8. Move the reference to
GL_VERSION_2_1_functions to intel_extensions.c where it's available,
don't try to enable a non-existing extension and advertise 1.20 for all
intel chipsets, not just GEN4 and up.
progs requires winsys, which hasn't yet been built by the time we
go into state_trackers.
It may be a good idea to also move it into tests.
After a normal build, run make in src/gallium/state_trackers/d3d1x/progs
to build them.
The Gallium EGL state tracker reuses dri2.c but not the GLX code.
Currently there is a bit of code in dri2.c that is incorrectly tied
to GLX: instead, make it call an helper that both GLX and Gallium EGL
implement, like dri2InvalidateBuffers.
This avoids a link error complaining that dri2GetGlxDrawableFromXDrawableId
is undefined.
Note that we might want to move the whole event translation elsewhere,
and probably stop using non-XCB DRI2 altogether, but this seems to be
the minimal fix.
This may just be hiding some other bug though, since the types are supposed
to be the same (and it compiles for me).
Anyway, this interface will likely need to changed, since it seems Wine needs
a more powerful one capable of expressing window subregions and called at
every Present.
add cb/db flush states to the blit code.
add support for the rv6xx that need special treatment.
according to R6xx_7xx_3D.pdf
set r700 CB_SHADER_CONTROL reg in blit code
docs say dual export should be disabled for DB->CB
Instead of using the invalid GL_ARB_shading_language_120 extension to
determine the GLSL version, use a new ctx->Const.GLSLVersion field.
Updated the intel and r600 drivers, but untested.
See fd.o bug 29910
NOTE: This is a candidate for the 7.9 branch (but let's wait and see if
there's any regressions).
The old code didn't really make sense. We only need to compare the
X channel of the texture (depth) against the texcoord.
For (bi)linear sampling we should move the calls to this function
and compute the final result as (s1+s2+s3+s4) * 0.25. Someday.
This fixes the glean glsl1 shadow2D() tests. See fd.o bug 29307.
There was some libstdc++-specific code that would only build with GCC 4.5
Now it should be much more compatible, at the price of reimplementing
the generic hash function.
Looks like the problem was we weren't passing the depth to the render
target as expected, so the chip would wedge. Fixes GPU hang in
occlusion-query-discard.
Bug #30097
Recent Wine versions provide a d3d11shader.h, which is however empty
and was getting used instead of our non-empty one.
Correct the include path order to fix this.
This is a new implementation of the Direct3D 11 COM API for Gallium.
Direct3D 10 and 10.1 implementations are also provided, which are
automatically generated with s/D3D11/D3D10/g plus a bunch of #ifs.
While this is an initial version, most of the code is there (limited
to what Gallium can express), and tri, gears and texturing demos
are working.
The primary goal is to realize Gallium's promise of multiple API
support, and provide an API that can be easily implemented with just
a very thin wrapper over Gallium, instead of the enormous amount of
complex code needed for OpenGL.
The secondary goal is to run Windows Direct3D 10/11 games on Linux
using Wine.
Wine dlls are currently not provided, but adding them should be
quite easy.
Fglrx and nvidia drivers can also be supported by writing a Gallium
driver that talks to them using OpenGL, which is a relatively easy
task.
Thanks to the great design of Direct3D 10/11 and closeness to Gallium,
this approach should not result in detectable overhead, and is the
most maintainable way to do it, providing a path to switch to the
open Gallium drivers once they are on par with the proprietary ones.
Currently Wine has a very limited Direct3D 10 implementation, and
completely lacks a Direct3D 11 implementation.
Note that Direct3D 10/11 are completely different from Direct3D 9
and earlier, and thus warrant a fully separate implementation.
The third goal is to provide a superior alternative to OpenGL for
graphics programming on non-Windows systems, particularly Linux
and other free and open systems.
Thanks to a very clean and well-though design done from scratch,
the Direct3D 10/11 APIs are vastly better than OpenGL and can be
supported with orders of magnitude less code and development time,
as you can see by comparing the lines of code of this commit and
those in the existing Mesa OpenGL implementation.
This would have been true for the Longs Peak proposal as well, but
unfortunately it was abandoned by Khronos, leaving the OpenGL
ecosystem without a graphics API with a modern design.
A binding of Direct3D 10/11 to EGL would solve this issue in the
most economical way possible, and this would be great to provide
in Mesa, since DXGI, the API used to bind Direct3D 10/11 to Windows,
is a bit suboptimal, especially on non-Windows platforms.
Finally, a mature Direct3D 10/11 implementation is intrinsically going
to be faster and more reliable than an OpenGL implementation, thanks
to the dramatically smaller API and the segregation of all nontrivial
work to object creation that the application must perform ahead of
time.
Currently, this commit contains:
- Independently created headers for Direct3D 10, 10.1, 11 and DXGI 1.1,
partially based on the existing Wine headers for D3D10 and DXGI 1.0
- A parser for Direct3D 10/11 DXBC and TokenizedProgramFormat (TPF)
- A shader translator from TokenizedProgramFormat to TGSI
- Implementation of the Direct3D 11 core interfaces
- Automatically generated implementation of Direct3D 10 and 10.1
- Implementation of DXGI using the "native" framework of the EGL st
- Demos, usable either on Windows or on this implementation
- d3d11tri, a clone of tri
- d3d11tex, a (multi)texturing demo
- d3d11gears, an improved version of glxgears
- d3d11spikysphere, a D3D11 tessellation demo (currently Windows-only)
- A downloader for the Microsoft HLSL compiler, needed to recompile
the shaders (compiled shader bytecode is also included)
To compile this, configure at least with these options:
--with-state-trackers=egl,d3d1x --with-egl-platforms=x11
plus some gallium drivers (such as softpipe with --enable-gallium-swrast)
The Wine headers (usually from a wine-dev or wine-devel package) must
be installed.
Only x86-32 has been tested.
You may need to run "make" in the subdirectories of src/gallium/winsys/sw
and you may need to manually run "sudo make install" in
src/gallium/targets/egl
To test it, run the demos in the "progs" directory.
Windows binaries are included to find out how demos should work, and to
test Wine integration when it will be done.
Enjoy, and let me know if you manage to compile and run this, or
which issues you are facing if not.
Using softpipe is recommended for now, and your mileage with hardware
drivers may vary.
However, getting this to work on hardware drivers is also obviously very
important.
Note that currently llvmpipe is buggy and causes all 3 gears to be
drawn with the same color.
Use export GALLIUM_DRIVER=softpipe to avoid this.
Thanks to all the Gallium contributors and especially the VMware
team, whose work made it possible to implement Direct3D 10/11 much
more easily than it would have been otherwise.
Use rc_pair_ prefix for all pair instruction structs
Create a named struct for pair instruction args
Replace structs radeon_pair_instruction_{rgb,alpha} with struct
radeon_pair_sub_instruction. These two structs were nearly identical
and were creating a lot of cut and paste code. These changes are the
first step towards removing some of that code.
This allow to share code path btw old & new, also
remove check on reference this might make things
a little slower but new design doesn't use reference
stuff.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
If we are not going to write to the X or Y components of the destination
vector we also don't need to prepare to compute SIN or COS.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
When ir_binop_all_equal and ir_binop_any_nequal were introduced, the
meaning of these two opcodes changed to return vectors rather than a
single scalar, but the constant expression handling code was incorrectly
written and only worked for scalars. As a result, only the first
component of the returned vector would be properly initialized.
This reverts commit c32bac57ed
and silences the warning differently.
The _mesa_reference_texobj() function is called quite a bit and
we don't want to call valid_texture_object() all the time in non-
debug builds.
Count int and float constants independently. Since there are only
few i# constants available and hundreds of c# constants, it would
be too easy to end up with an i# declaration out of its range.
Pixel shaders do not have address registers a#, only one
loop register aL. Our only hope is to assume the address
register is in fact a loop counter and replace it with aL.
Do not translate ARL instruction for pixel shaders -- MOVA
instruction is only valid for vertex saders.
Make it more explicit relative addressing of inputs is only valid
for pixel shaders and constants for vertex shaders.
we were leaking buffers since the flush code was added, it wasn't dropping references.
move setting up flush to the set_framebuffer_state.
clean up the flush state object.
make more space in the BOs array for flushing.
This reverts commit a1d9a58b82.
Flushing the upload buffers on draw is wrong, uploads aren't supposed to
cause flushes in the first place. The real issue was
radeon_bo_pb_map_internal() not respecting PB_USAGE_UNSYNCHRONIZED.
Depth-only and stencil-only clears should mask out depth/stencil from the
output, mask out stencil/input from input, and OR or ADD them together.
However, due to a typo they were being ANDed, resulting in zeroing the buffer.
If a upload buffer is used by a previous draw that's still in the CS,
accessing it would need a context flush. However, doing a context flush when
mapping the upload buffer would then flush/destroy the same buffer we're trying
to map there. Flushing the upload buffers before a draw avoids both the CS
flush and the upload buffer going away while it's being used. Note that
u_upload_data() could e.g. use a pool of buffers instead of allocating new
ones all the time if that turns out to be a significant issue.
Silences this GCC warning.
nvfx_vertprog.c: In function 'nvfx_vertprog_translate':
nvfx_vertprog.c:998: warning: assignment discards qualifiers from pointer target type
Those have the callee field set to the null pointer, so
calling the public constructor will segfault.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
Fixes this GCC warning.
lower_variable_index_to_cond_assign.cpp:
In member function
'bool variable_index_to_cond_assign_visitor::needs_lowering(ir_dereference_array*) const':
lower_variable_index_to_cond_assign.cpp:261:
warning: control reaches end of non-void function
This diagram shows the rendering pipeline with an emphasis on
the inputs/outputs for each stage. Some stages emit new vertex
attributes and others consume some attributes.
Implement the pipe_rasterizer_state::sprite_coord_enable field
in the draw module (and softpipe) according to what's specified
in the documentation.
The draw module can now add any number of extra vertex attributes
to a post-transformed vertex and generate texcoords for those
attributes per sprite_coord_enable. Auto-generated texcoords
for sprites only worked for one texcoord unit before.
The frag shader gl_PointCoord input is now implemented like any
other generic/texcoord attribute.
The draw module now needs to be informed about fragment shaders
since we need to look at the fragment shader's inputs to know
which ones need auto-generated texcoords.
Only softpipe has been updated so far.
Fixes this GCC warning.
r600_state2.c: In function 'r600_context_flush':
r600_state2.c:946: error: implicit declaration of function 'drmCommandWriteRead'
Winsys context build a list of register block a register block is
a set of consecutive register that will be emited together in the
same pm4 packet (the various r600_block* are there to provide basic
grouping that try to take advantage of states that are linked together)
Some consecutive register are emited each in a different block,
for instance the various cb[0-7]_base. At winsys context creation,
the list of block is created & an index into the list of block. So
to find into which block a register is in you simply use the register
offset and lookup the block index. Block are grouped together into
group which are the various pkt3 group of config, context, resource,
Pipe state build a list of register each state want to modify,
beside register value it also give a register mask so only subpart
of a register can be updated by a given pipe state (the oring is
in the winsys) There is no prebuild register list or define for
each pipe state. Once pipe state are built they are bound to
the winsys context.
Each of this functions will go through the list of register and
will find into which block each reg falls and will update the
value of the block with proper masking (vs/ps resource/constant
are specialized variant with somewhat limited capabilities).
Each block modified by r600_context_pipe_state_set* is marked as
dirty and we update a count of dwords needed to emit all dirty
state so far.
r600_context_pipe_state_set* should be call only when pipe context
change some of the state (thus when pipe bind state or set state)
Then to draw primitive you make a call to r600_context_draw
void r600_context_draw(struct r600_context *ctx, struct r600_draw *draw)
It will check if there is enough dwords in current cs buffer and
if not will flush. Once there is enough room it will copy packet
from dirty block and then add the draw packet3 to initiate the draw.
The flush will send the current cs, reset the count of dwords to
0 and remark all states that are enabled as dirty and recompute
the number of dwords needed to send the current context.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Currenly GLSL happily generates indirect addressing of any kind of
arrays.
Unfortunately DirectX 9 GPUs are not guaranteed to support any of them in
general.
This pass fixes that by lowering such constructs to a binary search on the
values, followed at the end by vectorized generation of equality masks, and
4 conditional assignments for each mask generation.
Note that this requires the ir_binop_equal change so that we can emit SEQ
to generate the boolean masks.
Unfortunately, ir_structure_splitting is too dumb to turn the resulting
constant array references to individual variables, so this will need to
be added too before this pass can actually be effective for temps.
Several patches in the glsl2-lower-variable-indexing were squashed
into this commit. These patches fix bugs in Luca's original
implementation, and the individual patches can be seen in that branch.
This was done to aid bisecting in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
this adds the bo caching layer and uses it for vertex/index/constant bos.
ctx needs to take references on hw bos so the flushing works okay, also
needs to flush the maps.
This function is executed outside _mesa_meta_begin/end(), that means
that e.g. _mesa_meta_Bitmap() clobbers the texturing state because it
changes the currently active texture object.
There's no need to bind the new texture when it's created, it's done
again later anyway (from setup_drawpix/copypix_texture()).
Signed-off-by: Brian Paul <brianp@vmware.com>
Allows disabling various operations (mainly texture-related, but
will grow) to try & identify bottlenecks.
Unlike LP_DEBUG, this is active even in release builds - which is
necessary for performance investigation.
The current context should be notified when the the front/back buffers
of the current drawable are swapped. The notification was skipped when
xmesa_strict_invalidate is false (the default).
This fixes fdo bug #29774.
Enable some extensions now that the needed tokens are defined in
GLES/glext.h and GLES2/glext.h. Update the prototype of MultiDrawArrays
now that the prototype of _mesa_MultiDrawArraysEXT has been updated.
The shadow versions of the texture targets use an extra component
(Z) to express distance from light source to the fragment.
Fixes the shadowtex demo with llvmpipe.
Execute before folding loads, because we don't check if it's legal
in lower_mods.
Ensure that a value's insn pointer is updated when transferring it
to a different instruction.
Fix the following GCC warning.
loop_controls.cpp: In function 'int calculate_iterations(ir_rvalue*, ir_rvalue*, ir_rvalue*, ir_expression_operation)':
loop_controls.cpp:88: warning: format not a string literal and no format arguments
This fixes a DRM deadlock in the cubestorm xscreensaver, because somehow
there must not be 2 different BOs relocated in one CS if both BOs back
the same handle. I was told it is impossible to happen, but apparently
it is not, or there is something else wrong.
when setting negative integers to bitfields we could overwrite
other parts of it. So mask the value to be written correctly.
This is used quite often in the driver - hope it doesnt affect
performace or uncover behaviour relied before...
fixes strange effects when setting negative lodbias on evergreen
A number of other files had to be updated as well because const
qualifiers were added to the glMultiDrawArrays() function.
Also, GL_FIXED is now defined in glext.h.
Fixes the following GCC warning.
i915_screen.c: In function 'i915_get_shader_param':
i915_screen.c:184: warning: control reaches end of non-void function
Fixes the following GCC warning.
i915_screen.c: In function 'i915_get_shader_param':
i915_screen.c:147: warning: implicit declaration of function 'draw_get_shader_param'
This is a follow-up to commit a508d2dddc.
Fixes the following GCC warning.
id_screen.c: In function 'identity_screen_create':
id_screen.c:317: warning: assignment from incompatible pointer type
This is a follow-up to commit a508d2dddc.
Fixes the following GCC warning.
rbug_screen.c: In function 'rbug_screen_create':
rbug_screen.c:331: warning: assignment from incompatible pointer type
Note, BTW, that the Gallium implementation returns 0.5, which seems
to violate the GLSL spec, where it should return 0.0 instead.
Not sure whether changing it to 0 is correct or not.
This turns on if conversion and unlimited loop unrolling if control
flow is not supported.
NOTE: this will change the behavior of r300g and any other driver
that doesn't advertise control flow
Changes in v3:
- Also change trace, which I forgot about
Changes in v2:
- No longer adds tessellation shaders
Currently each shader cap has FS and VS versions.
However, we want a version of them for geometry, tessellation control,
and tessellation evaluation shaders, and want to be able to easily
query a given cap type for a given shader stage.
Since having 5 duplicates of each shader cap is unmanageable, add
a new get_shader_param function that takes both a shader cap from a
new enum and a shader stage.
Drivers with non-unified shaders will first switch on the shader
and, within each case, switch on the cap.
Drivers with unified shaders instead first check whether the shader
is supported, and then switch on the cap.
MAX_CONST_BUFFERS is now per-stage.
The geometry shader cap is removed in favor of checking whether the
limit of geometry shader instructions is greater than 0, which is also
used for tessellation shaders.
WARNING: all drivers changed and compiled but only nvfx tested
Currently GLSL IR forbids any vector comparisons, and defines "ir_binop_equal"
and "ir_binop_nequal" to compare all elements and give a single bool.
This is highly unintuitive and prevents generation of optimal Mesa IR.
Hence, first rename "ir_binop_equal" to "ir_binop_all_equal" and
"ir_binop_nequal" to "ir_binop_any_nequal".
Second, readd "ir_binop_equal" and "ir_binop_nequal" with the same semantics
as less, lequal, etc.
Third, allow all comparisons to acts on vectors.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
If the loop ends with an if with one break or in a single break unroll
it. Loops that end with a continue will have that continue removed by
the redundant jump optimizer. Likewise loops that end with an
if-statement with a break at the end of both branches will have the
break pulled out after the if-statement.
Loops of the form
for (...) {
do_something1();
if (cond) {
do_something2();
break;
} else {
do_something3();
}
}
will be unrolled as
do_something1();
if (cond) {
do_something2();
} else {
do_something3();
do_something1();
if (cond) {
do_something2();
} else {
do_something3();
/* Repeat inserting iterations here.*/
}
}
ir_lower_jumps can guarantee that all loops are put in this form
and thus all loops are now potentially unrollable if an upper bound
on the number of iterations can be found.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The loop_controls pass didn't look at the counter values it put in ir_loop
on previous iterations, so while the first iteration worked, subsequent
ones couldn't determine max_iterations.
Changes in v2:
- Base class renamed to ir_control_flow_visitor
- Tried to comply with coding style
This is a new pass that supersedes ir_if_return and "lowers" jumps
to if/else structures.
Currently it causes no regressions on softpipe and nv40, but I'm not sure
whether the piglit glsl tests are thorough enough, so consider this
experimental.
It can be asked to:
1. Pull jumps out of ifs where possible
2. Remove all "continue"s, replacing them with an "execute flag"
3. Replace all "break" with a single conditional one at the end of the loop
4. Replace all "return"s with a single return at the end of the function,
for the main function and/or other functions
This gives several great benefits:
1. All functions can be inlined after this pass
2. nv40 and other pre-DX10 chips without "continue" can be supported
3. nv30 and other pre-DX10 chips with no control flow at all are better supported
Note that for full effect we should also teach the unroller to unroll
loops with a fixed maximum number of iterations but with the canonical
conditional "break" that this pass will insert if asked to.
Continues are lowered by adding a per-loop "execute flag", initialized to
TRUE, that when cleared inhibits all execution until the end of the loop.
Breaks are lowered to continues, plus setting a "break flag" that is checked
at the end of the loop, and trigger the unique "break".
Returns are lowered to breaks/continues, plus adding a "return flag" that
causes loops to break again out of their enclosing loops until all the
loops are exited: then the "execute flag" logic will ignore everything
until the end of the function.
Note that "continue" and "return" can also be implemented by adding
a dummy loop and using break.
However, this is bad for hardware with limited nesting depth, and
prevents further optimization, and thus is not currently performed.
This is just a subclass of ir_visitor with empty implementations of all
the visit methods for non-control flow nodes.
Used to avoid duplicating that in ir_visitor subclasses.
ir_hierarchical_visitor is another way to solve this, but is less natural
for some applications.
According to gcc documentation both are equivalent,
second are prefered as first can make conflict with existing symbols.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
For GLX 1.3 drawables, we can destroy the DRI2 drawable when the GLX
drawable is destroyed. However, for legacy drawables, there os no
good way of knowing when the application is done with it, so we just
let the DRI2 drawable linger on the server. The server will destroy
the DRI2 drawable when it destroys the X drawable or the client exits
anyway.
https://bugs.freedesktop.org/show_bug.cgi?id=30109
radeon_cs_space_check flushes the pipe context on failure, retries
the validation, and returns -1 if it fails again. At that point, there is
nothing we can do, so let's skip draw operations instead of getting stuck
in an infinite loop.
This code path ideally should never be hit.
I really don't understand the mechanism behind this, but it
seems like the way data blocks for a scene are malloced, and in
particular whether we treat them as stack or a queue, and whether
we retain the most recently allocated or least recently allocated
has a real affect (~5%) on isosurf framerates...
This is probably specific to my distro or even just my machine,
but none the less, it's nicer not to see the framerates go in the
wrong direction.
If the buffer we are attempting to map is referenced by the unsubmitted
command stream for this context, we need to flush the command stream,
however to do that we need to be able to access the context at the lowest
level map function, currently we set the buffer in the toplevel map, but this
racy between context. (we probably have a lot more issues than that.)
I'll look into a proper solution as suggested by jrfonseca when I get some time.
"llvm-config --cflags" outputs -f options, which conflict makedepend.
Clean up compiler flags and append LLVM_CFLAGS to the new xxx_CFLAGS
instead of xxx_CPPFLAGS, where xxx may be MESA, ES1, or ES2.
This is erroneously throwing non plain formats out of the faster
AoS sampling path.
Doing 8bit interpolation for single channels such as L8 should be no
worse than with floating point. But this may need more investigation.
The r300 compiler can now emit instructions that select from the presubtract
source. A peephole optimization has been added to convert instructions like:
ADD Temp[0].x, none.1, -Temp[1].x into the INV (1 - src0) presubtract
operation.
Remove the hard-to-get-right _eglBindContextToSurfaces. As well as fix
an assertion failure from b90a3e7d8b when
such call sequence is hit
eglMakeCurrent(dpy, surf1, surf1, ctx1);
eglMakeCurrent(dpy, surf2, surf2, ctx2);
eglMakeCurrent(dpy, surf1, surf1, ctx1);
Replace all uses of ST_API_OPENGL_ES{1,2} by profiles. Having 3
st_api's to provide OpenGL, OpenGL ES 1.1, and OpenGL ES 2.0 is not a
sane abstraction, since all of them share glapi for current
context/dispatch management.
Having 3 st_api's to provide OpenGL, OpenGL ES 1.1, and OpenGL ES 2.0 is
not a sane abstraction, since all of them share glapi for current
context/dispatch management.
Add struct st_context_attribs to describe context profiles and
attributes. Modify st_api::create_context to take the new struct
instead of an st_visual.
st_context_attribs can be used to support GLX_ARB_create_context_profile
and GLX_EXT_create_context_es2_profile in the future. But the
motivation for doing it now is to be able to replace ST_API_OPENGL_ES1
and ST_API_OPENGL_ES2 by profiles.
Having 3 st_api's to provide OpenGL, OpenGL ES 1.1, and OpenGL ES 2.0 is
not a sane abstraction, since all of them share glapi for current
context/dispatch management.
adds shader opcodes + assembler support (except ARL)
uses constant buffers
add interp instructions in fragment shader
adds all evergreen hw states
adds evergreen pm4 support.
this runs gears for me on my evergreen
the DDX and r600c both flush cb/db after the draw is emitted,
as long as they do that, r600g can't be different, as it races.
We end up with r600g flush, set CB, DDX set CB, flush. This
was causing misrendering on my evergreen, where sometimes the drawing
would go to an old CB.
This reverts commit b9abc6139a and the
follow on fixes (7aae704 and 6fe1b47). It's changing the glapi/driver
ABI and causes a number of problems for debug/non-debug builds.
Binding framebuffer 0 on a context that doesn't have a winsys drawable
will try to bind the incomplete framebuffer. That fails when that's
also the dummy framebuffer.
Since atm our OPs aren't typed but instead values are, we need to
take care if they're used as different types (e.g. a load makes a
value u32 by default).
Maybe this should be changed (also to match TGSI), but it should
work as well if done properly.
At some point we'll want to support real subroutines instead of
just inlining them into the main shader.
Since recursive calls are forbidden, we can just save all used
registers to a fixed local memory region and restore them on a
return, no need for a stack pointer.
There's a useful feature buried in glapi to log all API calls to stderr.
Unfortunately it requires editing the code and then it's enabled
unconditionally for that build. This patch builds in API logging for
debug builds and makes it run-time switchable by setting MESA_DEBUG=dispatch.
This increases the chance that GLSL programs will actually work.
Note that continues and returns are not yet lowered, so linking
will just fail if not supported.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This allows us to specify different options, especially useful for chips
without unified shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Otherwise builtin_profiles contains dangling pointers the next time
_mesa_read_profile is called. I suspect this may fix bugzilla #29847,
but I was never able to reproduce it.
GL_EXT_texture_env_combine has slightly more restrictive limits on the
valid sources for some operands. This wasn't caught before because
almost every driver in Mesa that supports the EXT version also
supports the ARB version.
Inspired by a patch posted the the mesa-dev mailing list by Andrew
Randrianasulu.
Since we now actually destroy GLX drawables, we get into situations where
we get events for drawables that no longer exist. Just ignore the
event in that case.
Doesn't work for pixmaps, was looking up the GLX XID and was never thread
safe. Instead, just destroy the client side structures when the
drawable is no long current for a context.
libmesagallium.a that this state tracker will be linked to expects
OpenGL's _glapi_table. That is, it expects libGL.so instead of
libGLESv1_CM.so or libGLESv2.so. As there is no clean way to know the
shared library the app links to, use the api as a simple check. It
might be as well to simply remove this function call though.
Make st/dri screens capable of creating OpenGL ES and
OpenGL ES2 contexts.
TODO: Figure out the "get_current" problem with multiple
st_api's for real.
(s/API_OPENGLES1/API_OPENGLES/ by Chia-I Wu)
This allows them to be passed as out/inout parameters, but still
prevents them from being used as the target of an assignment. This is
per section 5.8 of the GLSL ES 1.00 specification.
This effectively reverts b6f15869b3.
In desktop GLSL, defining a function with the same name as a built-in
hides that built-in function completely, so there would never be
built-in and user function signatures in the same ir_function.
However, in GLSL ES, overloading built-ins is allowed, and does not
hide the built-in signatures - so we're back to needing this.
Fixes an assert (min_version >= 110) which was no longer correct, and
also prohibits linking ES2 shaders with non-ES2 shaders. I'm not
positive this is correct, but the specification doesn't seem to say.
This should make it easier to change the default version based on the
API (say, version 1.00 for OpenGL ES).
Also, synchronize the symbol table's version with the parse state's
version just before doing AST-to-HIR. This way, it will be set when
it matters, but the main initialization code doesn't have to care about
the symbol table.
This splits the r600 opcodes out of the sq file and adds a wrapper
so we can convert to evergreen opcodes later without touching these functions
too much.
DX9 constants were in the constant file, and evergreen no longer support
cfile. r600/700 can also use constants in memory buffers, so add the code
(disabled for now) to enable that as precursor for evergreen.
Like the constant handling and the handling of other uniforms, we add
the whole thing to the Parameters, avoiding messy, incomplete logic
for adding just the elements of a builting uniform that get used.
This means that a driver that relies only on ParameterValues[] for its
parameters will have an increased parameter load, but drivers
generally don't do that (since they have other params they need to
handle, too).
Fixes glsl-fs-statevar-call (testcase for Ember). Bug #29687.
v2: Continue referencing the STATE_VAR[] file directly when the
uniform will land in STATE_VAR[] formatted exactly as we'd put into a
temporary. When there's array dereferencing, we don't copy-propagate
in Mesa IR (not knowing where the array is in register space), so
smarts here are required or we'll massively increase the temp count.
Returning early with visit_continue_with_parent prevented the
then-statements and else-statements of if-statements such as the
following from being processed:
if (some_var) { ... } else { ... }
Fixes piglit test case glsl-fs-loop-nested-if and bugzilla #30030.
We carefully multiplied our two ints (since we want to be precise
after all) then stored them in a float, which is not specced to really
work, in addition to wasting precision. Fixes
vp-arl-constant-array-huge-* things since the assertions were added.
There is a restriction on the destination of an operation involving a
vector immediate being 128-bit aligned and the destination horizontal
stride being equivalent to 2 bytes. Fixes bad pixel_x results from
gl_FragCoord, where each pair had the same value.
Otherwise spring 0.82+.4.0 crashes when starting a game
because prog->_LinkedShaders[0] is NULL.
This also fixes piglit test cases glsl-link-empty-prog-0[12].
This reverts 6a6e6d7b0a and initializes
dummyContext with an all NULL vtable. The context vtable pointer is
supposed to always be non-NULL, but the vtable entries can be NULL.
I was hitting this with gliv.
The GLX spec explicitly mentions that glXWaitX, glXWaitGL and glXUseXFont calls
are ignored when there's no current context. Not sure what if anything the
GLX_EXT_texture_from_pixmap spec says about this, but I think ignoring the
calls makes more sense than crashing there as well. :)
Dumping back to potentially 16-wide dispatch doesn't really work out
at the moment, and hopefully I'll just be able to resolve all the
failures so we never have to do this at all.
Code in glx/glxcmds.c which uses the XF86VIDMODE extension is already guarded. Also use
that guard to control inclusion of the xf86vmode.h header, and only enable that guard if the
XF86VIDMODE extension is found by pkgconfig.
This changes the behaviour on platforms which XF86VIDMODE exists, in that XF86VIDMODE used to
be mandatory, but is now optional.
Presumably other build systems are already arranging for -DXF86VIDMODE to be supplied to the
complier when glxcmds.c is compiled, so are not affected by this change
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Nothing direct rendering specific about these fields. Moving them out
makes no-direct-rendering compilation work again.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
This fixes some of the build issues with GLX_INDIRECT_RENDERING but !GLX_DIRECT_RENDERING due to recent changes.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
The calling order of ->bind and ->unbind changed and then ->unbind would
clear the currentContextTag of the old context before ->bind could reuse
it in the make current request, in the indirect case.
Instead, clear the old currentContextTag if and only if we send a request
to the server to actually unbind it or reassign it to another context.
https://bugs.freedesktop.org/show_bug.cgi?id=29977
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
This was inherently fragile as any changes to r600_states.h would also
need manual updating of all of the bits in radeon.h. Just add a simple
python script to do the conversion, its not hooked up to make at all.
This also will make adding evergreen a bit easier.
nv50 should switch to rules-ng-ng too at some point.
The classic Mesa Nouveau driver also includes a copy of nouveau_class.h,
and should convert to rules-ng-ng too and remove it.
This is the new register generation toolkit in use by nouveau.
As far as I know, this is the best register description toolkit in
existence, and you should use it too for your hardware :)
Thanks to Marcin Kościelnicki for inventing it and performing
invaluable reverse engineering work of nVidia chips.
The old swtnl code was broken by the new shader linkage support for
GLSL.
This is a rewrite of swtnl support, which should instead work properly,
be faster and more closer to the much more tested hardware pipeline.
Intuition != mathematics, so this time I actually worked out the right
formula for first order approximation of perspective interpolation.
Ironically, per quad divide actually makes things slower when compared
with per pixel divide -- probably because the divide hardware unit is
rarely used, whereas the multiply unit is typically already saturated
and the first order approximation imply more multiplications.
These are the non-trivial conversions that this function recognizes,
which was produced by u_format_compatible_test.c:
b8g8r8a8_unorm -> b8g8r8x8_unorm
a8r8g8b8_unorm -> x8r8g8b8_unorm
b5g5r5a1_unorm -> b5g5r5x1_unorm
b4g4r4a4_unorm -> b4g4r4x4_unorm
l8_unorm -> r8_unorm
i8_unorm -> l8_unorm
i8_unorm -> a8_unorm
i8_unorm -> r8_unorm
l16_unorm -> r16_unorm
z24_unorm_s8_uscaled -> z24x8_unorm
s8_uscaled_z24_unorm -> x8z24_unorm
r8g8b8a8_unorm -> r8g8b8x8_unorm
a8b8g8r8_srgb -> x8b8g8r8_srgb
b8g8r8a8_srgb -> b8g8r8x8_srgb
a8r8g8b8_srgb -> x8r8g8b8_srgb
a8b8g8r8_unorm -> x8b8g8r8_unorm
r10g10b10a2_uscaled -> r10g10b10x2_uscaled
r10sg10sb10sa2u_norm -> r10g10b10x2_snorm
State trackers and pipe drivers should be updated to take advantage of
this knowledge, e.g., in surface_copy.
Also, include the color buffer in the key. Not having it there
causes a tight knots in the logic to determine when it is OK or not
to discard previous color buffer contents.
This extra validation is very useful when working on the built-ins, but
in general overkill - the results should stay the same unless the
built-ins or ir_validate have changed.
Also, validating all the built-in functions in every test case makes
piglit run unacceptably slow.
We might want to copy them as color ones though.
Also works around crash in Unigine Heaven due to failing to allocate
a 64 MB temporary in GART for a CPU copy.
Unigine Heaven now works on nv40, albeit with very heavy glitches (with
the floating branch with render_hdr 0).
Actually, we may want to get rid of the x/y coordinates for linear
surfaces, and realign the origin from scratch if necessary, instead
of doing this "on-demand realignment".
This reverts commit 5ad74779ce.
Sorry, but I had to revert this.
Any commit which needlessly increases the number of temporaries is wrong.
More temporaries mean less shader performance because of reduced parallelism
and therefore less efficient latency hiding. In this case, there is possible
performance degradation of every shader which uses GL state variables.
I cannot accept this.
Those are:
- dead-code elimination
- constant folding
- peephole (mainly copy propagation)
- register allocation
There are some bugs which I need to track down.
Also fix up the descriptions of all the debug options.
First list compiler passes in an array, then run the new function rc_run_compiler.
Every backend may need a different set of passes.
This cleans up the mess in r3xx_compile_vertex_program.
Since functions are emitted when scanning for prototypes, functions
always come first, even if the original IR listed the variable
declarations first.
Fixes an ir_validate error (to be turned on in the next commit).
We were incorrectly setting a register that limited the range of
constants accessible via indirect addressing.
Setting it correctly, we can address all the constants the GPU
supports.
The places where constant_expression_value are still used in loop
analysis are places where a new expression tree is created and
constant folding won't have happened. This is used, for example, when
we try to determine the maximal loop iteration count.
Based on review comments by Eric. "...rely on constant folding to
have done its job, instead of going all through the subtree again when
it wasn't a constant."
The downside of our talloc usage is that we can't really make static
(i.e., not created with new) instances of our IR types. This leads to
a lot of unnecessary dynamic allocation in this patch.
Okay I finally wrapped my head around what r600_context_state is meant to be,
maybe I should just rename all the structs so that have distinct names.
I've no idea however why 16 is a good magic number for R600_MAX_RSTATE.
This was another ugly function that really wasn't needed.
The 3 calls to it from the gallium api were shorter than it,
and all the calls from the set_ functions were pointless.
having some sort of locality of code really matters, just create
and setup state at time. Not sure if this is just further polishing of a bad thing,
but at least it makes it more readable.
Previously bind sampler/sampler_view can be converted and endup
overwritting the current state we want to schedule. Example :
bind texA texB to sampler_view[0] & sampler_view[1], render,
bind texB to sampler_view[0] render. Now state associated to
texB are set to configure sampler_view slot 0, but as we don't
unbind sampler_view[1] still point to texB state so we end up
with sampler_view[1] overwritting sampler_view[0], which gives
wrong rendering if next rendering bind texA to sampler_view[0],
it will endup as texB is bound to sampler_view[0]. If you are
not confuse at that point give me a call i will be buying you
beer.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Now that constructors are not generated as functions or stored in the
symbol table, there is no need to flag whether or not constructors
should be returned.
One can bind same texture or sampler to different slot,
each slot needs it own state. The solution implemented
here is not exactly beautifull or optimal need to think
to somethings better.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
If the matrix being constructed was larger than the source matrix, it
would overwrite the lower-right part of the matrix with the wrong
values, rather than leaving it as the identity matrix.
For example, constructing a mat4 from a mat2 should only use a writemask
of "xy" when copying from the source, but was using "xyzw".
Fixes the code generated by piglit test constructor-23.vert.
This was triggering even for matrix-from-matrix constructors. It is
perfectly legal to construct a mat3 from a mat2 - the rest will be
filled in by the identity matrix.
Changes piglit test constructor-23.vert from FAIL to PASS, but the
generated code is incorrect.
Like the constant handling and the handling of other uniforms, we add
the whole thing to the Parameters, avoiding messy, incomplete logic
for adding just the elements of a builting uniform that get used.
This means that a driver that relies only on ParameterValues[] for its
parameters will have an increased parameter load, but drivers
generally don't do that (since they have other params they need to
handle, too).
Fixes glsl-fs-statevar-call (testcase for Ember). Bug #29687.
Regresses glsl-vs-array-04 on 965. Thanks to a slight change in
register allocation, this test of undefined behavior now wraps around
the register space and unexpectedly reads the constant value it's
trying to compare to. The test should probably not look at the
resulting color, since behavior is undefined.
Most of these are just typecasting to long to match the arg type. I
don't really care too much about getting a GLsizei or whatever
appropriate type in. However, there were a number of real bugs, like
missing arguments or passing floats to integer format specifiers. My
favorite: printflike("%s, argument") is missing an argument.
Make state statically allocated, this kills a bunch of code
and avoid intensive use of malloc/free. There is still a lot
of useless duplicate function wrapping that can be kill. This
doesn't improve yet performance, needs to avoid memcpy states
in radeon_ctx_set_draw and to avoid rebuilding vs_resources,
dsa, scissor, cb_cntl, ... states at each draw command.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
It looks like we were ignoring -linker when -noprefix wasn't present, and
when -noprefix was present, -linker was mandatory and -cplusplus ignored.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Teach mklib/minstall more about cygwin so libraries are properly installed
Have mklib install the .dll into the lib/ staging directory as well
Have minstall install the .dll into PREFIX/bin at the same time as
installing the .dll.a link library into PREFIX/lib
mklib uses a '-' rather than a '.' as the separator before the version
number in library names on cygwin. Change the install globs so they match
library names like that.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Change mklib not to report the full archname when building a library for cygwin
(which is something like 'CYGWIN_NT-5.1' or 'CYGWIN_NT-6.1-WOW64' and kind of
confusing), but just 'CYGWIN'.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
The GLSL 1.20 spec specifically disallows this, but it was allowed in
GLSL 1.10.
Fixes piglit test cases local-function-0[13].frag and bugzilla #29921.
This reverts commit de0b76cab2, its pre-computes the texture state wrong,
you can't just use an array of levels, since you can have FBOs to depth texture slices inside a level as well
it would get really messy quickly. Probably need to split commits like this up into pieces for each piece
of state, so we can revert bits easier in case of regressions.
This also break 5 piglit tests, and valgrind starts to warn about invalid read/writes after this.
fixes warning that
r600_blit.c: In function ‘r600_resource_copy_region’:
r600_blit.c:136: warning: passing argument 1 of ‘util_resource_copy_region’ from incompatible pointer type
and also 7 more piglit tests.
Fixes glsl-uniform-linking-1 and failure to link a shader in Unigine.
An alternative here would be to just ditch using _mesa_add_parameter
and build the initial params list on our own, but that would require
two walks of the list as well.
Bug #29822
Previously, using bit-wise operators in some larger expression would
crash on a NULL pointer dereference. This code at least doesn't crash.
Fixes piglit test bitwise-01.frag.
For some reason r600c, emits extra instructions in the FP to do the depth write output swizzle,
I'm not sure this is required, so here I'm doing it in the exports.
this fixes the mesa trivial demos tri-depthwrite and tri-depthwrite2, it doesn't fix
the glsl1 gl_FragDepth writing test however.
Fixes piglit test glsl-override-builtin. The linker incorrectly found
the prototype for the float signature, rather than adding a new
prototype with the int return type. This caused ir_calls with type int
to have their callees set to the float signature, triggering an assert.
this is a bit of a workaround, something is wrong with the literal emits here
so we just use the trig copy function to copy the immd to a temp at start of op.
fix VP/FP LIT tests
This includes several corrections for fixing depth test on sandybridge.
Fix wrong bits definition in depth stencil state. Fix wrong order of
state buffer offset in 3DSTATE_CC_STATE_POINTERS command. Correctly use
buffer width parameter in depth buffer setting.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
So as the trig functions used up the literal spots for the PI work, if the arg0 was an immediate
we'd hit failure, so copy the literal before starting.
add some tracking of max temp used to avoid trashing temp regs.
5 more piglits, fp1 COS,SCS,SIN tests
Also add an error if we hit this problem again, we need to do this better
possibly tying the literal addition to the last flag.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When releasing the builtin functions, we were just freeing the memory,
not telling the builtin function loader that we had freed its memory.
I wish I had done ARB_ES2_compatibility so we had regression testing
of this path. Fixes segfault on changing video options in nexuiz.
All pragmas are currently ignored. Also, the error messages when a
pragma is used incorrectly (i.e., '#pragma debug(on)' inside a
function) are crap, but I think this is sufficient for now.
Fixes piglit test cases pragma-0[1-8].(vert|frag).
Previously, if an attribute was enabled by either a specific GL version
or an extension, the check would require -both- to be enabled. This bug
was not discovered earlier because version checks are currently only ever
used on their own.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Idea is to build hw state at pipe state creation and
reuse them while keeping a non PM4 packet interface
btw winsys & pipe driver. This commit also force rebuild
of pm4 packet on each call to radeon_state_pm4 which
in turn slow down everythings, this will be addressed.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The current states code had an unhealthy relationship between
that had to somehow magically align themselves, editing either
place meant renumbering all states after the one you were on,
and it was pretty unapproachable code.
This replaces the huge types structures with a simple type + sub
type struct, which is keyed on an stype enum in radeon.h. Each
stype can have a per-shader type subclassing (4 types supported,
PS/VS/GS/FS), and also has a number of states per-subtype. So you
have 256 constants per 4 shaders per one CONSTANT stype.
The interface from the driver is changed to pass in the tuple,
(stype, id, shader_type), and we look for this. If
radeon_state_shader ever shows up on profile, it could use a
hashtable based on stype/shader_type to speed things up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This includes a handy little safety check to prevent the loop from
going "too long", as permitted by the spec. I haven't gone out of my
way to test it, though…
Fixes 20 more piglit tests.
Like the comparison operations, this suffered from CMP only setting
the low bit. Doing the AND instructions would be the same instruction
count as the more obvious conditional moves, so do cond moves.
Fixes glsl-fs-sign and 6 other cases, like trig functions that use
sign() internally.
Addresses the warnings:
warning: ‘target’ may be used uninitialized in this function
warning: ‘target_string’ may be used uninitialized in this function
Fixes gcc warning
In function ‘_mesa_add_unnamed_constant’:
warning: ‘pos’ may be used uninitialized in this function
but also what appears to be a bug.
By default LLVM adds a signal handler to output a pretty stack trace.
This signal handler is never removed, causing problems when unloading
the shared object where the gallium driver resides.
Thanks to Chris Li for finding this.
If two shaders contain variables declared with array types that have
the same base type but one is sized and the other is not, linking
should succeed. I'm not super pleased with the way this is
implemented, and I am more convinced than ever that we need more
linker tests. We especially need "negative" tests.
Fixes bugzilla #29697 and piglit test glsl-link-array-01.
Completely initialize data that is passed to ir_constant constructor.
Fixes piglit glsl-orangebook-ch06-bump valgrind uninitialized variable
error on softpipe and llvmpipe.
The
ir_constant::ir_constant(const struct glsl_type *type, exec_list *value_list)
did not completely initialize the entire value member.
Fixes piglit glsl-fs-sampler-numbering-2 valgrind uninitialized value
error in softpipe and llvmpipe.
Complete initialize data passed to ir_constant constructor.
Fixes piglit glsl-mat-from-int-ctor-02 valgrind unintialized variable
error with softpipe and llvmpipe.
Completely initialize data that is passed into a ir_constant constructor.
Fixes piglit glsl-fs-mix valgrind uninitialized variable error on
softpipe and llvmpipe.
Even though MIP filtering is not supported, we can bind an arbitrary mipmap
as the zero mipmap level.
NPOT textures now follow GL_TEXTURE_BASE_LEVEL and GL_TEXTURE_MIN_LOD.
This fixes piglit/fbo-copyteximage.
Due to a misunderstanding of the Z24_X8 and X8_Z24 formats, the earlier
patch created depth/stencil wrappers for them. This broke swrast.
Use the format info instead, which only identifies Z24_S8 and S8_Z24 as
packed depth/stencil. It also has the advantage of being nicer code.
Signed-off-by: Nick Bowler <nbowler@draconx.ca>
Signed-off-by: Brian Paul <brianp@vmware.com>
This releases a bunch of memory that was showing up as leaks with
valgrind.
If atexit() isn't widely supported we may need to add some #ifdef
tests around the call.
glsl-algebraic-rcp-rsq managed to use 33 registers, and we claimed to
only use 32, so the write to g32 would go stomping over the precious
g0 of some other thread.
This wouldn't catch the last failure fixed in them, because we don't
validate assignments well (due to the fact that we've got a pretty
glaring inconsistency in how we handle assignment writemasking), but
it could catch other failure we may produce.
We weren't smearing a component of a split RHS out to reach an unsplit
LHS's writemask, so gl_FragColor (always unsplit) would often get
uninitialized values.
Fixes: glsl-algebraic-add-add-1 (and probably many others).
This reverts commit bd25e23bf3.
Apart from introducing a lot of hex magic numbers and being highly impenetable code,
it causes lots of lockups on an average piglit run that always runs without lockups.
Always run piglit before/after doing big things like this.
this adds handling for some more CF instructions and conditions
also adds parameter for stack size emission
These seem to pass on VS with the stack size hack but not on FS,
TODO: fix FS + stack size calcs
this makes op2 emission smaller, since it skips instructions
that don't write to the dst. not sure if this could have unwanted
side effects but try it and see.
Our channel-expressions and vector-splitting changes now happen into a
private copy of the IR that we maintain for ourselves. Uniform
assignment still happens by the core, so we continue using Mesa IR
generation not just for swrast fallbacks but also for uniform values
(since there's no storage for their contents other than
shader_program->FragmentProgram->Parameters->ParameterValues). And
most importantly, at the moment no actual codegen is hooked up other
than emitting our favorite color to the framebuffer.
Combined with the previous pass, this lets other optimization passes
do their work thanks to ir_tree_grafting. Still have regression in
instruction count with INTEL_NEW_FS, but register count is even
better.
This is a step towards implementing a GLSL IR backend for the 965
fragment shader. Because it has downsides with the current codegen,
it is hidden under the environment variable INTEL_NEW_FS.
This results in an increase in instruction count at the moment (1444
-> 1752 for glsl-fs-raytrace, 345 -> 359 on my demo), because dot
products are turned into a series of multiplies and adds instead of a
custom expansion of MULs and MACs, and by not splitting the variable
types up we don't get tree grafting and thus there are extra moves of
temporary storage. However, register count drops for the non-GLSL
path (64 -> 56 on my demo shader) because the register allocator sees
all the sub-operations.
Per the GLSL 1.20 specification (presumably a clarification of 1.10).
Also, when creating user functions, make a new ir_function that shadows the
built-in ir_function, rather than adding new signatures. User functions
are supposed to hide built-ins, not overload them.
Fixes piglit tests redeclaration-{04, 12, 14}.vert.
Moving the check for an earlier variable declaration helps cleanly
separate out the re-declaration vs. new declaration code a bit. With
that in place, conflicts between variable names and structure types or
function names aren't caught by the earlier "redeclaration" error
message, so check the return type on glsl_symbol_table::add_variable
and issue an error there. If one occurs, don't emit the initializer.
Fixes redeclaration-01.vert and redeclaration-09.vert.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
As of 1.20, variable names, function names, and structure type names all
share a single namespace, and should conflict with one another in the
same scope, or hide each other in nested scopes.
However, in 1.10, variables and functions can share the same name in the
same scope. Structure types, however, conflict with/hide both.
Fixes piglit tests redeclaration-06.vert, redeclaration-11.vert,
redeclaration-19.vert, and struct-05.vert.
Intel sometimes uses packed depth/stencil buffers even when only a depth
buffer or only a stencil buffer was requested. Common code currently
uses the _BaseFormat field to determine whether a depth/stencil wrapper
is necessary. But unless the user explicitly requested a packed
depth/stencil buffer, the _BaseFormat field does not encode this
information, and the required wrappers are not created.
The problem was introduced by commit 45e76d2665 ("mesa: remove a
bunch of gl_renderbuffer fields"), which killed off the _ActualFormat
field upon which the decision to create a wrapper used to be made. This
patch changes the logic to use the Format field instead, which is more
like the old code.
Fixes fdo bug 27590.
Signed-off-by: Nick Bowler <nbowler@draconx.ca>
Signed-off-by: Brian Paul <brianp@vmware.com>
when we dont know max_index we cannot calculate vb size from count
anymore - just use the bo size.
Also added an assert to remind that we dont handle GL_INT GL_DOUBLE
upload when we dont' know max_index - will fix later
Keith prefers a clean separation between graw applications and
implementations, where apps do not link libgallium.a but instead
get all functionality they need via graw interface.
Although this is not incompatible with late loading of graw drivers, it
it would make it very hard to maintain, as wrappers for every utility
symbol exposed in graw would have to be written or generated somehow.
stObj->pt should be set in st_bind_surface, just as in st_TexImage. On
the other hand, st_TexImage should unreference stObj->pt. It also needs
to initialize the texture image again as _mesa_clear_texture_object
clears the image.
We had ad-hoc handled some common cases by flagging sampler-typed
variables as read_only, and rejected initializers of samplers.
However, people could sneak them in in all sorts of surprising ways,
like using whole-array or structure assignment.
Fixes:
glslparsertest/glsl2/sampler-01.frag
glslparsertest/glsl2/sampler-03.frag
glslparsertest/glsl2/sampler-04.frag
glslparsertest/glsl2/sampler-06.frag
Bug #27403.
This allows to build multiple graws libs simultaneously and avoid
unnecessary rebuilds of the tests.
Also remove graw_util.c from inside the graw implementation -- it was
only being provided by one implementation, and graw tests were linking
against gallium anyway.
This reverts commit 001a7bfdfc. I
hadn't found the section of the spec clarifying that the old behavior
was right. Reverting fixes the new version of the testcase, and the
Humus demos that could no longer find their uniforms.
Bug #29782
Bug #29783
Texcoords in AmbientApertureLighting were getting trashed since the
move of math arguments to implied moves, due to the logic for
detecting ALU message reg writes overriding the logic for SEND
implicit message reg writes.
handle very early errors in pipe_screen creation (failure of
nouveau_screen_init in nv50_screen_create)
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Directly build PM4 packet, avoid using malloc (no states are
bigger than 128 dwords), remove unecessary informations,
remove pm4 building in favor of prebuild pm4 packet.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
We do the generation of "what sampler number within Parameters are we"
right in ir_to_mesa.cpp, instead of repeatedly walking the existing
list to find out.
This was in place for uniform handling, but nothing actually needs the
value now, since presence in a parameter list indicates that the
uniform was used as far as the linker was concerned.
This begins a process of repurposing this file. The existing usage is
as a header file for some software blit fallbacks, which should be
moved to a more appropriately named header.
UNDEFINED_VERTEX_ID is used by draw_pipe_vbuf to decide whether a vertex
has been emitted or not. The non-pipeline pathes do not use it (they
tell the frontend the max vertex count when prepare() is called).
Update all drivers to use draw_set_index_buffer,
draw_set_mapped_index_buffer, and draw_vbo. Remove
draw_set_mapped_element_buffer and draw_set_mapped_element_buffer_range.
That is, implement draw_vbo directly. As a result,
svga_swtnl_draw_range_elements is also replaced by svga_swtnl_draw_vbo.
This commit should not have any functional change.
This commit adds draw_set_index_buffer, draw_set_mapped_index_buffer,
and draw_vbo. The idea behind the new functions is that an index buffer
should be a state.
draw_arrays and draw_set_mapped_element_buffer are preserved, but the
latter will be removed soon.
Include r600_emit.h for r600EmitShader and r600EmitShaderConsts symbols.
Fixes the following GCC warnings.
evergreen_fragprog.c: In function 'evergreenSetupFragmentProgram':
evergreen_fragprog.c:521: warning: implicit declaration of function 'r600EmitShader'
evergreen_fragprog.c:778: warning: implicit declaration of function 'r600EmitShaderConsts'
Include r600_emit.h for r600EmitShader and r600EmitShaderConsts symbols.
Fixes the following GCC warnings.
evergreen_vertprog.c:614: warning: implicit declaration of function 'r600EmitShader'
evergreen_vertprog.c:701: warning: implicit declaration of function 'r600EmitShaderConsts'
The variable loops would be used uninitialized if it ever processed a
RC_OPCODE_ENDLOOP case first.
This patch initalizes the loops variable to NULL and adds an assert at
the RC_OPCODE_ENDLOOP case that loops isn't NULL.
Silence the following GCC warning.
r3xx_vertprog.c: In function 'translate_vertex_program':
r3xx_vertprog.c:469: warning: 'loops' may be used uninitialized in this function
Fixes the following GCC warning.
evergreen_render.c: In function 'evergreenTryDrawPrims':
evergreen_render.c:836: error: implicit declaration of function 'evergreenSetupFragmentProgram'
Fixes the following GCC warnings.
r600_cmdbuf.h:201: warning: backslash and newline separated by space
r600_cmdbuf.h:202: warning: backslash and newline separated by space
If there is relative addressing of temporaries, we cannot change register
indices, so skip register allocation entirely. To utilize register allocation
at least partially, we need separate indexable and non-indexable register
files in both TGSI and Mesa IR.
Previously, uniform initializers were handled by ir_to_mesa as it made
its Parameters list. However, uniform values are global to all
shaders, and the value set in one Parameters list wasn't propagated to
the other gl_program->Parameters lists. By going back through the
general Mesa uniform handling, we make sure that all gl_programs get
updated values, and also successfully separate uniform initializer
handling from ir_to_mesa gl_program generation.
Fixes:
glsl-uniform-initializer-5.
With MSVC it seems that this class and its constructor is colliding
with the one in ir_variable_refcount.cpp. Rename the class here to
avoid the collision.
This is a bit of a hack. Can the two variable_entry classes be merged
and shared?
A variable_entry after construction should have its referenced_count
member set to 0. However, occassionally this isn't the case and
entry->referenced_count has been observed to be a garbage value. This
leads to crashes of several tests in the Piglit test suite.
This patch adds an assert to check that a variable_entry instance has
its referenced_count member initialized to 0 after construction.
drm_driver_descriptor::driver_name is defined to be the name of the
kernel module. We should check against drm_driver_descriptor::name
instead of drm_driver_descriptor::driver_name.
st/egl/x11/x11_screen.c requests a driver named r300 not radeon
KNOWN ISSUE: breaks st/egl/kms/
st/egl/kms requests a pipe named "radeon"
that will not be found now
so why not leaving pipe_radeon there?
that was possible as long we have only r300g.
now there is also r600g for which st/egl/kms also
requests a pipe named "radeon"
(possible solution in later commit)
core.h is the public header of core mesa. GLX, WGL, and GLSL are
supposed to include this header file. It should be noted that headers
included by core.h must not perform feature tests (#if FEATURE_xxx).
Otherwise, we cannot, for example, mix a FEATURE_ES2 libmesagallium.a
with a FEATURE_GL libglsl.a.
Because the static types talloc their names at dlopen time,
talloc_freeing the types at DRI driver screen teardown means that if
the screen gets brought back up again, the names will point at freed
memory. talloc_autofree_context() exists to do just what we want
here: Free memory referenced across the program's lifetime so that we
avoid noise in memory leak checkers.
Fixes: bug #29722 (assertion failure in unigine).
This prevents assertion failures or cascading errors after we've
logged the fact that we were unable to handle the initializer.
Fixes unsized-array-non-const-index-2.vert
The previous any() implementation would generate arg0.x || arg0.y ||
arg0.z. Having an expression operation for this makes it easy for the
backend to generate something easier (DPn + SNE for 915 FS, .any
predication on 965 VS)
We need to always at least export one component (wether it's depth
or color. Add valid r7xx shader program for depth decompression.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
We ended up returning CONST[loc] rather than TEMP[loc2]. Things would
*usually* end up working out OK, since the constants often ended up
getting allocated to CONST[loc..loc+columns] with no swizzle. But for
the case where the contigous temporary copy of the swizzled constant
vec4 args was actually needed, we'd end up reading some other constant
values, possibly including ones not actually allocated.
Fixes: glsl-varying-mat3x2.
Previously glcpp would silently abort if it couldn't fstat the file being
read, (so it would work with stdin redirected from a file, but would not
work with stdin as a tty). The stat was so that glcpp could allocate
a buffer for the file content in a single call.
We now use talloc_realloc instead, (even if the fstat is
possible). This is theoretically less efficient, but quite irrelevant,
(particularly because the standalone preprocessor is used only for
testing).
We recently added several tests that intentionally trigger
preprocessor errors. During valgrind-based testing, our test script
was noticing the non-zero return value from the preprocessor and
incorrectly flagging the valgrind-based test as failing.
To fix this, we make valgrind return an error code that is otherwise
unused by the preprocessor.
This error message was missing so that the program would simply
segfault if the provided filename could not be opened for some reason.
While we're at it, we add explicit support for a filename of "-" to
indicate input from stdin.
The existing DECIMAL_INTEGER pattern is the correct thing to use when
looking for a C decimal integer, (that is, a digit-sequence not
starting with 0 which would instead be an octal integer).
But for #line, we really want to accept any digit sequence, (including
"0"), and always interpret it as a decimal constant. So we add a new
DIGITS pattern for this case.
This should fix the compilation failure noted in bug #28138https://bugs.freedesktop.org/show_bug.cgi?id=28138
(Though the generated file will not be updated until the next commit.)
Previously, the YY_USER_ACTION was overwriting the yylloc->source value
in every action, (after that value had been carefully set by the handling
of the #line directive). Instead, we want to initialize it once in
YY_USER_INIT and then not touch it at all in YY_USER_ACTION.
This test exposes two current bugs:
1. The source number is not being correctly emitted in error
messages (instead, it's always 0).
2. A directive of "#line 0" is resulting in the following
parse error:
preprocessor error: Invalid tokens after #
The README file had grown a little bit stale. We've been using newer
versions of both the GLSL and C99 specifications, so list those. Also,
several of the documented known limitations have since been fixed, so
remove those.
This directive is already implemented nicely, but wasn't previously tested.
It will be convenient to use this directive in further tests that rely
on error messages, (such as ensuring that #line correctly sets the line
number in the error message).
util_fill_rect could not handle formats with more than 32 bits,
since the fill color was a uint32_t value. Fix this by using
a util_color union instead, and also expand the union so it
works with formats which have up to 256 bits (the max of any
format currently defined).
Gallium always puts gl_PointCoord in GENERIC[0] if
point_quad_rasterization is enabled.
This is silly, but for now it makes mesa-demos/glsl/pointcoord work.
Before, we were discarding the compiled vertex program on each
vertex program change.
Now we compile the program as if there were 6 clip planes and
dynamically patch in an "end program" bit at the right place.
Also, nv30 should now work.
the current code reuses the same vbo over and over, however after a flush
we'd stall and wait for mapping on the vbo when we should just fire and forget.
On a gears test this brings me from ~620 to ~750 on my rv530 in swtcl mode.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is the same issue as in the previous patch, but here the Blit is not
implemented for separate depth and stencil buffers at all (such
a configuration is not supported in Gallium) and the code incorrectly treated
a D24S8 texture as two separate buffers, making this Blit a no-op.
Signed-off-by: Brian Paul <brianp@vmware.com>
This should be mostly a noop, except that a plain dereference of a
variable that is not part of a constant expression could now get
"constant folded". I expect that for all current backends this will
be either a noop, or possibly a win when it provokes more
ir_algebraic. It'll also ensure that when new features are added,
tree walking will work normally. Before this, constants weren't
getting folded inside of loops.
I had used pkg-config from the Makefile because I didn't want to screw
around with the non-autoconf build, but that doesn't work because the
PKG_CONFIG_PATH or TALLOC_LIBS/TALLOC_CFLAGS that people set at
configure time needs to be respected and may not be present at build
time.
Bug #29585
Now we lie less when claiming OpenGL 2 support.
Also, first piglit result group is now all green, except for
fdo25614-genmipmap, which seems mesa/st's fault.
Include draw_context.h for draw_*_vertex_shader symbols.
Fixes the following GCC warning.
nvfx_vertprog.c: In function 'nvfx_vp_state_create':
nvfx_vertprog.c:1276: warning: implicit declaration of function 'draw_create_vertex_shader'
nvfx_vertprog.c:1276: warning: assignment makes pointer from integer without a cast
nvfx_vertprog.c: In function 'nvfx_vp_state_delete':
nvfx_vertprog.c:1298: warning: implicit declaration of function 'draw_delete_vertex_shader'
Before using depth buffer as texture, it needs to be decompressed
(tile pattern of db are different from one used for colorbuffer
like texture)
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Introduce a new configuration option XMESA_STRICT_INVALIDATE to switch
between swapbuffers-based and glViewport-based buffer invalidation.
Default strict invalidate to false, ie glViewport-based invalidation,
aka ST_MANAGER_BROKEN_INVALIDATE.
This means we will not call XGetGeometry after every swapbuffers,
which allows swapbuffers to remain asynchronous. For apps running at
100fps with synchronous swapping, a 10% boost is typical. For gears,
I see closer to 20% speedup.
Note that the work of copying data on swapbuffers doesn't disappear -
this change just allows the X server to execute the PutImage
asynchronously without us effectively blocked until its completion.
This applies even to llvmpipe's threaded rasterization as the
swapbuffers operation was a large part of the serial component of an
llvmpipe frame.
The downside of this is correctness - applications which don't call
glViewport on window resizes will get incorrect rendering, unless
XMESA_STRICT_INVALIDATE is set.
The ultimate solution would be to have per-frame but asynchronous
invalidation. Xcb almost looks as if it could provide this, but the
API doesn't seem to quite be there.
This version should hopefully be much clearer and thus less likely
to be subtly broken.
Also fixes point sprites on nv40 and possibly some other bugs too.
My merge of Zhenyu's patch on top of my previous patches broke it by
my code expecting simd16 single write and Zhenyu's simd8 path being
disabled by mine. Merge the two for success.
The docs claim two conflicting things: One, that a scalar source is
supported. Two, source hstride must be 1 and width must be exec size.
So splat a constant argument out into a full reg to operate on, since
violating the second set of constraints is clearly failing.
The alternative here might be to do a 1-wide exec on a constant
argument for math1. It would probably save cycles too. But I'll
leave that for the glsl2-965 branch.
Fixes glsl-algebraic-div-one-2.shader_test.
dump_cpu is used only when DEBUG is defined.
Fixes the following GCC warning on builds without DEBUG defined.
util/u_cpu_detect.c:76: warning: 'debug_get_option_dump_cpu' defined but not used
Silence the following i686-apple-darwin10-gcc-4.2.1 warnings.
nv04_2d.c: In function 'nv04_region_copy_cpu':
nv04_2d.c:560: warning: 'dswy' may be used uninitialized in this function
nv04_2d.c:559: warning: 'dswx' may be used uninitialized in this function
nv04_2d.c:562: warning: 'sswy' may be used uninitialized in this function
nv04_2d.c:561: warning: 'sswx' may be used uninitialized in this function
i686-apple-darwin10-gcc-4.2.1 generated the following warning.
warning: 'score' may be used uninitialized in this function
GCC 4.4.3 on Linux didn't generate the above warning.
Partialy fix texturing from depth buffer, depth buffer is tiled
following different tile organisation that color buffer. This
properly set the tile type & array mode field of texture sampler
when sampling from db resource.
Add initial support to untiling buffer when transfering them,
it's kind of broken by corruption the vertex buffer of previous
draw.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is valid input, and asserting here does causes the test suites that
verify this to crash.
Also, the assert was wrongly accepting the case
max_index == vert_info->count
which, IIUC, is the first vertex outside the buffer. Assuming the
vert_info->count is precise (which often is not the case).
The 'vec4[12] foo' style already worked, but the 'vec4 foo[12]' style
did not. Also, 'vec4[] foo' was wrongly accepted.
Fixes piglit test cases array-19.vert and array-21.vert.
May fix fd.o bug #29684 (or at least part of it).
This is a full rewrite of the drawing and buffer management logic.
It offers a lot of improvements:
1. A copy of buffers is now always kept in system memory. This is
necessary to allow software processing of them, which is necessary
or improves performance in many cases.
2. Support for pushing vertices on the FIFO, with index lookup if necessary.
3. "Smart" draw code that tries to intelligently choose the cheapest
way to draw something: whether to use inline vertices or hardware
vertex buffer, and whether to use hardware index buffers
4. Support for all vertex formats supported by the hardware
5. Usage of translate to push vertices, supporting all formats that are
sensible to use as vertex formats
6. Support for base vertex
7. Usage of Ben Skeggs' primitive splitter originally for nv50, allowing
correct splitting of line loops, triangle fans, etc.
8. Support for instancing
9. Precomputation using the vertex elements CSO
Thanks to Ben Skeggs for his primitive splitter originally for nv50.
Thanks to Christoph Bumiller for his nv50 push code, that was the basis
of this work, even though I changed his code dramatically, in particular
to replace his ad-hoc vertex data emitter with translate.
The changes could also go into nv50 too, but there are substantial
differences due to the additional nv50 hardware features.
This is a significant refactoring of the sampling code that:
- Moves all generic functions in nvfx_fragtex.c
- Adds a driver-specific sampler view structure and uses it to
precompute texture setup as it should be done
- Unifies a bit more of code between nv30 and nv40
- Adds support for sampler view swizzles
- Support for specifying as sampler view format different from the
resource one (only trivially)
- Support for sampler view specification of first and last level
- Support for depth textures on nv30, both for reading depth and
for compare
- Support for sRGB textures
- Unifies the format table between nv30 and nv40
- Expands the format table to include essentially all supportable formats
except mixed sign and "autonormal" formats
- Fixes the "is format supported" logic, which was quite broken, and
makes it use the format table
Only tested on nv30 currently.
Seems a reasonable threshold for now.
Significantly speeds up Piglit's 1x1 glReadPixels (but, you know,
reading pixels in 1x1 blocks is NOT a good idea, especially if you
might be running on a less-than-perfect driver).
This patch adds support for creating temporary surfaces to allow
rendering to surfaces that cannot be rendered to.
It uses the _second_ version of the render temporary infrastructure.
This is necessary for swizzled 3D textures and small mipmaps of
swizzled 2D textures.
This version of the patch creates a resource to use as a temporary
instead of a raw BO, making the code simpler.
Now that the new 2D code is in place, swizzling can be safely enabled.
Render temporaries are needed in some cases, so this may degrade nv30
a bit until it gets render temporaries too.
This patch implements nv04_surface_copy/fill using the new 2D engine module.
It supports falling back to the 3D engine using the u_blitter module, which will be
added in a later patch.
Also adds support for using the 3D engine, reusing the u_blitter module
created for r300.
This is used for unswizzling and copies between swizzled surfaces.
This patch add a brand new nv04-nv40 2D engine module.
It should correctly implement all operations involving swizzled, and 3D-swizzled surfaces.
This code is independent from the Gallium framework and can thus be reused in the DDX and classic Mesa drivers (it's only likely to be useful in the latter, though).
Currently, surface_copy and surface_fill are broken for 3D textures, for swizzled source textures and possibly for some misaligned cases
The code is based around the new nv04_region structure, which encapsulates the information from pipe_surface needed for the 2D engine and CPU copies.
The use of nv04_region makes the code independent of the Gallium framework and allows to transform the nv04_region without clobbering the nv04_region.
The existing M2MF, blitter, and SWIZZLED_SURFACE paths have been improved and a new CPU path has been added.
There is also support to tell the caller to use the 3D engine.
The main feature of the copy/fill setup algorithm is linearization/contiguous-linearization of swizzled surfaces.
The idea of linearization is that some swizzled surfaces are laid out like linear ones (1xN, 2xN, Nx1) and can thus be used as such (e.g. useful for copying single pixels).
Also, some rectangles (e.g. the whole surface) are contiguous in memory. If both the source and destination rectangles are swizzled but contiguous, then they can be regarded as both linear: this is the idea of "contiguous linearization".
This, for instance, allows to use the 2D engine to duplicate the content of a swizzled surface to another swizzled surface, by pretending they are actually linear.
After linearization, the result may not be 64-byte aligned. Another transformation is done to enlarge the linear surface so that it becomes 64-byte aligned.
This is also used to 64-byte align swizzled texture mipmaps.
The inner loop of the CPU path is as optimized as possible without using SSE/SSE2.
Future improvements could include SSE/SSE2 support, and possibly a faster coordinate swizzling algorithm (which is however not used in the inner loop).
It may be a good idea to autogenerate swizzling code at least for all possible POT 2D texture dimensions (less than 256), maybe for all 3D ones too (less than 4096).
Also, it woud be a very good idea to make a copy with the GPU first if the source surface is in uncached memory.
This greatly simplifies the code, and avoids ad-hoc copy code.
Also, these new transfers work for buffers too, even though they
are still used for miptrees only.
Changes:
- Disable swizzling on non-RGBA 2D textures, since the current 2D
code is mostly broken in those cases. A later patch will fix this.
Thanks to Andrew Randrianasulu who reported this.
- Fix compressed texture transfers and hack around the current 2D
code inability to copy compressed textures by using direct access.
Thanks to Andrew Randrianasulu who reported this.
This patch rewrites all the miptree layout and transfer code in the
nvfx driver.
The current code is broken in several ways:
1. 3D textures are laid out first by face, then by level, which is
incorrect
2. Cube maps should have 128-byte aligned faces
3. Swizzled textures have a strange alignment test that seems
unnecessary
4. We store the image_offsets for each face/slice but they can be
easily computed instead
5. "Swizzling" is not supported for compressed formats. They can be
"swizzled" but swizzling only means that there are no gaps (pitch is
level-dependant) and the layout is still linear
6. Swizzling is not supported for non-RGBA formats. All formats (except
possibly depth) can be swizzled according to my testing.
The miptree layout is rewritten based on my empirical testing, which I
posted in the "miptree findings" mail.
The image_offset array is removed, since it can be calculated with a
simple multiplication; the only array in the miptree structure is now
the one for mipmap level starts, which it seems cannot be easily
computed in constant time.
Also, we now directly store a nouveau_bo instead of a pipe_buffer in
the miptree structure, like nv50 does.
Support for render temporaries is removed, and will be readded in a
later patch.
Note that the current temporary code is broken, because it does not
copy the temporary back on render cache flushes.
The trace driver's implementation of sampler_view_destroy was calling
directly into the underlying pipe's sampler_view_destroy implementation.
This causes problems for pipes that keep references to sampler views
even after the state tracker has released them. Instead, we'll simply
drop the trace driver's reference to the pipe's sampler view.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
The trace driver was tracing the unwrapped version of the index buffer
when setting the index buffer. This caused an assert validating that
a resource belonged to the trace driver to fail. Instead, we'll log
the unmodified index buffer structure when setting the index buffer.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
That is, replace the old _glapi_* names by new names that start with
u_current_. When MAPI_GLAPI_CURRENT is defined, u_current.h defines
rename macros to restore the old names. That is done for ABI
compatibility.
glapi defines an interface that is used by DRI drivers. It must not be
changed in an ABI incompatible way. This commit moves all
functions/variables belong to the interface to glapi.h. Instead of
including u_current.h from glapi.h, u_current.h now includes glapi.h.
Only 8 out of the up to 13 regs are for source/dest depth, so the name
wasn't particularly appropriate. Note that this doesn't count the
constant or URB payload regs. Also, don't pre-divide by 2, so it's
actually a number of registers.
This might technically not always be correct, because va_copy might
be a function, or a system might not have va_copy, and not work with
assignment.
Hopefully this is never the case.
Without configure tests, it doesn't seem possible to do better.
This function was apparently missing from the display list dispatch
table, causing the generic no-op function to be called instead. To make
matters worse, the no-op function is indistinguishable from a successful
call to GetUniformLocation. GL specifies that GetUniformLocation is
executed immediately when compiling display lists.
Fixes fdo bug 29622.
Signed-off-by: Nick Bowler <nbowler@draconx.ca>
This commit adds the ability to produce a log file containing all
reference count changes, and object creation/destruction, on Gallium
objects.
The data allows to answer these crucial questions:
1. This app is exhausting all my memory due to a resource leak: where
is the bug?
2. Which resources is this app using at a given moment? Which parts of
the code created them?
3. What kinds of resources does this app use?
4. How fast does this app create and destroy resources? Which parts of
the code create resources fast?
The output is compatible with the one produced by the similar facility
in Mozilla Firefox, allowing to use Mozilla's tools to analyze the data.
To get the log file:
export GALLIUM_REFCNT_LOG=<file>
To get function names and source lines in the log file:
tools/addr2line.sh <file>
To process the log file, see:
http://www.mozilla.org/performance/refcnt-balancer.html
These sources compile to nothing when FEATURE_ES is not defined and thus
were overlooked. Note that api_exec_es[12].c are still missing on the
list. They should be added when they can be generated on the fly.
The API of the context was not checked against EGL_RENDERABLE_TYPE when there
was no attribute list. Move the check to _eglInitContext, and be verbose about
common mistakes (EGL_RENDERABLE_TYPE not set, EGL_CONTEXT_CLIENT_VERSION not
set, or eglBindAPI not called).
Currently Gallium internals always use PIPE_TEXTURE_2D and normalized
coordinates to access textures.
However, PIPE_TEXTURE_2D is not always supported for NPOT textures,
and PIPE_TEXTURE_RECT requires unnormalized coordinates.
Hence, this change adds support for both kinds of normalization.
Currently Gallium internals always use PIPE_TEXTURE_2D and normalized
coordinates to access textures.
However, PIPE_TEXTURE_2D is not always supported for NPOT textures,
and PIPE_TEXTURE_RECT requires unnormalized coordinates.
Hence, this change adds support for both kinds of normalization.
The code would attempt to add a new signature to the ir_function, which
didn't exist. Simply bailing out/returning early seems reasonable.
Fixes piglit test redeclaration-02.vert, and fixes a crash in
redeclaration-03.vert (the test still fails).
This pretty much ports the code from r600c, however it doesn't
always seem to work quite perfectly, but I can't find anything in this
code that is wrong. I'm guessing either literal input or constants
aren't working always.
Fixes glsl-vs-if-nested (70.0 is not <= 70.000648 thanks to the
swizzle bits getting set). Some safety checks are added to make sure
this doesn't happen again as we increase the usage of immediate values
in program generation.
There is no explicit predefined macro to distinguish between OpenSolaris
and Solaris. This patch assumes that the difference is in the compilers.
OpenSolaris uses GCC and not the Sun Studio compiler. Assume that the
availability of fpclassify is due to GCC.
This patch was not tested on Solaris. It would break the build on
Solaris with GCC if GCC on Solaris does not have fpclassify.
This should make it easier to diff the output, clean up some of the
insane whitespace, and make the strings a bit smaller.
We'll probably need to split up the prototype strings eventually, but
for now, this gets it under the 65K mark.
Calls to equal(bvec, bvec) or notEqual(bvec, bvec) previously caused an
assertion. Fixes piglit tests glsl-const-builtin-equal-bool and
glsl-const-builtin-notEqual-bool.
Carefully avoiding printing any error when the new definition matches
the existing definition.
This fixes the recently-added 088-redefine-macro-legitimate.c and
089-redefine-macro-error.c tests as well as glsparsertest/preprocess1
in piglit.
The specification says that redefining a macro is an error, unless the
new definitions is identical to the old one, (identical replacement
lists but ignoring differing amounts of whitespace).
In commit 6be3a8b70a, the #version directive
was fixed to stop generating a spurious newline. Here we simply update
the expected result for the single test which includes a #version directive.
Matching the newline here meant having to do some redundant work here,
(incrementing line number, resetting column number, and returning a
NEWLINE token), that could otherwise simply be left to the existing rule
which matches a newline.
Worse, when the comment rule matches the newline as well, the parser
can lookahead and see a token for something that should actually be skipped.
For example, in a case like this:
#if 0 // comment here
fail
#else
win
#endif
Both fail and win appear in the output, (not that the condition is being
evaluated incorrectly---merely that one token after the comment's newline
was being lexed/parse regardless of the condition).
This commit fixes the above test case, (which is also remarkably similar
to 087-if-comments which now passes).
I didn't expect that this would really work, but it turns out there
are shaders in the wild that do it.
Fixes: (with swrast)
glsl-fs-main-return
glsl-vs-main-return
These binops are the vector-to-bool comparisons, not vec-to-bvec. We
likely want both operations avilable as expression, since 915 and 965
FS naturally does the vector version, while 965 VS can also naturally
do the scalar version. However, we can save that until later.
Fixes:
glsl-fs-vec4-operator-equal.shader_test
glsl-fs-vec4-operator-notequal.shader_test
glsl-vs-vec4-operator-equal.shader_test
glsl-vs-vec4-operator-notequal.shader_test
Now that we have glsl2 with if flattening in place, most shaders will
just work. Remaining failing shaders will mostly be due to loop
unrolling (in progress), some possible if flattening failures in
inlining functions (planning on fixing), and the register/instruction
count limits.
While the GLSL and GLSL-ES specs say that shaders shouldn't fail to
compile/link due to register/instruction limits, in practice we're not
the first vendor to expose GLSL on hardware with these limitations.
The benefit to application developers of providing a better language
for GPU programming is greater than the pain of having to handle
instruction limits (which they had to for ARB_fp on this hardware
anyway)
This error led to an assertion failure for some constructors of
non-square matrices. It only occured in matrices where the number of
columns was greater than the number of rows. It didn't even always
occur on those.
Fixes piglit glslparsertest case constructor-16.vert.
When simplifying (vec4(1.0) / (float(x))) to rcp(float(x)), we forgot
to produce a vec4, angering ir_validate when starting alien-arena.
Fixes:
glsl-algebraic-add-zero-2
glsl-algebraic-div-one-2
glsl-algebraic-mul-one-2
glsl-algebraic-sub-zero-3
glsl-algebraic-rcp-sqrt-2
Several optimization paths, including constant folding, can lead to
accessing an ir_constant array with an out of bounds index. The GLSL
spec lets us produce "undefined" results, but it does not let us
crash.
Fixes piglit test case glsl-array-bounds-01 and glsl-array-bounds-03.
Apart from the fact that the radeon.h/r600_states.h editing is a nightmare, this
wasn't so bad.
passes piglit user-clip test now also trivial tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This saves an extra message reg move in the program, though I'm not
clear on whether it will have any performance impact other than cache
footprint. It will also fix those math calls on Sandybridge, where
the brw_eu_emit.c brw_math() support relies on the implied move being
used.
check_os_katmai_support checks that the operating system running on a
SSE-capable processor supports SSE. This is necessary for unpatched
2.2.x and earlier kernels. 2.4.x and later kernels support SSE.
check_os_katmai_support will disable SSE capabilities for 32-bit x86
operating systems for which there is no code path. Currently, this
function handles Linux, Windows, and several BSDs. Mac OS, Cygwin, and
Solaris are several operating systems with no code paths.
Rather than add code for the unhandled operating systems, remove this
function altogether. This will fix SSE detection on all recent 32-bit
x86 operating systems. This completely breaks functionality on unpatched
2.2.x and earlier kernels, although there are likely no Gallium3D users
on such operating systems.
Changes in v2:
- Change function name
Currently draw_llvm refuses to create itself on non-SSE2 CPUs due to
an alleged LLVM bug.
However, this is implemented improperly, because other parts of draw
still attempt to access draw->llvm, resulting in segfaults.
Instead, put the check in debug_get_option_draw_use_llvm, check that
before calling draw_llvm_create, and then check whether draw->llvm is
non-null everywhere else.
NOTE: Win64 is untested, and is thus currently disabled.
If you have such a system, please enable it and report whether it works.
To enable it, change src/gallium/auxiliary/translate/translate.c
Changes in v5:
- On Win64, preserve %xmm6 and %xmm7 as required by the ABI
- Use _WIN64 instead of WIN64
Changes in v4:
- Use x86_target() and x86_target_caps()
- Enable translate_sse in x86-64, but not in Win64
Changes in v3:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs
Changes in v2:
- Minimize #ifs
- Give a name to magic number CHANNELS_0001
- Add support for CPUs without SSE (only memcpy and swizzles, like non SSE2)
- Fixed comments
translate_sse is currently very limited to the point of
being useless in essentially all cases.
In particular, it only support some float32 and unorm8
formats and doesn't work on x86-64.
This commit rewrites it to support:
1. Dumb memory copy for any pair of identical formats
2. All formats that are swizzles of each other
3. Converting 32/64-bit floats and all 8/16/32-bit integers to 32-bit float
4. Converting unorm8/snorm8 to snorm16 and uscaled8/sscaled8 to sscaled16
5. Support for x86-64 (doesn't take advantage of it in any way though)
This new translate can even be useful to translate index buffers for
cards that lack 8-bit index support.
It passes the testsuite I wrote, but note that this is a major change, and more
testing would be great.
Changes in v5:
- Add sse2_movdqa
Changes in v4:
- Use _WIN64 instead of WIN64
Changes in v3:
- Add target and target caps functions, so that they could be different in
principle from the current CPU and they don't need #ifs to check
Changes in v2:
- Win64 support (untested)
- Use u_cpu_detect.h constants instead of #ifs
This commit adds minimal x86-64 support: only movs between registers
are supported for r8-r15, and x64_rexw() must be used to ask for 64-bit
operations.
It also adds several new instructions for the new translate_sse code.
movdqa
Currently translate_sse puts two trivial wrappers in the translate vtable.
These slow it down and enlarge the source code for no gain, except perhaps
the ability to set a breakpoint there, so remove them.
Breakpoints can be set on the caller of the translate functions, with no
loss of functionality.
Changes in v3:
- If we can do a copy, don't try to get an emit func, as that can assert(0)
Changes in v2:
- Add comment regarding copy_size
When used in GPU drivers, translate can be used to simultaneously
perform a gather operation, and convert away from unsupported formats.
In this use case, input and output formats will often be identical: clearly
it would make sense to use a memcpy in this case.
Instead, translate will insist to convert to and from 32-bit floating point
numbers.
This is not only extremely expensive, but it also loses precision for
32/64-bit integers and 64-bit floating point numbers.
This patch changes translate_generic to just use memcpy if the formats are
identical, non-blocked, and with an integral number of bytes per pixel (note
that all sensible vertex formats are like this).
Triangle strip alternates the front/back orientation of its triangles.
max_vertices was made even so that varray never splitted a triangle
strip at the wrong positions.
It did not work with triangle strips with adjacencies. And it is no
longer relevant with vsplit.
vcache decomposes primitives while vsplit splits primitives. Splitting
is generally easier to do and is faster. More importantly, vcache
depends on flatshade_first to decompose. The outputs may have incorrect
vertex order which is significant to GS.
vsplit is based on varray. It sets the split flags when a primitive is
splitted. It also has support for indexed primitives.
For indexed primitives, unlike vcache, vsplit splits the primitives
instead of decomposes them.
A primitive may be splitted in frontends. The splitted primitives
should convey certain flag bits so that the decomposer can correctly
decide the stipple or edge flags.
This commit adds flags to draw_prim_info and updates the decomposer to
honor the flags. Frontends and middle ends will be updated later.
'struct dd_function_table' only conditionally contains
the function pointer NewFramebuffer and friends based on
FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h)
Fixes the build when the features are disabled and the vfuncs
don't exist.
'struct dd_function_table' only conditionally contains
the function pointer NewFramebuffer and friends based on
FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h)
Fixes the build when the features are disabled and the vfuncs
don't exist.
'struct dd_function_table' only conditionally contains
the function pointer NewFramebuffer and friends based on
FEATURE_EXT_framebuffer_* defines. (See src/mesa/main/dd.h)
Fixes the build when the features are disabled and the vfuncs
don't exist.
Remove slang_compile.h.
Include glheader.h for GL symbols.
Include slang_compile_function.h for slang_function_scope symbol.
Include slang_compile_struct.h for slang_struct_scope symbol.
Include slang_compile_variable.h for slang_variable_scope symbol.
Include slang_typeinfo.h for slang_type_specifier symbol.
Include slang_utility.h for slang_atom_pool symbol.
This changes r300_destroy_context() so it can be called
on a partially-initialized context, and uses it when
r300_create_context() hits a fatal error.
This makes sure r300_create_context() doesn't leak memory
or neglect to call r300_update_num_contexts() when it fails.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This fixes a potential bug if (has_hyperz) is false
(it would still init the atom as if has_hyperz were true).
Signed-off-by: Marek Olšák <maraeo@gmail.com>
For 16 and 64 pixel levels, calculate a mask which is linear in x and
y (ie not in the swizzle layout).
When iterating over full and partial masks, figure out position by
manipulating the bit number set in the mask, rather than relying on
postion arrays.
Similarly, calculate the lower-level c values from dcdx, dcdy and the
position rather than relying on the step array.
Remove imports.h.
Remove slang_typeinfo.h.
Remove slang_compile_operation.h.
Include glheader.h for GL symbols.
Include slang_utility.h for slang_atom_pool symbol.
Include glheader.h for GL symbols.
Include slang_compile_function.h for slang_function symbol.
Include slang_compile_operation.h for slang_operation symbol.
Include slang_compile_variable.h for slang_variable and slang_variable_scope symbols.
Include slang_typeinfo.h for slang_type_qualifer and slang_fully_specified_type symbols.
Include glheader.h for GL symbols.
Include slang_compile_function.h for slang_function symbol.
Include slang_compile_operation.h for slang_operation symbol.
Include slang_compile_variable.h for slang_variable and
slang_variable_scope symbols.
Include slang_log.h for slang_info_log symbols.
Include slang_utility.h for slang_atom and slang_atom_pool symbols.
Include glheader.h for GL symbols.
Include slang_compile.h for slang_name_space symbol.
Include slang_compile_function.h for slang_function symbol.
Include slang_compile_operation.h for slang_operation symbol.
Include slang_log.h for slang_info_log symbol.
Include slang_utility.h for slang_atom_pool symbol.
This commit silences the printing off most of the debug information
when running debug builds. The big culprits are: the tgsi sanity checker
that gets run on all shaders on debug; all the options; and
finaly the cpu caps printer.
Remove mtypes.h.
Include glheader.h for GL symbols.
Include slang_compile_variable.h for slang_variable symbol.
Include slang_typeinfo.h for slang_type_specifier symbol.
Include slang_utility.h for slang_atom_pool symbol.
This copies over a dummy builtin_functions.cpp and rebuilds a
bootstrapped version of the compiler, then uses that to generate the
proper list of builtins. Finally, it rebuilds the compiler with the new
list.
Unfortunately, it's no longer automatic, but at least it works.
Each language version/extension and target now has a "profile" containing
all of the available builtin function prototypes. These are written in
GLSL, and come directly out of the GLSL spec (except for expanding genType).
A new builtins/ir/ folder contains the hand-written IR for each builtin,
regardless of what version includes it. Only those definitions that have
prototypes in the profile will be included.
The autogenerated IR for texture builtins is no longer written to disk,
so there's no longer any confusion as to what's hand-written or
generated.
All scripts are now in python instead of perl.
With the glsl2-965 branch, the optimization of glsl-algebraic-rcp-rcp
regressed due to noop swizzles hiding information from ir_algebraic.
This cleans up those noop swizzles for us.
In C++ you don't have to say 'struct' or 'class' if the declaration of
the type has been seen. Some compilers will complain if you use
'struct' when 'class' should have been used and vice versa.
Fixes bugzilla #29539.
This is the patch from Benjamin's Aug 11, 2010 email with minor fixes
(such as moving declarations before code)
Signed-off-by: Brian Paul <brianp@vmware.com>
If gl_Vertex is not used in the shader, then attribute location 0 is
available for use.
Fixes piglit test case glsl-getattriblocation (bugzilla #29540).
This is based on a patch by nobled <nobled@dreamwidth.org> and allows the TFP
extension to be enabled for DRISW also. This patch does not enable TFP for DRISW
though, because testing on xephyr segfaults here (for both classic and gallium):
Program received signal SIGSEGV, Segmentation fault.
0x00786a4a in _mesa_GenTextures (n=1, textures=0xbfffee4c) at main/texobj.c:788
788 ASSERT_OUTSIDE_BEGIN_END(ctx);
(gdb)
(gdb) where
\#0 0x00786a4a in _mesa_GenTextures (n=1, textures=0xbfffee4c) at main/texobj.c:788
\#1 0x0817a004 in __glXDisp_GenTextures ()
\#2 0x08168498 in __glXDispatch ()
\#3 0x0808b6ce in Dispatch ()
\#4 0x08084435 in main ()
The TFP code is generic except for the teximage call. We need to verify that
DRISW correclty implements whatever hook teximage finally calls.
MAT2 and MAT2X2, for example, are treated identically by the parser.
The language version based error checking (becuase mat2x2 is not
available in GLSL 1.10) is already done in the lexer.
1. Generate random data specifically for float and doubles, so that
they end up in [0, 1] range
2. Don't test useless conversions like SCALED <-> NORM
3. Poison the buffers before testing
This takes the r300g texture format checker and fixes it up for r600g,
it passes glean texSwizzle, pixelformats, and texture_srgb tests,
however I think it L8S8_SRGB is broken as is L8_SRGB, need to investigate.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Remove --never-interactive because it is already specified in the
source using %option. Use -o instead of --outfile. Some of the
%option commands may also need to be removed for compatibility with
older versions (e.g., 2.5.4) of flex.
This should fix bugzilla #29209.
Without this, the parser will generate obtuse, useless error
diagnostics when reservered word that are not used by the grammar are
encountered in a shader.
Fixes bugzilla #29519.
Accidentally having a variable called 'sig' within an if-statement
cause the higher scope 'sig' to always be NULL. As a result a new
function signature was created for a function definition even when one
already existed from a prototype declaration.
Fixes piglit test case glsl-function-prototype (bugzilla #29520).
This was previously being appended to the output string *after* a copy
of the supposedly final string was made and handed to the caller. So
the diagnostic was never actually visible to the user.
We fix this by moving the check for an unterminated #if from
glcpp_parser_destroy to the calling function, preprocess.
This fixes the test case 083-unterminated-if.c.
This is more clear than the previously-generated diagnostic which was
something confusing like "enexpected newline".
This change makse test 080-if-witout-expression.c now pass.
Rather than telling the user what to fix, the standard convention is to
describe what the detected problem is. With this change, test
081-elif-without-expression now passes.
Which are proving to be useful since some of these tests are not yet
acting as desired, (in particular, the unterminated if test is not
generating any diagnostic).
1. Fix the segfault due to the reverted commit using the new interface
2. Reindent to Mesa 3 spaces style
3. Improve output and return success/failure with error code
4. Add much better support for testing translate_sse
Currently translate asserts on unsupported output formats, making
it impossible to use for some purposes, such as testing whether it
actually works on all formats it supports.
Removing the assert was met with opposition, so this change allows
clients to ask whether an output format is supported, and they are thus
able to avoid attempting to use it.
Since this is just an addition to the API, no adverse effect is
possible, and it makes the testsuite work again.
This adds a couple of test cases to expand our coverage of invalid #if and
being skipped, (either by being nested inside an #if/#elif that evaluates to
zero or by being after an #if/#elif that evaluates to non-zero).
This reverts commit 16b45ca7ce.
José Fonseca asked for a revert.
Note that the testsuite will now segfault since it attempts to test
all possible formats.
util_framebuffer_copy was attempting to copy all elements of the
source framebuffer state.
However, this breaks if the user does not zero initialize the structure.
Instead, only copy the elements up to nr_cbufs, and clear elements up
to dst->nr_cbufs, if the destination was larger than the source.
Direct3D 10/11 has no concept of transfers. Applications instead
create resources with a STAGING or DYNAMIC usage, copy between them
and the real resource and use Map to map the STAGING/DYNAMIC resource.
This util module allows to implement Gallium drivers as a Direct3D
driver would be implemented: transfers allocate a resource with
PIPE_USAGE_STAGING, and copy the data between it and the real resource
with resource_copy_region.
This is a simple framework that handles splitting primitives in an
abstract way.
The user has to specify the primitive start, start index and count.
Then, it can ask the primitive splitter to "draw" a chunk of the
primitive, staying under a given vertex/index budget.
The primitive splitter will then call user-supplied functions to
emit a range of vertices/indices, as well as switch the edgeflag
on or off.
This is particularly useful for hardware that either has limits
on the vertex count field, or where vertices are pushed on a FIFO
or temporary buffer of limited size.
Note that unlike other splitters, it does not manipulate data in
any way, and merely asks a callback to do so, in vertex intervals.
We're already using the return-value of cmp to print either PASS or
FAIL and in the case of failure, we're subsequently running and
showing the output of diff. So any warnings/errors from cmp itself are
not actually needed, and can be quite confusing.
Commit d4a04f3155 caused this test case
to produce an additional blank line, which is otherwise harmless, but
does need to be reflected in the .expected file for the test to pass.
Since we have a custom structure for YYLTYPE locations, we need to use
an %initial-action directive to avoid triggering use of uninitialized
memory when, for example, printing error messages.
We apparently don't yet have a test case that allowed valgrind to find
this bug for us, but valgrind found a similar problem in the other
parser, so we fix this one as well.
Since we have a custom structure for YYLTYPE locations, we need to use
an %initial-action directive to avoid triggering use of uninitialized
memory when, for example, printing error messages.
Thanks to valgrind for noticing this bug.
It's rather easy to produce two constant multiplies separated by other
multiplies while writing a BRDF shader, and non-obvious enough in the
resulting codegen that I didn't catch it in my demo code until just
recently. Cuts 3 965 instructions from my demo (<1%), and 20 from
glsl-fs-raytrace (1.3%).
All the current HW backends transform subtract to adding the negation,
so I haven't bothered peepholing it back out in Mesa IR. This allows
some subtract of subtract to get removed in ir_algebraic.
Whereas constant folding evaluates constant expressions at rvalue
nodes, constant propagation tracks constant components of vectors
across execution to replace (possibly swizzled) variable dereferences
with constant values, triggering possible constant folding or reduced
variable liveness.
This lets us track copies across basic block boundaries. The loop
doesn't get a filled out list of available copies into it yet, though.
glsl-fs-raytrace drops from 585 to 535 Mesa IR instructions out of the
compiler, and it appears that Yo Frankie's largest shaders decrease in
Mesa IR count by about 10% as well.
This can be supported on r600 without using the endian swapper, and is a
better fit for (typical) uploads using GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV
anyway.
Currently only ir_binop_equal and ir_binop_nequal are supported, but
soon all of the relational operators will be added. Making this
change now will simplify those commits.
This fixes the assert added in LLVM 2.8:
assert(getType()->isIntOrIntVectorTy() &&
"Tried to create an integer operation on a non-integer type!")
But it also fixes some subtle bugs, since we should've been doing this
since LLVM 2.6 anyway.
Includes a modified patch from steckdenis@yahoo.fr for the
FNeg instructions in emit_fetch(); thanks for pointing those out.
http://bugs.freedesktop.org/29404http://bugs.freedesktop.org/29407
Signed-off-by: José Fonseca <jfonseca@vmware.com>
In this case, we were incorrectly prioritizing PIPE_TRANSFER_DONTBLOCK over
PIPE_TRANSFER_UNSYNCHRONIZED.
This can lead to failure in the Mesa VBO draw paths that end up specifying
both, but don't expect map to fail (in particular, the problem manifested as
a leak of buffer objects in teapot with other changes).
This makes it compatible with the modified DRM interface in drm-radeon-testing.
Also, now you need to set RADEON_HYPERZ=1 to be able to use hyperz.
It's not bug-free yet.
Two integers were being operated on as
a vector of floats in draw_llvm_generate().
This bug got uncovered by fixing this bug:
http://bugs.freedesktop.org/29407
This is a follow-up patch to commit
f4511c4835.
Files that include tnl_dd/t_dd_dmatmp.h now need to also include
m_xform.h as t_context.h no longer includes it.
s_fragprog.h
Include mtype.h for GLcontext symbol.
Include s_span.h for SWspan symbol.
s_fragprog.c
Include s_context.h now that it is removed from s_fragprog.h.
s_atifragshader.h
Include mtypes.h for GLcontext symbol.
Include s_span.h for SWspan symbol.
s_atifragshader.c
Include s_context.h for SWcontext symbol.
- This can only be triggered when DEBUG_NOUVEAU_STATEOBJ is active.
- Also remove a redundant pointer assignment.
Reported-by: Roy Spliet <r.spliet@student.tudelft.nl>
Signed-off-by: Maarten Maathuis <madman2003@gmail.com>
This lets us handle arrays much better than trying to work backwards
from assembly.
Fixes fbo-drawbuffers-maxtargets on swrast (i965 needs loop unrolling)
Assert that "first" is always smaller than "count" and add reasoning.
It would be better to simply fix trim(), but it is used in tight loops
right now.
Do not expand LOCAL_VARS to void expression. Otherwise, declarations
and code will be mixed when more variables are declared in FUNC_ENTER.
This fixes fdo bug #29416.
Support for samplers in general is still incomplete -- anything in a
uniform struct will still be broken. But that doesn't appear to be
any different from master.
Fixes:
glsl-fs-uniform-sampler-array.shader_test
Previously, we'd replace an argument of mysampler[2] with a plain
reference to mysampler by using the cloning hash table. Instead, use
a visitor to clone whatever complicated sampler dereference into the
sampler parameter derefs in the inlined function body.
Use draw_decompose_tmp.h to replace vcache primitive decomposer. As the
new decomposer supports primitives with adjacency, vcache_triangle_adj
and vcache_line_adj (and their variants that have flags) are added.
Including draw_decompose_tmp.h defines a primitive decomposer. It is
intended to replace the existing vcache/so/gs/pipe decomposers.
This is based on draw_pt_vcache_tmp.h.
This fixes fbo-readpixels piglit test, and adds support for swapping
the formats. Not all formats are correct yet I don't think.
Signed-off-by: Dave Airlie <airlied@redhat.com>
st_program.h
Remove p_shader_tokens.h
Include st_context.h for st_context symbol.
Include p_state.h for PIPE_MAX_SHADER_INPUTS symbol.
Remove unnecessary forward declarations.
st_cb_bitmap.c
st_cb_clear.c
Include p_shader_tokens.h now that st_program.h doesn't include it.
Previously some sampler types were duplicated in GLSL 1.30 and
GL_EXT_texture_array. This resulted in not being able to find the
built-in sampler functions when the extension was used. When the
built-in functions were compiled, they bound to the 1.30 version.
This caused a type mismatch when trying to find the function. It also
resulted in a confusing error message:
0:0(0): error: no matching function for call to `texture2DArray(sampler2DArray, vec3)'
0:0(0): error: candidates are: vec4 texture2DArray(sampler2DArray, vec3)
0:0(0): error: vec4 texture2DArray(sampler2DArray, vec3, float)
Apparently they have always been broken, even before unification.
Fixes a lot of stuff, starting from morph3d and lighting in teapot
with textures disabled.
Include mtypes.h for GLcontext, gl_buffer_index, and GLframebuffer
symbols.
Include p_compiler.h for boolean symbol.
Include st_context.h in st_cb_eglimage.c as it previously included
st_context.h indirectly through st_manager.h.
Fixes ir_to_mesa handling of unop_log, which used the weird ARB_vp LOG
opcode that doesn't do what we want. This also lets the multiplication
coefficients in there get constant-folded, possibly.
Fixes:
glsl-fs-log
I'd missed putting in the actual "find structures to split" part, so
most of the code didn't do anything. I was running on too large of an
app and assuming the lack of progress was elsewhere.
This doesn't do anything if your structure goes through an uninlined
function call or if whole-structure assignment occurs. As such, the
impact is limited, at least until we do some global copy propagation
to reduce whole-structure assignment.
The HV doesn't descend into ir_variable, which is generally a good
thing (allowing one to distinguish between variable declarations and
refs), but here we never saw tree grafting opportunities killed
because we were looking for the ir_variable child of a dereference to
get visited.
Fixes:
glsl1-function call with inout params
Simplify state handly by avoiding state allocation.
Next step is to allocate once for all context packet
buffer and then avoid rebuilding pm4 packet each time
(through use of combined crc) this would also avoid
number of memcpy.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Optimizations at compile time should generally be done with the goal
of reducing instruction count so that other work, particularly
linking, is less time-consuming if the shader is used multiple times.
However, function inlining increases instruction count for the inlined
function bodies without removing the original function body, since we
don't know if it will be used at link time or not.
Reduces the runtime of linking and executing a Yo Frankie fragment
shader from 0.9 seconds to 0.5 seconds (-45.9%, +/- 2.2%, n=5).
For a shader involving many small functions, this avoids running
optimization across all of them after they've been inlined
post-linking.
Reduces the runtime of linking and running a fragment shader from Yo
Frankie from 1.6 seconds to 0.9 seconds (-44.9%, +/- 3.3%).
We wouldn't want to go rewriting dereferences to variables to point at
the same variable it did before. While I didn't find a way to trigger
that, a shader in Yo Frankie managed to produce a self-assignment by
passing a constant to a function doing self assignment like this.
Cleans up the IR for glsl-deadcode-self-assign.shader_test
This implements fast Z clear, Z compression, and HiZ support for r300->r500
GPUs.
It also allows cbzb clears when fast Z clears are being used for the ZB.
It requires a kernel with hyper-z support.
Thanks to Marek Olšák <maraeo@gmail.com>, who started this off, and Alex Deucher at AMD for providing lots of hints.
v2:
squashed zmask ram size fix]
squashed r300g/blitter: fix Z readback when compressed]
v3:
rebase around texture changes in master - .1 fix more bits
v4:
migrated to using u_mm in r300_texture to manage hiz/zmask rams consistently
disabled HiZ when using OQ
flush z-cache before turning hyper-z off
update hyper-z state on dsa state change
store depthclearvalue across cbzb clears and replace it afterwards.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I introduced this for ir_dead_code to distinguish function parameter
outvals from varying outputs. Only, since ast_to_hir's
current_function is unset when setting up function parameters (they're
needed for making the function signature in the first place), all
function parameter outvals were marked as shader outputs anyway. This
meant that an inlined function's cloned outval was marked as a shader
output and couldn't be dead-code eliminated. Instead, since
ir_dead_code doesn't even look at function parameters, just use
var->mode.
The longest Mesa IR coming out of ir_to_mesa for Yo Frankie drops from
725 instructions to 636.
Mixing stderr (_mesa_print_program, _mesa_print_instruction,
_mesa_print_alu) with stdout means that when writing both to a file,
there isn't a consistent ordering between the two.
While the Mesa IR dumping includes some corresponding GLSL IR for
correlating Mesa IR to GLSL IR, it doesn't completely express it.
This printing includes things like variable declarations and control
flow structure that is hard to read otherwise.
Previously the in-line matrix and vector constructors would generate
swizzles in the LHS. The code is actually more clear if it just
generates the masked assignments instead of relying on the
ir_assignment constructor to convert the swizzles to write masks.
Replace swizzles on the LHS with additional swizzles on the RHS and a
write mask in the assignment instruction. As part of this add
ir_assignment::set_lhs. Ideally we'd make ir_assignment::lhs private
to prevent erroneous writes, but that would require a lot of code
butchery at this point.
Add ir_assignment constructor that takes an explicit write mask. This
is required for ir_assignment::clone, but it can also be used in other
places. Without this, ir_assignment clones lose their write masks,
and incorrect IR is generated in optimization passes.
Add ir_assignment::whole_variable_written method. This method gets
the variable on the LHS if the whole variable is written or NULL
otherwise. This is different from
ir->lhs->whole_variable_referenced() because the latter has no
knowledge of the write mask stored in the ir_assignment.
Gut all code from ir_to_mesa that handled swizzles on the LHS of
assignments. There is probably some other refactoring that could be
done here, but that can be left for another day.
Calling exit() on a memory failure probably made sense for the
standalone preprocessor, but doesn't seem too appealing as part of
the GL library. Also, we don't use it in the main compiler.
We only uploaded up to the highest offset a program would use,
and if the constant buffer isn't changed when a new program is
used, the new program is missing the rest of them.
Might want to introduce a "fill state" for user mem constbufs.
Mesa will do the mapping at _mesa_add_sampler() time. Fixes assertion
failures in debug builds, which might have caught real problems with
multiple samplers linked in a row.
Instead of using a linker-assigned location (since samplers don't
actually take up uniform space, being a link-time choice), use the
sampler's varaible pointer as a hash key.
The new compiler doesn't generate Mesa IR at compile time, and that
compile time code previously wouldn't have reflected the link time
code that actually got used. But do dump the info log of the compile
regardless.
In most cases, we needed to be reparenting the cloned IR to a
different context (for example, to the linked shader instead of the
unlinked shader), or optimization before the reparent would cause
memory usage of the original object to grow and grow.
seems hw can do unaligned accesses and unaligned strides
removes extra conversion when using vbo's
however I needed to switch 3 component byte format to 4 component formats
for tests to pass. Somewhat sililar to GL_SHORT fix done earlier
removes assert and gains +2 piglit especially draw-vertices
The BGNLOOP and ENDLOOP instructions are now being used correctly, which
makes break and continue possible. The deadcode pass has been modified to
handle breaks, and the compiler is more careful about which loops are
unrolled.
When we have instance divisors we don't really know which vertex
elements we'll be fetching ahead of time.
This fixes a bug in instanced drawing which was exposed by the new
draw_vbo() code because of max_index not being ~0 as often as it used
to be. The test for max_index >= DRAW_PIPE_MAX_VERTICES often hid
this problem before.
It's easily reproducible with Compiz with its Resize window mode
set to Normal (which is usually not the default mode).
https://bugs.freedesktop.org/show_bug.cgi?id=28658https://bugs.freedesktop.org/show_bug.cgi?id=29303
This is actually a workaround to prevent Compiz crashes.
Instead, a completely white titlebar might show up during resizing
transparent windows (a rare case).
The underlying cause should be fixed by someone who has more knowledge
about the code. (dri2_drawable_get_buffers should not return NULL)
Acked-By: Jakob Bornecrantz <jakob@vmware.com>
Even though the spec says that the limits should be -64/+63, proprietary
drivers support much larger relative offsets and some applications do
depend on this non-standard behavior.
Also program_parse.tab.c has been regenerated.
This fixes the parser error:
ARB_vp: error: relative address offset too large
See also: https://bugs.freedesktop.org/show_bug.cgi?id=28628
4096 * sizeof(vec4) is the maximum size of the constant buffer on NV50.
It is not supposed to be a definite hardware limit, it is for the parser
not to get in the way and let the underlying driver decide whether it can
run the shader or not.
Make sure LIT fills all slot for instruction (can't do W instruction
without having the Z slot filled with at least a NOP).
ALU instruction can't access more than 4 constant, move constant to
temporary reg if we reach the limit.
Fix ALU block splitting, only split ALU after ALU with last instruction
bit sets.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is a zero-ing function, (like calloc), to avoid bugs due to
accessing uninitialized values. Thanks to valgrind for noticing the
use of uninitialized values.
The symbol_header structure that tracks symbols with a particular name
may have a different (longer) life time than the symbols it tracks.
Not keeping a local copy of the name can lead to use-after-free
errors. For example, the following sequence would trigger such an
error:
char *copy = strdup(name);
_mesa_symbol_table_push_scope(st);
_mesa_symbol_table_add_symbol(st, 0, name, NULL);
_mesa_symbol_table_pop_scope(st);
free(name);
_mesa_symbol_table_find_symbol(st, 0, copy);
With this change, the symbol table keeps a local copy of the name that
has the same life time as the symbol_header for that name. This
resolves some use-after-free errors with built-in functions in the
GLSL compiler.
The non-named parameter grammar understandably doesn't set the
identifier field. Fixes intermittent failures about void main(void)
{} having a named void parameter.
If texture coordinates come from the vertex shader, there are always
4 components in the rasterizer input packet, but if the coordinates
are stuffed (like for point sprites), there are only 2 or 3 components
(based on GB_ENABLE) and if we rasterize more, it locks up.
r600 doesnt need the same normalization as r700 - instead it requires
range to be truncated to -pi..pi
I left the range trunc also effective on r700 althouch according the docs
it has sufficent range (-512*PI, +512*PI). The instructions seem
to be used not too often to cause perf loss because of this
Based on patches and testing by Conn Clark and Alain Perrot
mtypes.h does not use any symbols from compiler.h.
Also add the required headers for files that depended on symbols from
compiler.h but were indirectly including compiler.h through mtypes.h.
So many problems here. One is that we can't do the quadrant handling
for all the channels at the same time, so we call the float(y, x)
version multiple times. I'd also left out the x == 0 handling. Also,
the quadrant handling was broken for y == 0, so there was a funny
discontinuity on the +x side if you plugged in obvious values to test.
I generated the atan(float y, float x) code from a short segment of
GLSL and pasted it in by hand. It would be nice to automate that
somehow.
Fixes:
glsl-fs-atan-1
glsl-fs-atan-2
This prevents using uninitialized data in _msea_glsl_error in some
cases, (including at least 6 piglit tests). Thanks to valgrind for
pointing out the problem!
Otherwise, ir_function_inlining will see the body of the function from
the unlinked version of the shader, which won't have had the lowering
passes done on it or linking's variable remapping.
With gl_program::IndirectRegisterFiles we can distinguish between indirect
addressing of constants vs. temporaries. In the case of temporaries,
declare all temps up front sequentially.
Fixes fd.o bug 29305.
Now drivers, etc. can know which register files are accessed with
indirect addressing. Before we just checked gl_program::NumAddressRegs
but didn't know if that was the constant buffer, temp regs, or what.
The only user of this new field so far will be the gallium state tracker.
Make sure that all the element indexes actually lie inside the vertex
buffer.
Also, rename pipe_run() to pipe_run_elts() to be more specific.
And assert/check the vertex count for the non-indexed case.
Plumb the constant buffer sizes down into the tgsi interpreter where
we can do bounds checking. Optional debug code warns upon out-of-bounds
reading. Plus add a few other assertions in the TGSI interpreter.
The problem with doing it on the way in is that for a function with
multiple early returns, we'll move an outer block in, then restart the
pass, then move the two inside returns out, then never move outer
blocks in again because the remaining early returns are inside an else
block and they don't know that there's a return just after their
block. By going inside-out, we get the early returns stacked up so
that they all move out with a series of
move_returns_after_block().
Fixes (on i965):
glsl-fs-raytrace-bug27060
glsl-vs-raytrace-bug26691
This catches a few remaining functions that weren't getting inlined,
generally operating on global or out variables and using an early
return to skip work when possible.
Fixes for i965:
glsl1-function with early return (3)
Fixes a failed assertion with LLVM 2.6:
<unnamed>::JITResolver::JITResolver(llvm::JIT&): Assertion
`TheJITResolver == 0&& "Multiple JIT resolvers?"' failed.
Though, not everyone seems to experience this problem.
That is, remove pipe_context::draw_arrays, pipe_context::draw_elements,
pipe_context::draw_arrays_instanced,
pipe_context::draw_elements_instanced,
pipe_context::draw_range_elements.
Some drivers define a generic function that is called by all drawing
functions. To implement draw_vbo for such drivers, either draw_vbo
calls the generic function or the prototype of the generic function is
changed to match draw_vbo.
Other drivers have no such generic function. draw_vbo is implemented by
calling either draw_arrays and draw_elements.
For most drivers, set_index_buffer does not mark the state dirty for
tracking. Instead, the index buffer state is emitted whenever draw_vbo
is called, just like the case with draw_elements. It surely can be
improved.
This commit adds a new unified draw_vbo method to pipe_context. Unlike
other draw methods, draw_vbo treats the index buffer as a state which is
set with set_index_buffer.
These extensions allow an application to make a context current by
passing EGL_NO_SURFACE for the write and read surface in the call to
eglMakeCurrent. The motivation is that applications that only want to
render to client API targets (such as OpenGL framebuffer objects)
should not need to create a throw-away EGL surface just to get a
current context.
I am not sure how to properly handle flat shading regarding
non color parameter to fragment shader. It seems we should
still interpolate non color using linear interpolation and
flat shade only apply to color.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Split hw vs pipe states creation handling as hw states group doesn't
match pipe state group exactly. Right now be dumb about that and
rebuild all hw states on each draw call. More optimization on that
side coming.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is now consistent with other usage of flex/bison througout mesa,
(which is that these generated files are added to source control so
that the build system does not require external tools like flex/bison
for non-developers).
The mesa build environment does not (currently) accept external
dependencies such as flex and bison. The compromise is to commit the
generated output files, (in spite of the pain that comes from having
generated files under version control).
The "%expect 0" construct will make bison emit an error if any future
changes to the grammar introduce shift/reduce conflicts, (without also
increasing the number after "%expect").
Since we have productions to turn "defined FOO" and "defined ( FOO )"
into a conditional_token we don't need to list DEFINED as an operator
as well. Doing so just introduces the shift/reduce ambiguity with no
benefit.
There is no assignment to the "ret" variable if X_DRI2SwapBuffers is
not defined. In this case, the earlier explicit "return 0" is likely
to be used, but the compiler can't be sure of that, (nor can I for
that matter).
We cover this case by explicitly initializing "ret" to 0.
My earlier attempt to eliminate this warning (c0ca2bfb2a) was
invalid as it removed the variable declaration. Jerome correctly
reverted that (600c85efdb) since the variable is used when
X_DRI2SwapBuffers is defined.
Here, instead of removing the declaration, we move it to inside the
correct #ifdef.
Previous patch added literal emission to wrong place, we
want to emit literal before emitting a new alu group.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Add texture mapping support, redbook/texbind works if
you comment out glClear and second checkboard. Need to
fix :
- texture overwritting
- lod & mip/map handling
- unormalized coordinate handling
- texture view with first leve > 0
- and many other things
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Cube map faces and 3D texture slices are treated the same in llvmpipe
textures. Need to pass the sum of these fields to
llvmpipe_unswizzle_cbuf_tile() as we do elsewhere.
Fixes piglit fbo-3d test (fd.o bug 29135).
I managed to revert the change from unlinked at some point while
cleaning up the changes. glsl-fs-raytrace-bug27060 drops from 389
instructions to 370.
If an undeclared variable was dereferenced in an expression that
needed constant expression handling, we would walk off a null ir->var
pointer.
Fixes:
glsl1-TIntermediate::addUnaryMath
This cleans up the assembly output of almost all the non-logic tests
glsl-algebraic-*. glsl-algebraic-pow-two needs love (basically,
flattening to a temporary and squaring it).
This pulls in multiple i965 driver fixes which will help ensure better
testing coverage during development, and also gets past the conflicts
of the src/mesa/shader -> src/mesa/program move.
Conflicts:
src/mesa/Makefile
src/mesa/main/shaderapi.c
src/mesa/main/shaderobj.h
Several routines directly analyze the grf-to-mrf moves from the Gen
binary code. When it is possible, the mov is removed and the message
register is directly written in the arithmetic instruction
Also redundant mrf-to-grf moves are removed (frequently for example,
when sampling many textures with the same uv)
Code was tested with piglit, warsow and nexuiz on an Ironlake
machine. No regression was found there
Note that the optimizations are *deactivated* on Gen4 and Gen6 since I
did test them properly yet. No reason there are bugs but who knows
The optimizations are currently done in branch free programs *only*.
Considering branches is more complicated and there are actually two
paths: one for branch free programs and one for programs with branches
Also some other optimizations should be done during the emission
itself but considering that some code is shader between vertex shaders
(AOS) and pixel shaders (SOA) and that we may have branches or not, it
is pretty hard to both factorize the code and have one good set of
strategies
Because the hw can't sample it, I reinterpret the format as G16R16 and
sample the G component. This gives 16 bits of precision, which should be
enough for depth texturing (surprisingly, the sampled values are exactly
the same as in D16 textures).
This also enables EXT_packed_depth_stencil on those old chipsets, finally.
The number of macrotiles in the Y direction must be even, otherwise memory
corruption may happen (e.g. broken fonts). Basically, if we get a buffer
in resource_from_handle, we can determine from the buffer size whether it's
safe to use the CBZB clear or not.
This decouples initializing a texture layout/miptree description
from an actual texture creation, it also partially unifies texture_create
and texture_from_handle.
r300_texture inherits r300_texture_desc, which inherits u_resource.
The CBZB clear criteria are moved to r300_texture_desc::cbzb_allowed[level].
And other minor cleanups.
The driver gets a buffer and its size in resource_from_handle.
It computes the required minimum buffer size from given texture
properties, and compares the two sizes.
This is to early detect DDX bugs.
The es1, es2 and gl state trackers include draw_pipe.h, which includes
the llvm headers if MESA_LLVM is true, so we also need to add the
llvm seachpaths.
Similarly, gallivm and other gallium drivers need LLVM_CFLAGS to build when enabled.
Also fix xorg drivers, they didn't include LDFLAGS.
Writing a compiler is time consuming and error prone in
order to allow r600g to further progress in the meantime
i wrote a simple tgsi assembler, it does stupid thing but
i would rather keep the code simple than having people
trying to optimize code it does.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
If we bias x,y we still need to pass through z,w in case the shader
reads gl_FragCoord.z or .w.
Fixes fd.o bug 29183 (piglit glsl-bug-22603).
NOTE: This is a candidate for the 7.8 branch.
Also move some initialization from screen init to pre-init, now
that it is possible.
Also import a new vmwgfx drm (1.3) header.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This makes it possible to prune modes already in pre-init.
We also keep these resources alive across server generations, and
they are implicitly closed on server exit.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Add a customizer callback just before initial config setting, so that the
customizer code can initialize the mode validator using the drm
file-descriptor.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Top-level instructions now get NULL as their default type (since type is
irrelevant for things like ir_function), while ir_rvalues get error_type
by default.
This should make it easier to tell if we've forgotten to set a type. It
also fixes some "Conditional jump or move depends on uninitialized
value" errors in valgrind caused by ir_validate examining the type of
top level ir_instructions, which weren't set.
Assignments can only exist at the top level instruction stream; the
residual value is handled by assigning the value to a temporary and
returning an ir_dereference_variable of that temporary.
Variables with mode ir_var_temporary were causing an out of bounds array
access and filling my screen with rubbish. I'm not sure if "temporary"
is the right thing to print.
It's really hard to believe that this case has been broken, but apparently
no test previously exercised it. So this commit adds such a test and fixes
it by making a copy of the argument token-list before expanding it.
This fix causes the following glean tests to now pass:
glsl1-Preprocessor test 6 (#if 0, #define macro)
glsl1-Preprocessor test 7 (multi-line #define)
If we put the protos in separate ir_functions, they wouldn't be found
at lookup time for linking.
Fixes:
glsl-fs-texture2d-bias
glsl-fs-texture2dproj-bias
glsl-fs-texture2dproj-bias-2
glsl-lod-bias
glsl1-texture2D(), computed coordinate
Previously, the compiler expected the result of the multiplication to
be of the same type as the vector. This is correct for square
matrices, but wrong for all others.
We fix this by instead expecting a vector with the same number of rows
as the matrix (for the case of M*v with a column vector) or the same
number of columns as the matrix (for v*M with a row vector).
This fix causes the following four glean tests to now pass:
glsl1-mat4x2 * vec4
glsl1-vec2 * mat4x2 multiply
glsl1-vec3 * mat4x3 multiply
glsl1-vec4 * mat3x4 multiply
This has confused me twice now. It's a fixed width of 4 (usually a
region description of <4,4,1>), not 1. If it was 1, we'd have been
skipping all over register space.
The ARL value is increments of vec4 in the register file. But
PROGRAM_TEMPORARY or PROGRAM_INPUT are stored as vec4s interleaved
between the two verts being executed (thus a vec8 each), compared to
PROGRAM_STATE_VAR being packed vec4s.
Fixes:
glsl-vs-arrays-2
glsl-vs-mov-after-deref
(without regressing glsl-vs-arrays-3)
Otherwise, the second half isn't written, and we end up reading back
black.
Fixes the remaining junk drawn in glsl-max-varyings, and will likely
help with a number of large real-world shaders.
They go into the render cache, so while we don't care about their
contents after execution, failing to note them could cause the writes
to be flushed over important buffer contents later.
Since GLSL permits arrays of structures, we need to store each element
as an ir_constant*, not just ir_constant_data.
Fixes parser tests const-array-01.frag, const-array-03.frag,
const-array-04.frag, const-array-05.frag, though 03 and 04 generate the
wrong code.
Implicit conversions were not being performed, nor was there any
type checking - it was possible to have, say, var->type == float
and var->constant_value->type == int. Later use of the constant
expression would trigger an assertion.
Fixes piglit test const-implicit-conversion.frag.
This is an invasive set of changes. Each user shader tracks a set of other
shaders that contain built-in functions. During compilation, function
prototypes are imported from these shaders. During linking, the
shaders are linked with these built-in-function shaders just like with
any other shader.
some of the ALU instructions are different on r6xx vs r7xx,
separate the alu translation to separate files, and use family
to pick which compile stage to use.
Depth clamping seems to be implicit if clipping is disabled.
It's not perfect, but it's good enough for wine and passes
the corresponding piglit tests.
In both the preprocessor and in the compiler proper, we use a custom
yyltype struct to allow tracking the source-string number in addition
to line and column. However, we were previously relying on bison's
default initialization of the yyltype struct which of course is not
aware of the source field and leaves it uninitialized.
We fix this by defining our own YYLLOC_DEFAULT macro expanding on the
default version (as appears in the bison manual) and adding
initialization of yylloc.source.
This quiets a compiler warning, (and ensures a segmentation fault rather
than memory corruption if this variable is written through before being
initialized elsewhere).
While bootstrapping the dependencies, make will see the "include depend"
directive before the depend file has been created. To avoid a spurious
warning in this case we use "-include" instead, (which differs precisely
in the fact that it will not emit a diagnostic if the named file does
not exist).
This avoids two "function defined but not used" warnings. For the yyinput
function we define YY_NO_INPUT which tells flex to simply not generate this
function.
For unput, we add a call to this function, but inside a while(0) so
that it will quiet the warning without actually changing any
functionality.
Add declarations for two functions generated in the flex ouput. It
would be nicer if flex simply declared these generated functions as
static, but for now we can at least avoid the warning this way.
Previously, any occurence of the unary plus operator would trigger a
bogus type mismatch error. Fix this by making the ast_plus case look
more like the ast_neg case as far as type-checking is concerned.
With this change the shaders/CorrectPreprocess8.frag test in piglit
now passes.
Instead of one big boolean indicating indirect addressing, use a
bitfield indicating which register files are accessed with indirect
addressing.
Most shaders that use indirect addressing only use it to access the
constant buffer. So no need to use an array for temporary registers
in this case.
The previous code assumed that all elements of the address register
were the same. But it can vary from pixel to pixel or vertex to
vertex so we must use a gather operation when dynamically indexing
the constant buffer.
Still need to fix this for the temporary register file...
This is quite a large patch because breaking it into smaller pieces
would result in the tree being intermitently broken. The big changes
are:
* Add the ir_var_temporary variable mode
* Change the ir_variable constructor to take the mode as a
parameter and correctly specify the mode for all ir_varables.
* Change the linker to not cross validate ir_var_temporary
variables.
* Change the linker to pull all ir_var_temporary variables from
global scope into 'main'.
Since the types are singletons across the lifetime of the compiler,
repeatedly compiling a program with the same structure type defined
would drop a copy of the array on the floor per compile.
This is a bit tricky because the static GLSL types are not called with
the talloc-based new, so we have to use the global type context, which
may not be initialized yet.
We regularly do lookups on the field names of the structure to find
the types within the struct, so returning a structure type with bad
names will lead to lots of error types being found.
Fixes piglit test glsl-array-length, and provides proper error messages
for negative piglit tests array-length-110.frag, array-length-unsized.frag,
and array-length-args.frag.
This assertion is triggered by method calls (i.e. array.length()), where
subexpressions[1] is an ast_function_call expression. Since the
assertion itself had a comment saying it could be removed eventually,
simply do so.
Causes negative glslparser tests array-length-110.frag,
array-length-args.frag, and array-length-unsized.frag to pass, but only
because the length() method is not supported yet.
The constant_expression_wrapper was already the only external API, and
much of the internal code used it anyway. Also, it wouldn't ever visit
non-rvalue ir_instructions, so using a visitor seemed a bit unnecessary.
This uses "ir_foo *ir = this;" lines to avoid code churn. These should
be removed.
Consider this test case:
#define EMPTY
int foo = 1+EMPTY+4;
The expression should compile as the sequence of tokens 1, PLUS,
UNARY_POSITIVE, 4. But glcpp has been failing for this case since it
results in the string "1++4" which a compiler correctly sees as a
syntax error, (1, POST_INCREMENT, 4).
We fix this by changing any macro with an empty definition to result
in a single SPACE token rather than nothing. This then gives "1+ +4"
which compiles correctly.
This commit does touch up the two existing test cases which already
have empty macros, (to add the space to the expected result).
It also adds a new test case to exercise the above scenario.
We define the YY_NO_INPUT macro to avoid one needless function being
generated.
for the other needless functions, (yyunput and yy_top_state), we add a
new UNREACHABLE start condition and call these functions from an
action there. This doesn't change functionality at all, (since we
never enter the UNREACHABLE start condition), but makes the compiler
stop complaining about these two functions being defined but not used.
It's really a bug in flex that these functions are generated with neither
a declaration nor the 'static' keyword, but we can at least avoid the
warnings this way.
Previously, if the outer #ifdef/#ifndef evaluated to false, the inner
directive would not be parsed correctly, (the identifier as the subject
of the #ifdef/#ifndef would inadvertently be skipped along with the other
content correctly being skipped).
We fix this by setting the lexing_if state in each case here.
We also add a new test to the test suite to ensure that this case is tested.
As it turns out, 4 of our current tests are not valgrind clean,
(use after free errors or so), so this will be helpful for
investigating and fixing those.
These were only ever intended to exist in the original, standalone
implementation of glcpp, (with the idea of dropping them as soon as
the code moved into mesa). The current build system wasn't compiling
this C file, but the presence of the header files could cause problems
if the two copies diverge in the future.
We head those problems off by deleting al of these redundant files.
The parameters[i] is our inlined variables representing the
parameters, so they are always ir_var_auto. Walk the signature params
in handling "out" values like we do for "in" values to find the mode.
Fixes (with the previous 2 commits):
glsl1-function call with in, out params
glsl1-function call with inout params
If they have a return value, this means putting it into a temporary
and making a deref of the temp be the rvalue, since we don't know if
the rvalue will be used or not.
I'd flipped around the order of two operations in paren-balancing
adventures, and left out the multiply by sign(x) required for negative x.
Fixes:
glsl1-acos(vec4) function
glsl1-asin(vec4) function
glsl1-atan(vec4) function
I hadn't noticed you could do this, but glsl1 tests caught it. Fixes:
glsl1-Swizzled writemask
glsl1-Swizzled writemask (2)
glsl1-Swizzled writemask (rgba)
glsl1-Swizzled writemask (stpq)
If max_index=0xffffffff and elt_bias > 0 the test for
elt_bias + max_index >= DRAW_PIPE_MAX_VERTICES
was wrong. Check earlier if max_index=0xffffffff and do the
"fail" case.
This fixes the piglit draw-elements-base-vertex test (and probably
some other things).
When defining mipmap level 'L' and level L-1 exists and the new level's
internalFormat matches level L-1's internalFormat, then use the same hw
format. Otherwise, do the regular ctx->Driver.ChooseTextureFormat() call.
This avoids a problem where we end up choosing different hw formats for
different mipmap levels depending on how the levels are defined (glTexImage
vs. glCopyTexImage vs. glGenerateMipmap, etc).
The root problem is the ChooseTextureFormat() implementation in some
drivers uses the user's glTexImage format/type parameters in the choosing
heuristic. Later mipmap levels might be generated with different calls
(ex: glCopyTexImage()) so we don't always have format/type info and the
driver may choose a different format.
For more background info see the July 2010 mesa-dev thread "Bug in
_mesa_meta_GenerateMipmap"
GLXscreenConfigs is badly named and a dumping ground for a lot of stuff.
This patch creates private screen structs for the dri drivers and moves
some of their fields over there.
There was confusion on both the size of message we can send, and on
what the URB destination offset means.
The remaining problems appear to be due to spilling of regs in the
fragment shader being broken.
Instead of using ir_call::callee, search for the signature in the
linked shader. This will allow resolving calls from functions
imported from other shaders. The ir_call::callee pointer in the
imported function will still reference a signature in the original shader.
The list of shaders to search needs to be provided as an explicit
parameter to support coming changes. At that point there is no reason
for it to be in the class. Also, fix some of the 'const' decorators.
Give ir_function::parameter_lists_match_exist similar treatment. Make
the parameters const, and propogate the constness as far as it will
trivially go.
It now works correctly when nodes are removed, as it was originally
intended to do; it no longer processes nodes added to the list before
the current node, nor those added immediately after the current node.
This matches the behavior of Linux's list_for_each_safe.
The node being processed may be removed from the list and put in a
different list. Not using the safe version caused list processing to
change streams after moving a node.
For the shader containing 'main', use the linked shader (i.e., the
clone of the original shader that contained main) as the source for
global instructions to move into main.
When faced with a constructor like 'ivec4(0, 2, 0, 0)', we would
manage to get a value of 2 instead of 0 for the first "0". Usually 2
characters past "0" would point at some junk and lex as 0 anyway.
Fixes glsl-octal and glsl-unused-varyings.
The Mac OS X SCons build failed on 32-bit CPUs starting with commit
2f6d47a7c8 during linking of graw-null.
The build succeeds though on a 64-bit CPU. See FDO bug 29117.
This was the compiler error.
scons: building associated VariantDir targets: build/darwin-x86-debug
Linking build/darwin-x86-debug/gallium/targets/graw-null/libgraw.dylib ...
Undefined symbols:
"_lp_swizzled_cbuf", referenced from:
_lp_swizzled_cbuf$non_lazy_ptr in libllvmpipe.a(lp_rast.os)
_lp_swizzled_cbuf$non_lazy_ptr in libllvmpipe.a(lp_rast_tri.os)
(maybe you meant: _lp_swizzled_cbuf$non_lazy_ptr)
"_lp_dummy_tile", referenced from:
_lp_dummy_tile$non_lazy_ptr in libllvmpipe.a(lp_rast.os)
_lp_dummy_tile$non_lazy_ptr in libllvmpipe.a(lp_rast_tri.os)
_lp_dummy_tile$non_lazy_ptr in libllvmpipe.a(lp_setup.os)
(maybe you meant: _lp_dummy_tile$non_lazy_ptr)
The patch adds -fno-common to all Mac OS X builds to work around this issue.
This is a big deal for debugging if nothing else ("what class is this
ir_instruction, really?"), but is also nice for avoiding building a
whole visitor or an if (node->as_whatever() || node->as_other_thing())
chain.
DRI2 events are sent to the X drawable ID used to create the DRI2 drawable,
not the GLX drawable ID. So when an event comes in, we need to look up
the __GLXDRIdrawable by its X drawable ID, which needs a new hash table.
Use a single swizzled tile per colorbuf (and per thread) to avoid
accumulating large amounts of cached swizzled data.
Now that the SSE3 code has been merged to master, the performance delta
of this change is minimal, the main benefit is reduced memory usage
due to no longer keeping swizzled copies of render targets.
It's clear from the performance of the in-place version of this code
that there is still quite a bit of time being spent swizzling &
unswizzling, but it's not clear exactly how to reduce that.
The translate_prim() function tries to convert quad strips into
tri strips. This is normally OK but we have to check for an odd
number of vertices so that we don't accidentally draw an extra
triangle. The mesa-demos/src/samples/prim.c demo exercises that.
With this fix the stray yellow triangle is no longer drawn.
Use the u_trim_pipe_prim() function to make sure that prims have
the right number of vertices and avoid calling gallium drawing
functions when the prim has a degenerate number of vertices.
Plus add comments, clean-up formatting, etc.
NOTE: This is a candidate for the 7.8 branch.
The function is used by _eglGetConfigs and _eglGetScreens. The array
size should not be limited by the buffer size when the buffer is NULL.
This fixes fdo bug #29052.
Fixes piglit object_purgeable-api-pbo, object_purgeable-api-vbo
and object_purgeable-api-texture failures with swrast.
NOTE: This is a candidate for the 7.8 branch.
When direct rendering is being used, DRI2 BufferSwapComplete events are
sent unconditionally to clients, even if they haven't been requested.
This causes error messages to be printed by every freeglut application
of the form
freeglut (./gears): Unknown X event type: 104
and might confuse other clients.
This is a fixed up version of the patch by Jesse Barnes, which drops
BufferSwapComplete events if they are not requested by clients.
Fixes fdo bug 27962.
Signed-off-by: Nick Bowler <nbowler@draconx.ca>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The lp_rast_shader_inputs' alignment is irrelevant now that it contains
pointers instead of actual data.
Likewise, lp_rast_triangle's size alignment is meaningless.
The big change is to delay address reg setup until the instruction
that needs the deref. It was hard to use the deref chain support for
the LHS because it does the copy of the dereffed value to a temporary
(to avoid problems when two src regs are array derefs), so we wouldn't
haev a pointer to actual storage in the end.
Fixes glsl-vs-arrays on swrast.
Just put a pointer to the state in the tri->inputs struct. Remove
some complex logic for eliminating unused statechanges in bins at the
expense of a slightly larger triangle struct.
Move this code back out to C for now, will generate separately.
Shader now takes a mask parameter instead of C0/C1/C2/etc.
Shader does not currently use that parameter and rasterizes whole
pixel stamps always.
Rather than inserting an lp_rast_fence command at the end of each
bin, have each rasterizer thread call this function directly once
it has run out of work to do on a particular scene.
This results in fewer calls to the mutex & related functions, but more
importantly makes it easier to recognize empty bins.
Find instructions in all shaders that are not contained in a function
(i.e., initializers for global variables). "Move" these instructions
to the top of the main function in the linked shader. As a
side-effect, many global variables will also be copied into the linked
shader.
The linker needs to use this function to get specific function signatures, but
it also needs to modify the returned signature. Since this method isn't itself
const (i.e., const this pointer), there is no value in making a const and
non-const version.
This currently involves an ugly hack so that every link doesn't result
in all the built-in functions showing up as multiply defined. As soon
as the built-in functions are stored in a separate compilation unit,
ir_function_signature::is_built_in can be removed.
The instruction can be hung off of any other in the tree, even if the
other one will be deleted, since it'll get stolen to the shader's
context later if it's still live.
We would clear the in_lhs flag early, avoiding copy propagation on the
array index variable (oops) and then copy propagating on the array
variable (ouch). Just avoid all copy propagation on the LHS instead.
Add two invariant checks related to functions and function signatures:
1. Ensure that function definitions (ir_function) are not nested.
2. Ensure that the ir_function pointed to by an ir_function_signature
is the one that contains it in its signatures list.
There is no setter function, the getter returns a constant pointer,
and ir_function_signature::_function is private for a reason. The
only way to make a connection between a function and function
signature is via ir_function::add_signature. This helps ensure that
certain invariants (i.e., a function signature is in the list of
signatures for its _function) are met.
Temporary variables added for &&, ||, and ?: were not being added to
the instruction stream. This resulted in either test failures or
Valgrind being angry after the original IR tree was destroyed by
talloc_free. The talloc_free caused the ir_variables to be destroyed
even though they were still referenced.
This will be used by the Mesa IR and likely most HW backends, as it
allows other optimizations to occur that might not otherwise.
Fixes glsl-vs-mat-sub-1, glsl-vs-mat-div-1.
Need to flush command stream before mapping texture image
that is referenced by current cs.
Candidate for 7.8 branch.
Signed-off-by: Maciej Cencora <m.cencora@gmail.com>
This feels a little odd, but it will be useful for ir_mat_to_vec,
where I want to see a plain assignment of the expression to a
variable, not to a writemasked array dereference with a call as the
array index.
It would be easy to miss an entry either of the two visitors involved
that would result in trying to ir->remove() the call to remove it from
the instruction stream when really it's part of an expression tree
that wasn't flattened.
There's already an implementation of pipe_barrier using
the other pipe_* primitives; just use that on Windows, too.
Now Windows passes pipe_barrier_test.
Unfortunately compiling with these defines enabled would mean
Gallium can't run on Windows XP/2003 or older.
Todo: Need a macro to declare if we don't care about WinXP
compatibililty.
Or at least a little of it. This version will sleep
for a fixed amount of time instead of just deadlocking,
which is a slight improvement.
Also do the same thing on any unrecognized platform.
An initial implementation made by Dave Airlie.
For it to be used, a color-only clear must be invoked and exactly one
point-sampled render target must be set. The render target must be
macrotiled (for us to overcome alignment issues) and bpp must be either
16 or 32.
I can't see a difference in performance. :(
Conflicts:
src/gallium/drivers/r300/r300_blit.c
With an ordinary quad, the pixels on the main diagonal are computed
and stored twice, which is somewhat inefficient and might not work well
with specialized clear codepaths.
When searching for valid miptree check images in range
of [BaseLeve, MaxLevel] not [MinLod, MaxLoad].
Prevents unnecessary miptree allocations in cases when during
every rendering operation different texture image level
was selected using MIN_LOD = MAX_LOD = level (for every level
new miptree for whole texture was allocated).
Candidate for 7.8 branch.
Signed-off-by: Maciej Cencora <m.cencora@gmail.com>
Always store selected miptree in texObj->mt so get_base_teximage_offset returns correct data.
Found with piglit/mipmap-setup.
Candidate for 7.8 branch.
Signed-off-by: Maciej Cencora <m.cencora@gmail.com>
Previously, code like ivec4(mat2(...)) would fail because the compiler
would naively try to convert a mat2 to an imat2...which doesn't exist.
Now, a separate pass breaks such matrices down to their columns, which
can be converted from vec2 to ivec2.
Fixes piglit tests constructor-11.vert, constructor-14.vert,
constructor-15.vert, and CorrectConstFolding2.frag.
In particular, with foreach_list_safe, one can remove and free the current
node without crashes; if new nodes are added after the current node,
they will be properly visited as well.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The following instruction sequence will no longer be emitted in separate
TEX blocks:
0: TEX temp[0].xyz, temp[1].xy__, 2D[0];
1: TEX temp[1].xyz, temp[2].xy__, 2D[0];
This fixes fdo bug #25109
The code to emit an array of OpenGL state vars lacked the code
to handle the gl_TextureMatrix[] array.
Fixes fd.o bug 28967
NOTE: this is a candidate for the 7.8 branch.
Most places in the code simply use a static name, which works because
names are never used to look up an ir_variable. generate_temporary is
simply unnecessary (and looks like it would leak memory, and isn't
thread safe...)
When EU executes 'wait' instruction, it stalls and sets notification
register state. Host can issue MMIO write to clear notification
register state to allow EU continue on executing again.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Fixes fd.o bug 27216. May also be the root cause of fd.o bug 28950.
We weren't propogating the storage info for the x=foo() expression up
through the IR tree to the inequality expression.
NOTE: This is a candidate for the Mesa 7.8 branch.
Previously, a register defined at main scope and used in a loop in a
loop could end up getting marked as needed only from the definition
outside of the loops to the end of the inner loop, and we would
cleverly slot in something else in its register in the end of the
outer loop.
Fixes glsl-vs-loop-nested and glsl-fs-loop-nested on glsl2. This
doesn't happen much on master because the original compiler does its
own register allocation, so we find little we can do with linear scan
register (re)allocation.
First, this undoes commit e503af4baa
so we use iround() in lp_build_nearest_mip_level().
Second, in lp_build_sample_general() we need to check if we're sampling
a cube map before anything else. Choose the cube face and then recompute
the partial derivatives of (S,T) with respect to the chosen cube face.
Before, we were using the directional (S,T,R) derivatives to compute
the LOD.
Third, work around an apparent bug in LLVM 2.7 where setting the lod
variable to a const(0) value results in bad x86 code. See comments in
the code.
Uses of the bits for allocation are offset by 16, and
VERT_BIT_GENERIC0 already has the 16 offset. As a result, it was
preventing the wrong thing from being allocated.
rcp of an integer value did not produce the result you're looking for.
Instead, do the a * rcp(b) as float and truncate after. This mostly
fixes glsl-fs-loop-nested.
This allows function inlining making the following tests work even
without function calls implemented:
glsl-fs-functions-2
glsl-fs-functions-3
glsl-vs-functions
glsl-vs-functions-2
glsl-vs-functions-3
glsl-vs-vec4-indexing-5
(Note that those tests were designed to trigger actual function calls,
and this defeats them. However, those testcases ended up catching the
bug in the previous commit.)
The _slang_*_output_name() functions had one too many loop iterations
because of the sentinal end-of-list values in the vertOutput array.
Just use Elements() everywhere.
The Mesa IR needs this to support vector indexing correctly, and
hardware backends such as 915 would want this behavior as well.
Fixes glsl-vs-vec4-indexing-2.
Also handles matrix/vector and vector/matrix multiplication.
Fixes piglit tests const-matrix-multiply-01.frag,
const-matrix-multiply-02.frag, and const-vec-mat.frag.
The test here is slightly different since we need to keep matrix
multiplication separate.
Fixes piglit tests const-vec-scalar-03.frag and const-mat-scalar-03.frag.
The previous table didn't distinguish gl_Color for the VS and FS, so
we would use the FS's attribute index for the VS and read undefined.
This partially fixes glsl-routing to match its behavior on master.
Plus fix minor error in lp_build_iceil() by tweaking the offset value.
And add a bunch of comments for the round(), trunc(), floor(), ceil()
functions.
it was wrong to put this in the fs paths, but it was easier to just
stuff it along the fragment texture sampling paths. the patch
disconnects vertex texture sampling and just maps the textures
before the draw itself and unmaps them after.
softpipe doesn't implement the draw's llvm tex sampling interface
so make sure draw can handle the cases where the driver doesn't
implement the interface
Assert ctx->Driver.NewTransformFeedback if the feature is enabled; Use
the default callbacks otherwise. The rest of core mesa expects the
state to be initialized.
Driver loading is now splitted into two stages. In the first stage, an
_EGLModule is created for each driver: user driver, default drivers, and
all files in the search directories that start with "egl_". Modules are
not loaded at this stage.
In the second stage, each module is loaded to initialize a display. The
process stops at the first module that can initialize the display.
If eglGetProcAddress is called before eglInitialize, the same code path
will be taken to find the first module that supports
EGL_DEFAULT_DISPLAY. Because we do not want to initialize the display,
drv->Probe is used instead in this case.
Check FEATURE_GL in _mesa_init_shader_dispatch and
_mesa_init_shader_uniform_dispatch. OpenGL ES can not and does not use
_mesa_init_<...>_dispatch. This is supposed to be temporary. Ideally,
a more flexible way for initializing dispatch tables should be
developed.
This also allows us to split the loop emulation into two phases. A
tranformation phase which either unrolls loops or prepares them to be
emulated, and the emulation phase which unrolls remaining loops until the
instruction limit is reached. The second phase is completed after the
deadcode analysis in order to get a more accurate count of the number of
instructions in the body of loops.
OPCODE_EXP is not to ir_unop_exp what OPCODE_EX2 is to ir_unop_exp2.
It's the weird VP approximation helper opcode. Just implement it with
OPCODE_POW instead.
Fixes glsl-fs-exp.
These 3 fields are per shader-program. Copy them into the geometry
program at link time for convenient access later.
Also, add some missing glGetProgramiv() queries.
With how we generate assignments, the trivial copy propagation in it
is really important, and some drivers will really want the register
allocation, too.
Before, if there were no color buffers enabled (with glDrawBuffers(GL_NONE))
when the texenv program was generated, we'd emit writes to OUTPUT[1] but
the OutputsWritten mask was 0. This inconsistency caused an assertion to
fail later in the Mesa->TGSI translation.
Fixes fd.o bug 28169
NOTE: this is a candidate for the 7.8 branch (and depends on commit
b6b9b17d27).
Need to pass the index indicating which blend terms to use, not which
color buffer we're blending into.
Rename the parameter to blend_quad() and add comments to be more clear
about this.
All the state that effects the program should be in the key.
This didn't help with bug 28169 but is a good fix anyway.
NOTE: this is a low-priority candidate for the 7.8 branch. In practice,
this issue might never be hit.
before this change, r600 glxinfo segfaulted in the list code, and I wasn't
debugging another linked list implementation, its 2010 after all.
So add the two missing list macros to the gallium header from X.org list header file (after fixing them), then port all r600 lists to the new header.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently only GL_ARB_draw_buffers and GL_ARB_texture_rectangle are
defined because those extensions are always enabled. This make
tex_rect-03.frag pass.
Fix this build error (in MesaGLUT-7.6.1)...
glut_cmap.c:23:66: error: X11/Xmu/StdCmap.h: No such file or directory
...by not preventing the cflags that pkg-config finds for glut dependencies
(including 'xmu') from being used.
Defining GLUT_CFLAGS before running the pkg-config prevents the
cflags found by pkg-config from being used.
This patch lets GLUT_CFLAGS that configure & pkg-config work
so hard to set actually get used.
Also make sure the generated configs/autoconf defines GLUT_CFLAGS
used in (at least) src/glut/glx/Makefile.
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
glsl-derivs would add 40.0, 0.0, and 1.0 in order. When we went
looking for 0.0, we'd find it in the second slot of the param, and use
it, but param->Size would still be 1. When we went to add 1.0 and
didn't find it, we'd put allocate it to that second slot and the 0.0
would actualy end up being 1.0.
Fixes glsl-derivs, glsl-deriv-varyings.
_mesa_glsl_parse_state should be the parent for all temporary allocation
done while compiling a shader. glsl_shader should only be used as the
parent for the shader's final IR---the _result_ of compilation.
Since many IR instructions may be added or discarded during optimization
passes, IR should not ever be allocated to glsl_shader directly.
Done via sed -i s/talloc_parent(state)/state/g and s/talloc_parent(st)/st/g.
This also removes a ton of talloc_parent calls, which may help performance.
When compiled with the more aggressive compiler warnings such as
-Wshadow and -Wempty-body the libtess code gives a lot more
warnings. This fixes the following issues:
* The 'Swap' macro tries to combine multiple statements into one and
then consume the trailing semicolon by using if(1){/*...*/}else.
This gives warnings because the else part ends up with an empty
statement. It also seems a bit dangerous because if the semicolon
were missed then it would still be valid syntax but it would just
ignore the following statement. This patch replaces it with the more
common idiom do { /*...*/ } while(0).
* 'free' was being used as a local variable name but this shadows the
global function. This has been renamed to 'free_handle'
* TRUE and FALSE were being unconditionally defined. Although this
isn't currently a problem it seems better to guard them with #ifndef
because it's quite common for them to be defined in other headers.
https://bugs.freedesktop.org/show_bug.cgi?id=28845
Signed-off-by: Brian Paul <brianp@vmware.com>
The target supports OpenVG on Windows with software rasterizer. The
egl_g3d_loader defined by the target supports arbitrary client APIs and
window systems. It is the SConscript that limits the support to OpenVG
and GDI.
This commit also fixes a typo in gdi backend.
Patch from Brad Smith <brad@comstyle.com>
The attached patch allows the GL_OES_query_matrix function to use the
systems fpclassify() for OpenBSD and NetBSD.
This undoes part of commit 8be645d53a
and fixes fd.o bug 28822 as well as other regressions.
The 'draw' module may issue additional state-change commands while
we're inside the draw_arrays/elements() call so it's important to
check for updated state at this point.
All scalar, vector, and matrix constructors are generated in-line
during AST-to-HIR translation. There is no longer any need to
generate function versions of the constructors.
Now that all scalar, vector, and matrix constructors are emitted
in-line, the parameters to these constructors should not be flattened
to a pile of scalars. Instead, the functions that emit the in-line
constructor bodies can directly write the parameters to the correct
locations in the objects being constructed.
This doesn't do any control flow analysis to ensure that the return
statements are actually reached.
Fixes piglit tests function5.frag and function-07.vert.
This change makes st/egl build a single egl_gallium.so and multiple
st_<API>.so and pipe_<HW>.so. When a display is initialized, the
corresponding pipe driver will be loaded. When a context is created,
the corresponding state tracker will be loaded.
Unlike DRI drivers, no ABI compatibility is maintained. egl_gallium,
pipe drivers and state trackers should always be distributed as a single
package. As such, there is only a single src/gallium/targets/egl/ that
builds everything for the package.
Several changes are made. libegl.a no longer defines _eglMain. It
defines functions to create and destroy a _EGLDriver instead. The
creation function is called by the targets. It takes an egl_g3d_loader
as its argument. The loader is defined by the targets and is in charge
of creating st_api and pipe_screen. This allows us to move the module
loading code to targets. Lastly, the modules are now loaded as the
respective contexts are created.
Merge all targets into targets/egl/. The target produces
egl_gallium_<HW>.so for each pipe driver and st_<API>.so for each client
APIs. This enables us to further merge egl_gallium_<HW>.so into
egl_gallium.so later.
Merge multiple egl_<platform>_<pipe>.so into a single
egl_gallium_<pipe>.so. The environment variable EGL_PLATFORM is now
used to modify the return value of _eglGetNativePlatform.
Move native_get_name, native_create_probe, native_get_probe_result, and
native_create_display into struct native_platform, and add
native_get_platform to get a handle to the struct.
The callback is used by st/vega to check if a visual specifies the
depth/stencil format. It forces st/vega to be loaded by st/egl to
perform the check. As noted in EGL spec, the depth/stencil format of a
visual should not affect OpenVG. It should be better to ignore the
field and always allocate the depth/stencil texture.
We're accessing in terms of columns, so we need to do MUL/MAD/MAD/MAD
instead of DP4s.
Fixes:
glsl-fs-exp2
glsl-fs-log2
glsl-fs-mix-constant
glsl-fs-sqrt-zero
glsl-vs-sqrt-zero
We were starting a scene whenever lp_setup_get_vertex_info() was called by
the draw module. So when when all primitives were culled/clipped, not only
did we create a new scene for nothing, but we end up using the old scene
with the old framebuffer state instead of a new one.
Fix consists in:
- don't call lp_setup_update_state() in lp_setup_get_vertex_info() -- no
longer necessary
- always setting the scene state before binning a command -- query
commands were bypassing it
- assert no old scene is reused in lp_setup_bind_framebuffer()
Now the question is whether we are allowed to ignore gl_rasterization_rules and
pipe_rasterizer_state::multisample. The former is invariant anyway and
I think the latter would need re-emitting the AA state which is quite costly,
considering that it implicitly flushes the whole pipeline (all AA regs
in the AA state are *unpipelined*).
This should be resolved with linker.cpp's location assignment, as
currently we drop that location assignment on the ground. However,
this gets basic programs using uniforms working for now.
we used to create and cache unltimited number of variant, this
change limits the number of variants kept around to a fixed number.
the change is based on a similar patch by Roland for llvmpipe fragment
shaders.
The compiler is now called by the driver, and generates program
instructions. Parameter lists are still not set up, so the driver
chokes on it shortly thereafter.
...instead of waiting until glGetString(GL_EXTENSIONS) is called.
This fixes a problem where the MESA_EXTENSION_OVERRIDE env var is
ignored if the app never calls glGetString(GL_EXTENSIONS).
NOTE: this is a candidate patch for the 7.8 branch.
This brings in the standalone GLSL compiler that we are planning on
replacing the existing Mesa GLSL compiler. It currently targets GLSL
1.20 and the Mesa IR.
This brings in the ir_to_mesa.cpp code I've been developing to codegen
to the Mesa IR. It does not actually generate a complete Mesa
fragment/vertex program yet.
At 1kloc, it doesn't look like I'll want to split the ir_to_mesa file
up even once it's feature-complete. Move definitions closer to usage,
and prevent rebuilding the world when changing the definitions.
The promise of the BURG was to recognize multi-instruction sequences
and emit reduced sequences for them. It would have worked well for
recognizing MUL+ADD -> MAD and possibly even MIN(MAX(val, 0), 1) ->
MOV_SAT with some grammar changes. However, that potential benefit in
making those optimizations easy is outweighed by the fragility of
monoburg, the amount of (incorrect, as I wrote it) code for using it,
and the burden it was going to cause for handling operations on
aggregate types.
The alloced_vec4/vec4 distinction was an experiment to expose the cost
of temps to the codegen. But the problem is that the temporary
production rule gets called after the emit rule that was using the
temp. We could have the args to emit_op be pointers to where the temp
would get allocated later, but that seems overly hard while just
trying to bring this thing up. Besides, the temps used in expressions
bear only the vaguest relation to how many temps will be used after
register allocation.
Ideally this would be hooked up by ir_print_visitor dumping into a
string that we could include as prog_instruction->Comment when in
debug mode, and not try keeping ir_instruction trees around after
conversion to Mesa. The ir_print_visitor isn't set up to do that for
us today.
There are major missing pieces here. Most operations aren't
supported. Matrices need to be broken down to vector ops before we
get here. Scalar operations (RSQ, RCP) are handled incorrectly.
Arrays and structures are not even considered.
I'm not sure if I really got it right. This seems like one of those
"Duh, of course it works that way" things, but I'd like the
documentation to be readable by people not acquainted with OGL/D3D.
Only for first RT at the moment, as there is no trivial way in galahad
to look at framebuffer state and (sadly) people don't usually calloc
their CSOs, so flags could be wrongly set.
On the other hand, of course, galahad will hopefully encourage more
people to calloc their CSOs. :3
1. Move all GL entrypoint functions and files into src/mesa/main/
This includes the ARB vp/vp, NV vp/fp, ATI fragshader and GLSL bits
that were in src/mesa/shader/
2. Move src/mesa/shader/slang/ to src/mesa/slang/ to reduce the tree depth
3. Rename src/mesa/shader/ to src/mesa/program/ since all the
remaining files are concerned with GPU programs.
4. Misc code refactoring. In particular, I got rid of most of the
GLSL-related ctx->Driver hook functions. None of the drivers used
them.
Conflicts:
src/mesa/drivers/dri/i965/brw_context.c
And hook it up at the two sites it's called.
Note that with this change we still don't use glsl_type* objects as
talloc contexts, (see things like get_array_instance that accept both
a talloc 'ctx' as well as a glsl_type*). The reason for this is that
the code is still using many instance of glsl_type objects not created
with new.
This closes 3 leaks in the glsl-orangebook-ch06-bump.frag test:
total heap usage: 55,623 allocs, 55,618
Leaving only 5 leaks to go.
Add a talloc ctx to both get_array_instance and the glsl_type
constructor in order to be able to call talloc_size instead of
malloc.
This fix now makes glsl-orangebook-ch06-bump.frag 99.99% leak free:
total heap usage: 55,623 allocs, 55,615
Only 8 missing frees now.
Simply call talloc_strdup rather than strdup, (using the talloc_parent
of our 'state' object, (known here as yyextra).
This fix now makes glsl-orangebook-ch06-bump.frag 99.97% leak free:
total heap usage: 55,623 allocs, 55,609 frees
Only 14 missing frees now.
Could have just added a call to free() to main, but since we're using
talloc everywhere else, we might as well just use it here too. So pass
a new 'ctx' argument to load_text_file.
This removes a single memory leak from all invocations of the
standalone glsl compiler.
Easily done now that s_expression is allocated with talloc. Simply
switch from new to talloc_strdup and the job is done.
This closes the great majority (11263) of the remaining leaks in the
glsl-orangebook-ch06-bump.frag test:
total heap usage: 55,623 allocs, 55,546 frees
(was 44,283 frees)
This test is now 99.86% leak-free.
By propagating a 'ctx' parameter through these calls.
This fix happens to have no impact on glsl-orangebook-ch06-bump.frag,
(since it doesn't trigger any errors).
By simply propagating a 'ctx' parameter through these function
calls. (We do this because these function are otherwise only receiving
an exec_list, which is not a valid talloc context.)
This closes 1611 leaks in the glsl-orangebook-ch06-bump.frag test:
total heap usage: 55,623 allocs, 44,283 frees
(was 42,672 frees)
And fix all callers to use the tallbac-based new for exec_node
construction. We make ready use of talloc_parent in order to get
valid, (and appropriate) talloc owners for everything we construct
without having to add new 'ctx' parameters up and down all the call
trees.
This closes the majority of the memory leaks in the
glsl-orangebook-ch06-bump.frag test:
total heap usage: 55,623 allocs, 42,672 frees
(was 14,533 frees)
Now 76.7% leak-free. Woo-hoo!
And use the talloc-based new for all of the ast objects created by the
parser. This closes a lot of memory leaks, and will allow us to use
these ast objects as talloc parents in the future, (for things like
exec_nodes, etc.).
This closes 164 leaks in the glsl-orangebook-ch06-bump.frag test:
total heap usage: 55,623 allocs, 14,553 frees
(was 14,389 frees)
Two of these destructors are non-empty, (s_symbol and s_list), so this
commit could potentially introduce memory leaks, (though, no additional
leaks are found in glsl-orangebook-ch06-bump.frag at least---perhaps
the current code is never calling delete on these classes?).
Going forward, we will switch to talloc for exec_node so we won't need
explicit destrcutors to free up any memory used.
We take advantage of overloading of the new operator (with an
additional parameter!) to make this look as "C++ like" as possible.
This closes 507 memory leaks when compiling glsl-orangebook-ch06-bump.frag
when measured with:
valgrind ./glsl glsl-orangebook-ch06-bump.frag
as seen here:
total heap usage: 55,623 allocs, 14,389 frees
(was 13,882 frees before)
This is a short-lived object. It exists only for the duration of the
compile_shader() function, (as opposed to the shader and whole_program
which live longer).
The state is created with the same talloc parent as the shader, so
that other allocation can be done with talloc_parent(state) as the
owner in order to attach to a long-lived object.
My current reading of the relevant static functions suggests that last
is never used without being uninitialized, (we only use it if the
expansion function returned non-NULL and the expansion functions
always set it before returning non-NULL).
Apparently gcc isn't coming to the same conclusion. Initializing this
to NULL nicely quites gcc and will guarantee a nice, early segfault if
my anaylsis turns out to be wrong.
This block of code is useless because a (nearly-equivalent) assignment
is made immediately after. The only difference is the omission of
-Wunreadchable-code in the assignment being used. Presumably, that was
intended to be -Wunreachable-code (without the first 'd'), but since
this hasn't been being used we just drop it.
This prevents the two code paths from getting out of sync. Also, future
work will need the shader source as a string anyway.
Unfortunately, this copies and pastes load_text_file from main.cpp, with
small changes (support for reading from stdin, talloc).
We were previously calculating a value which was either the geometry
shader output primitive or the application's input primitive, and
passing that to the various front/middle/back components for use as
the ultimate rendering primtive.
Unfortunately, this was not correct -- if the vcache decomposition
path is active and geometry shaders are *not* active, we can end up
with a third primitive -- specifically the decomposed version of the
input primitive.
Rather than trying to precalculate this, just let the individual
components inform their successors about which primitive type they are
recieving.
Any elt may potentially have flags bits set so mask off those bits
everywhere.
Fixes crashes with demos/gamma.c, redbook/polys.c, etc. but polygon
stippling is still broken.
The extension defines eglGetDRMDisplay that creates an EGLDisplay from a
DRM fd. Calling eglCreateWindowSurace or eglCreatePixmapSurface with
such displays will generate EGL_BAD_NATIVE_WINDOW or
EGL_BAD_NATIVE_PIXMAP.
This commit introduces type-safe platform displays internally. A
platform display consists of a generic pointer and an enum that
specifies the platform.
An EGLDisplay is created from a platform display. Native displays
become platform displays whose platform is determined by
_eglGetNativePlatform(). Platform windows and pixmaps may also be
introduced if needed.
Galahad is a sanity-checking layer meant to replace the crufty and
scattered sanity checks inside drivers with a robust, non-silenceable,
useful set of warnings and errors that can be used to keep misbehaving
state trackers from going unnoticed.
This is a follow-on to commit 80dfec3e53.
The valid attachments for glGetFramebufferAttachmentParameteriv() depends
on whether we're querying the default FBO or a user-created FBO.
This avoids spamming each file with includes of ir_print_visitor.h
because someone was doing debugging at some point, and is less typing
when doing debugging.
allows application to not only request the frequency of the TIME_ELAPSED
clock but also to detect if that frequency was consistent throughout the
entire bracketed range of graphics commands.
We had to call strlen on the preprocessed source, which seemed a bit
pointless; also, we updated shader->SourceLen but not shader->Source,
which was even more confusing. Just leave both untouched.
It's now done in the grammar, and as a result, can easily handle
parenthesis. defined ( identifier ) is now supported.
Fixes glcpp/tests/065-if-defined-parens.c.
Calling exit() would be really bad once integrated into mesa. Even in
the standalone binary, we want to print the error log first.
Since each case already flags an error, compilation will still fail,
but it may go on (with something fudged) and generate more errors.
By using a single function, the main compiler doesn't need to include
glcpp.h, which currently has a lot of details about the preprocessor
internals. In particular, this prevents the two yacc grammars from
seeing each other, which would be rather messy to sort out.
This reverts commit a9ee956511.
It was based on a failure to understand how ther driver allocates
memory, and causes a regression with Celestia.
Set MaxLevel to dstLevel before allocating new mipmap level.
The radeon driver will fail to allocate space for a new level that
is outside of BaseLevel..MaxLevel. Set MaxLevel before allocating.
Signed-off-by: Maciej Cencora <m.cencora@gmail.com>
This should fix the FDO bug #28612.
Also, these piglit tests have been fixed:
- fbo-copypix
- scissor-copypixels
- copytexsubimage
- texredefine
Finally, 2 flushes in the transfer path are no longer needed.
like normal temporaries, but allows to define a number of distinct
arrays, all of which make it explicit that they contain /indexable/
registers.
as a side-effect we're adding support for multi-dimensional destination
registers.
The whole thing looks like this:
DCL TEMPX[0][0..128] # 0 array with 128 registers
ADD TEMPX[0][0], IN[0], IMM[0]
ADD TEMPX[0][1], IN[0], IMM[0]
ABS OUT[0], TEMPX[0][TEMP[0]]
allows one to specify a safe (bound checked) array
filled with immediates. it works just like a const
array and declares much like our current immediates.
llvmpipe can create a large number of shader variants for a single shader
(which are quite big), and they were only ever deleted if the shader itself
was deleted. This is especially apparent in things like glean
blendFunc where a new variant is created for every different subtest, chewing
up all memory.
This change limits the numbers of fragment shader variants (for all shaders)
which are kept around to a fixed number. If that would be exceeded a fixed
portion of the cached variants is deleted (since without tracking the used
variants this involves flushing we don't want to delete only one).
Always the least recently used variants (from all shaders together) are
deleted.
For now this is all per-context.
Both the number of how many variants are cached (1024) as well as how many
will be deleted at once (1/4 of the cache size) are just rough guesses and
subject to further optimization.
Early Z test (=ZTOP) must be disabled before a query is started,
otherwise the GPU is dead. The order of emitted registers matters more
than you might think.
This fixes hardlocks in sauerbraten.
When building OSMesa and xlib GL, the resulting OSMesa would be linked
against libGL instead of the internal mesa libraries. However, when
building with -fvisibility=hidden, some of the internal functions used
in OSMesa could not be resolved through libGL.
Instead, always build OSMesa standalone without linking against libGL.
This has the advantage that OSMesa is always built the same way, but it
means that disk space is wasted when libGL is installed since both
libraries will contain the internal objects.
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Tested-by: Tom Fogal <tfogal@alumni.unh.edu>
If the default framebuffer is bound to <target>, then
<attachment> must be one of FRONT_LEFT, FRONT_RIGHT, BACK_LEFT,
BACK_RIGHT, AUXi, DEPTH_BUFFER, or STENCIL_BUFFER, identifying a
color buffer, the depth buffer, or the stencil buffer, and
<pname> may be FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE or
FRAMEBUFFER_ATTACHMENT_OBJECT_NAME.
as well as these <pname> values
FRAMEBUFFER_ATTACHMENT_RED_SIZE,
FRAMEBUFFER_ATTACHMENT_GREEN_SIZE,
FRAMEBUFFER_ATTACHMENT_BLUE_SIZE,
FRAMEBUFFER_ATTACHMENT_ALPHA_SIZE,
FRAMEBUFFER_ATTACHMENT_DEPTH_SIZE,
FRAMEBUFFER_ATTACHMENT_STENCIL_SIZE,
FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE, or
FRAMEBUFFER_ATTACHMENT_COLOR_ENCODING.
https://bugs.freedesktop.org/show_bug.cgi?id=28551
Keith came up with a new way of running the pipeline which involves passing
a few info structs around (for fetch, vertices and prims) and allows us
to correctly handle cases where we endup with multiple primitives generated
by the pipeline itself.
I have had a look at the libdrm sources and they just contain more or less
the same checking we do in macros, and begin_cs may realloc the CS buffer
if we overflow it, which never happens with r300g. So these are pretty
much useless.
There is a small but measurable performance increase by dropping the two
functions.
The previous implementation had issues with queries spanning over several
command streams as well as using a very large number of queries.
This fixes flickering in Enemy Territory: Quake Wars. The driver now renders
everything correctly in this game and the graphics is awesome.
The idea is to build a hardware command buffer for every CSO and memcpy
the buffer to a command stream at bind time (or dirty-state-emission time,
to be precise).
All of the places that had been using the (glsl_type *, void *)
constructor were actually passing an ir_constant_data for the
'void *'. The code can be greatly simplified by replacing this
constructor with a (glsl_type *, ir_constant_data *) constructor.
This should also help prevent one class of invalid uses of the old
constructor.
The type of the resulting constant must be the same as the type of the
original expression. The changes to the code require that the case
where an unhandled expression is received, and there really shouldn't
be any of these, must be an exit point.
All of the cases (e.g., arrays and structures) that were being
filtered by these tests were already filtered by the earlier
is_numeric and is_boolean tests.
This causes the following tests to pass:
glslparsertest/shaders/CorrectMatComma2.frag
One of the incorrect errors in glslparsertest/shaders/CorrectComma.frag
is also eliminated.
The loop emulation unrolls loops as may times as possbile while still
keeping the shader program below the maximum instruction limit. At this
point, there are no checks for constant conditionals. This is only enabled
for fragment shaders.
This makes the binding table code simpler, and is required for gen6,
which requires binding table addresses to be under 64k offset from the
surface state base addr.
No significant change in performance on firefox-talos-gfx.
It turns out that computing a 56 byte key to look up a 20-byte object
out of a hash table was some sort of a bad idea. Whoops.
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
after:
[ 0] gl firefox-talos-gfx 34.761 34.784 0.17% 5/6
This slightly reduces reduces cairo-gl firefox-talos-gfx runtime on my
Ironlake:
before:
[ # ] backend test min(s) median(s) stddev. count
[ 0] gl firefox-talos-gfx 38.236 38.383 0.43% 5/6
after:
[ 0] gl firefox-talos-gfx 37.799 38.203 0.39% 6/6
It turns out the cost of caching these objects and looking them up in
the cache again is greater than the cost of just computing the object
again, particularly when the overhead of having a separate BO to pin
is removed.
(Those that are paying close attention will note that this is a
reversal of the path I was moving the driver in a couple of years ago.
The major thing that has changed is that back then all state was
recomputed when we wrapped the streaming state buffer, including
recompiling our precious programs. Now, we're uncaching just the
objects that are cheap to compute, and retaining caching of expensive
objects)
The cache lookup of these two little floats was .12% of total CPU time
on firefox-talos-gfx because we did it any time commonly-changed state
changed. On the other hand, updating the CC VP bo immediately whenver
CC VP state changes is a .07% overhead due to putting a driver hoook
in glEnable().
The KMS backend requires a hardware pipe driver. Do not build
egl_kms_swrast. Also, only build egl_fbdev_swrast for fbdev backend.
It is a pure software backend.
It looks like we were reading a fractional value, multiplying by an
enormous negative value, then stuffing that value into a bitfield
assuming it was already clamped. This becomes relevant for GL_ALPHA
or R/RG FBOs.
This avoids many pipeline stalls in cairo-gl.
[ # ] backend test min(s) median(s) stddev. count
Before:
[ 0] gl firefox-talos-gfx 36.799 36.851 2.34% 3/3
[ 0] gl firefox-talos-svg 33.429 35.360 3.46% 3/3
After:
[ 0] gl firefox-talos-gfx 35.895 36.250 0.48% 3/3
[ 0] gl firefox-talos-svg 26.669 29.888 5.34% 3/3
This doesn't avoid all the pipeline stalls because the kernel reports
!busy for buffers on the flushing list. That should be fixed in .36.
In exchange we end up with an extra memcpy, but that seems better than
calloc/free. Each buffer is 4k maximum, and on the i965-streaming
branch this allocation was showing up as the top entry in
brw_validate_state profiling for cairo-gl.
I was told this wouldn't help to fix the FDO bug #28443, but still,
it's a harmless last resort.
Also, linear textures safely fallback to an unpipelined transfer here.
Since _mesa_glsl_initialize_types add types for various extensions, we
can't call it until after processing "#extension foo : disable" lines.
Fixes tex_rect_02.frag.
This was detected by valgrind. I think GCC still does the right
thing, but the C++ spec allows the compiler to do something
stupid... like crash or only delete the first entry in the array.
lots and lots of fixes for geometry shaders. in particular now we work when the gs
emits a different primitive than the one the pipeline was started with and also
we work when gs emits more vertices than would fit in the original buffer.
If no customizer is present, 3D will be enabled by default.
Otherwise the option will default to the customizer value.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
There were entries to this function (most imporantly, prepare_render
-> update_renderbuffers) that wouldn't have had NEW_BUFFERS set, but
brw_wm_surface_state (the i965 state tracking the drawing regions)
expected this to change.
interface wise we have everything needed by d3d10 and gl transform feedback.
the draw module misses implementation of some corner cases (e.g. when stream
output wants different number of components per output than normal rendering
paths)
Okay I think this is good enough for now, I can't see any other reason
for mesa to want to use a sampler view so lets just leave it at all the A->X conversions for now.
I've been running gnome-shell under r300g with this for day or so and it seems fine.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Every platform that supports GLSL sets GL_MAX_TEXTURE_COORDS to at
least 4, so hard-code 4 for now.
This causes the following tests to pass:
glslparsertest/glsl2/norsetto-bumptbn_sh_fp.vert
glslparsertest/glsl2/xreal-lighting-d-omni.vert
glslparsertest/glsl2/xreal-lighting-db-omni.vert
glslparsertest/glsl2/xreal-lighting-dbs-omni.vert
This causes the following tests to pass:
glslparsertest/glsl2/precision-02.vert
glslparsertest/glsl2/precision-04.vert
glslparsertest/glsl2/precision-06.vert
This causes the following test to fail. This shader was previously
failing to compile, but it was failing for the wrong reasons.
glslparsertest/glsl2/precision-03.vert
Some valid shaders, such as 'precision highp float;', evaluate to
empty sets of instructions. This causes some of the optimization
stages to enter infinite loops. Instead, don't bother processing the
empty ones.
this doesn't really look terribly useful for drivers to use, but until
drivers use their own implementation provide this since some state trackers
really want to use these functions.
The EGL state tracker is really weird in how it does software,
in the past we would just not return a drm_api struct but now,
there is no callback to get a function so we just set the
create_screen hock to NULL to make it switch to software.
This interfacre replaces the drm_api api it works very much the same
way as drm_api but with the exception that its meant for the target
to implement it. And it does not export a get function and neither a
destroy function.
The tmp_row storage allocation took into account the format's y block
size by allocating y_step rows of data. However, the x block size was
not being taken into account when deciding how wide those rows need to
be.
Now make sure that tmp_row is at least x_step by y_step in size.
This extension is implemented in the texenv program.
Gallium drivers pass patched glean/texCombine.
(I am going to send the patch soon)
Catalyst9.3 advertises this extension too so I don't see a reason we shouldn't.
Split instruction into scalar in core compiler this simplify
the way we translate the instruction in latter stage.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
The drawing rectangle is given in *inclusive* pixel values, so the range
is only [0,2047]. Hence when rendering to a 2048 wide target, such as an
extended desktop, we would issue an illegal instruction zeroing the draw
area.
Fixes:
Bug 27408: Primary and Secondary display blanks in extended
desktop mode with Compiz enabled
https://bugs.freedesktop.org/show_bug.cgi?id=27408
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This passes on r300g, the only bit I'm not really sure about is the handling
of the sampler_view in st_atom_texture.c, I unreference it there if the swizzle
value changes and I also have to create a new set of functions to create a new
one since the u_sampler.c ones don't handle swizzle so much.
adds r300g + softpipe enables, I think other drivers could pass easily enough.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The support for XRGB8888 appeared in the 855 and 865, and this format
is reserved on 830/845. This should fix a regression from
b4a6169412 that caused hangs in etracer
on 845s.
Bug #26557.
Rather than using the (munged) output of "gcc -E" we now capture
precisely the output we expect from every test case. This allows us to
stay immune from strange output from gcc (unpredictable whitespace
output---aprticularly with different gcc versions).
This will also allow us to write tests that capture expected error
messages from the preprocessor as well.
We had to remove this earlier because our recursive function calls
caused the same nodes to be examined for expansion more than once.
And in the test suite, one node would be examined before it had
its closing parenthesis and then again later after the parenthesis
was added.
So we removed this error message to allow the test case to pass.
Now that we've removed the unnecessary recursive function call
we can catch this error case and report it as desired.
Previously, both _expand_node and _expand_function would always make
mutually recursive calls into _expand_token_list. This was unnecessary
since these functions can simply return unexpanded results, after which
the outer iteration will next attempt expansion of the results.
The only trick in doing this is to arrange so that the active list is
popped at the appropriate time. To do this, we add a new token_node_t
marker to the active stack. When pushing onto the active list, we set
marker to last->next, and when the marker is seen by the token list
iteration, we pop from the active stack.
ir_dereference_array::array is always an r-value. If the dereference
is of a varaible, that r-value will be an ir_dereference_variable.
This simplifies the code a bit.
In two places we look for an (optional) sequence of characters other
than "*" followed by a sequence of on or more "*". Using a name for
this (NON_STARS_THEN_STARS) seems to make it a bit easier to
understand.
Ken reminded me of a couple cases that I should be testing. These are
the non-nestedness of things that look like nested comments as well as
potentially tricky things like "/*/" and "/*/*/".
The (non) nested comment case was not working in the case of the
comment terminator with multiple '*' characters. We fix this by not
considering a '*' as the "non-slash" to terminate a sequence of '*'
characters within the comment. We also fix the final match of the
terminator to use '+' rather than '*' to require the presence of a
final '*' character in the comment terminator.
We were nicely constructing a new expression for the implicit type
conversion, but then checking that the previous types matched instead
of the new expression's type. Fixes errors in Regnum Online shaders.
This removes a bunch of gratuitous moving around of constant values
from constructors. Makes a shader ir I was looking at for structure
handling almost readable.
Similar to other situations where the visitor pattern doesn't fit, in
this case we need the pointer to the base instruction in the
instruction stream for where to insert any new instructions we
generate (not the instruction in the tree we're looking at). By
removing the code for setting the base_ir, flattened expressions would
end up, for example, before the function definition where they had appeared.
We support both single-line (//) and multi-line (/* ... */) comments
and add a test for this, (trying to stress the rules just a bit by
embedding one comment delimiter into a comment delimited with the
other style, etc.).
To keep the test suite passing we do now discard any output lines from
glcpp that consist only of spacing, (in addition to blank lines as
previously). We also discard any initial whitespace from gcc output.
In neither case should the absence or presence of this whitespace
affect correctness.
Previously we were avoiding printing within a skipped group, but we
were still evluating directives such as #define and #undef and still
emitting diagnostics for things such as macro calls with the wrong
number of arguments.
Add a test for this and fix it with a high-priority rule in the lexer
that consumes the skipped content.
Extend the check for indirect addressing of temp regs to include
input/output regs.
Fixes failure with piglit glsl-texcoord-array.shader_test test when using
SSE codegen.
For example, if the fragment shader reads gl_TexCoord[i] with a
dynamic index we need to set all the InputsRead bits for all
texcoords. We were already doing this for shader outputs.
Refactored the later code so inputs and outputs are handled with
similar code.
Fixes a swrast failure with piglit's glsl-texcoord-array.shader_test
Multiple item params are OK because we don't allow swizzles for them
(in case you do array access to hit their elements, for example). For
singles, though, using the swizzle can cut down on storage, we do want
to allow a swizzled use of another param.
Fixes OGLC texRect.c.
First interpolate the 4 quads upper left corners, then sub-interpolate
each quad pixel. Do the perspective divide once per quad.
Saves some muls and reciprocates. But doesn't seem to make a
noticeable improvement.
It make the code simpler and more compact, so commiting anyway.
Eliminates all this identical yet slightly different code to decide how
shader inputs should be interpolated.
As bonus, don't interpolate the position twice when it is listed in the
TGSI shader inputs.
The pixel transfer rules state that we must set alpha to 1.0 in this case
which we can't easily do with the blitter. We can do to passes: one that
sets the alpha to 0xff and one that copies the RGB bits or we can just
use the 3D engine. Neither approach seems worth it for this case.
The xorg state tracker gets two new options to let the user choose
whether to enable / disable dirty throttling and swapbuffer throttling.
The default value of these options are enabled, unless the winsys
supplies a customizer with other values. The customizer record has been
extended to allow this, and also to set winsys-based throttling on a per-
context basis.
The vmware part of this patch disables the dirty throttling if the kernel
supports command submission throttling, and also in that case sets kernel
based throttling for everything but swapbuffers. The vmware winsys does not
set throttling per context, even if it theoretically could, but instead
sets throttling per screen. This should perhaps be changed, should the
xorg state tracker start to use multiple rendering contexts. Kernel throttling
is off by default for all new screens/contexts, so the dri state tracker
is not affected.
This significantly improves interactivity of the vmware xorg driver.
Cherry-picked from commit a8f3b3f88a
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
The winsys may need to extract the svga_winsys_context from a
pipe_context. Add a function to enable that functionality.
Cherry-picked from commit e8a8c5e339
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
It was not used anywhere; the code was buggy (it didn't take care of
indirect registers and could potential cause buffer underflows) and the
same effect can now be easily achieved by just by looking at
input_semantic_name[] and input_usage_mask[].
TGSI's UsageMask flag is never set. We can move this logic into
tgsi_ureg, but there there are still cases where's not used, so this
seems a better place for now.
Commit 5d0e136eff exposed a long-standing
bug in the glGetUniform*() code paths. We weren't properly decoding
the location parameter.
Fixes fd.o bug/regression 28344
Note: this patch should go into the 7.8 branch after the above-mentioned
commit.
Due to a quantization error, different cliprects of scaled video windows may
not have identical x / y scale.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This adds TFP support to the swrast driver, with this I can run gnome-shell inside Xephyr slowly. I've no idea why I did it, and g-s has other rendering issues under swrast, but it might be useful to hook up llvmpipe later. I've no idea if I even want to commit it at this point.
An enhanced version might just pass the pointer in the indirect rendering case
and avoid the memcpy.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes an uninitialised value use in the dri2 st when doing TFP.
It uses the driContextPriv which isn't initialised at alloc time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The take-2 branch started over with a new grammar based directly on
the grammar from the C99 specification. It doesn't try to capture
things like balanced sets of parentheses for macro arguments in the
grammar. Instead, it merely captures things as token lists and then
performs operations like parsing arguments and expanding macros on
those lists.
We merge it here since it's currently behaving better, (passing the
entire test suite). But the code base has proven quite fragile
really. Several of the recently added test cases required additional
special cases in the take-2 branch while working trivially on master.
So this merge point may be useful in the future, since we might have a
cleaner code base by coming back to the state before this merge and
fixing it, rather than accepting all the fragile
imperative/list-munging code from the take-2 branch.
The 071-punctuator test is failing only trivially (whitespace change only).
And the 072-token-pasting-same-line.c test passes just fine here, (more
evidence perhaps that the approach in take-2 is more trouble than it's
worth?).
The 099-c99-example test case is the inspiration for much of the rest
of the test suite. It amazingly passes on the take-2 branch, but
doesn't pass here yet.
The list replacement when token pasting was broken, (failing to
properly update the list's tail pointer). Also, memory management when
pasting was broken, (modifying the original token's string which would
cause problems with multiple calls to a macro which pasted a literal
string). We didn't catch this with previous tests because they only
pasted argument values.
Previously '=' was not included in our PUNCTUATION regeular expression,
but it *was* excldued from our OTHER regular expression, so we were
getting the default (and hamful) lex action of just printing it.
The test we add here is named "punctuator" with the idea that we can
extend it as needed for other punctuator testing.
This isn't a problem here, but on the take-2 branch, it was trickier
at one point to make a non-macro work when the last token of the file.
So we use the simpler test case here and defer the other case until
later.
We take the results of macro expansion and splice them into the
original token list over which we are iterating. This makes it easy
for function-like macro invocations to find their arguments since they
are simply subsequent tokens on the list.
This fixes the recently-introduced regressions (tests 55 and 56) and
also passes new tests 60 and 61 introduced to strees this feature,
(with macro-argument parentheses split between a macro value and the
textual input).
clears were a bit limited in gallium:
- no scissoring (OGL only) nor explicit rectangle list (d3d9)
- no color/stencil masks (OGL only)
- no separate depth/stencil clears (d3d9/d3d10/OGL)
- cannot really clear single color buffer (only with resource_fill_region)
Additionally, d3d can clear surfaces not currently bound to the framebuffer.
It is, however, not easy to find some common ground what a clear should be able
to do, due to both API requirements and also hw differences (a case which might
be able to use a special clear path on one hw might need a "normal" quad render
on another).
Hence several clear methods are provided, and a driver should implement all of
them.
- clear: slightly modified to also be able to clear only depth or stencil in a
combined depth/stencil surface. This is however optional based on driver
capability though ideally it wouldn't be optional. AFAIK this is in fact
something used by applications quite a bit.
Otherwise, for now still doesn't allow clearing with scissors/mask (or single
color buffers)
- clearRT: clears a single (potentially unbound) color surface. This was formerly
roughly known as resource_fill_region. mesa st will not currently use this,
though potentially would be useful for GL ClearBuffer.
- clearDS: similar to above except for depth stencil surfaces.
Note that clearDS/clearRT currently handle can handle partial clear. This might
change however.
This lets Mesa work like other OpenGL implementations with regard
to indexing uniform arrays. See comments for details.
Note: this is a candidate for the 7.8 branch.
We previously had a confusing thing where _expand_token_onto would
return a non-zero value to indicate that the caller should then call
_expand_function_onto. It's much cleaner for _expand_token_onto to
just do what's needed and call the necessary function.
This behavior was useful when starting the implementation over
("take-2") where the whole test suite was failing. This made it easy
to focus on one test at a time and get each working.
More recently, we got the whole suite working, so we don't need this
feature anymore. And in the previous commit, we regressed a couple of
tests, so it's nice to be able to see all the failures with a single
run of the suite.
Recently I'm seeing cases where "gcc -E" mysteriously omits blank
lines, (even though it prints the blank lines in other very similar
cases). Rather than trying to decipher and imitate this, just get rid
of the blank lines.
This approach with sed to kill the lines before the diff is better
than "diff -B" since when there is an actual difference, the presence
of blank lines won't make the diff harder to read.
This test was tricky to make pass in the take-2 branch. It ends up
passing already here with no additional effort, (since we are lexing
integers as string-valued token except when in the ST_IF state in the
lexer anyway).
To do this correctly, we change the lexer to lex integers as string values,
(new token type of INTEGER_STRING), and only convert to integer values when
evaluating an expression value.
Add a new test case for this, (which does pass now).
Disable rendering to avoid GPU lockup.
Use radeondb to debug shader compiler :
radeondb -c gallium.bof
radeondb -s gallium.json
Will print shader generated, best is to use fp demos to test
the compiler.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
- enabled flushing a buffer more than once
- enabled the blitter for r600_clear
- added some more colors to r600_is_format_supported (copied from r600_conv_pipe_format)
- r600_set_framebuffer_state now sets rctx->fb_state
- more states are saved before a blit (had to add some accounting for the viewport and the vertex elements state)
- fixed a few errors with reference counting
Change the way we translate from c_compiler to the
asic specific representation. Should make things
simpler.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
- Wrapped the buffer and texture create/destroy/transfer/... functions
using u_resource, which is then used to implement the resource functions.
- Implemented texture transfers.
I left the buffer and texture transfers separate because one day we'll
need a special codepath for textures.
- Added index_bias to the draw_*elements functions.
- Removed nonexistent *REP and *FOR instructions.
- Some pipe formats have changed channel ordering, so I've removed/fixed
nonexistent ones.
- Added stubs for create/set/destroy sampler views.
- Added a naive implementation of vertex elements state (new CSO).
- Reworked {texture,buffer}_{from,to}_handle.
- Reorganized winsys files, removed dri,egl,python directories.
- Added a new build target dri-r600.
For this we always add a new argument to the argument list as soon as
possible, without waiting until we see some argument token. This does
mean we need to take some extra care when comparing the number of
arguments with the number of expected arguments. In addition to
matching numbers, we also support one (empty) argument when zero
arguments are expected.
Add a test case here for this, which does pass.
We want to check the incoming renderbuffer format, not the (potentially
non-existant) current attachment.
Fixes segfault w/ fbotexture -ds2.
NOTE: this will be applied to the 7.8 branch too.
These piglit tests have been fixed:
- bgra-sec-color-pointer
- glsl-routing
See comments at the beginning of r300_vs_draw.c
WPOS is implemented too but it doesn't work yet. I'm still working on it.
This just makes these functions easier to understand all around. In
the case of _token_list_append_list this is an actual bug fix, (where
append an empty list onto a non-empty list would previously scramble
the tail pointer of the original list).
That is, a function-like invocation foo(x) is valid as a
single-argument invocation even if 'x' is a macro that expands into a
value with a comma. Add a new COMMA_FINAL token type to handle this,
and add a test for this case, (which passes).
That is, the following case:
#define foo(x) (x)
#define bar
bar(baz)
which now works with this (ugly) commit.
I definitely want to come up with something cleaner than this.
This adds three new pieces of state to the parser, (is_control_line,
newline_as_space, and paren_count), and a large amount of messy
code. I'd definitely like to see a cleaner solution for this.
With this fix, the "define-func-extra-newlines" now passes so we put
it back to test #26 where it was originally (lately it has been known
as test #55).
Also, we tweak test 25 slightly. Previously this test was ending a
file function-like macro name that was not actually a macro (not
followed by a left parenthesis). As is, this fix was making that test
fail because the text_line production expects to see a terminating
NEWLINE, but that NEWLINE is now getting turned into a SPACE here.
This seems unlikely to be a problem in the wild, (function macros
being used in a non-macro sense seems rare enough---but more than
likely they won't happen at the end of a file). Still, we document
this shortcoming in the README.
Previously, the syntax was (array_ref <variable name> <index>), but the
subject is now a general rvalue (not a name). In particular, it might
be a (var_ref ...).
Also, remove "expected ... or (swiz)" from error messages; swiz is not
allowed inside a var_ref.
Array dereferences now point to variable dereferences instead of
pointing directly to variables. This necessitated some changes to the
way the variable is accessed when setting the maximum index array element.
Move the accept method for hierarchical visitors from ir_dereference
to the derived classes. This was mostly straight-forward, but I
suspect that ir_dead_code_local may be broken now.
Create separate subclasses of ir_dereference for variable, array, and
record dereferences. As a side effect, array and record dereferences
no longer point to ir_variable objects directly. Instead they each
point to an ir_dereference_variable object.
This is the first of several steps in the refactoring process. The
intention is that ir_dereference will eventually become an abstract
base class.
When generating or uploading a new (higher) mipmap level for an image,
we can need to allocate a miptree for a level greater than
texObj->MaxLevel.
Signed-off-by: Maciej Cencora <m.cencora@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This can happen when checking if a software fallback for a higher level
operation (such as GenerateMipmap) is needed.
Signed-off-by: Maciej Cencora <m.cencora@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
It's now more separate from the rest of the driver and it can be disabled
by commenting out just 1 line. Well, I couldn't make the previous version
work with SW TCL reliably, that's the reason of this little rework.
We could potentially do this on G45 as well, though the units are
different. On 965, the timestamp is tied to hclk, which would make
supporting it harder.
This is a workaround for Ironlake errata. The emit_mi_flush is used
for a few purposes:
1) Flushing write caches for RTT (including blit to texture)
2) Pipe fencing for sync objects
3) Spamming cache flushes to track down cache flush bugs
Spamming cache flushes seems less important than following the docs,
and we should probably do that with a different mechanism than the one
for render cache flushes.
To do this we have split the existing "HASH_IF expression" into two
productions:
First is HASH_IF pp_tokens which simply constructs a list of tokens.
Then, with that resulting token list, we first evaluate all DEFINED
operator tokens, then expand all macros, and finally start lexing from
the resulting token list. This brings us to the second production,
IF_EXPANDED expression
This final production works just like our previous "HASH_IF
expression", evaluating a constant integer expression.
The new test (54) added for this case now passes.
The es1 and es2 dispath table initialization code is generated from the
API XML files and we can't easily share the dispatch table code setup.
Keep the _mesa_init_shader_dispatch() part of the patch, but roll back
the static-ization of shader entrypoints so es1 and es2 dispatch
initialization still works.
This doesn't change any functionality here, but will allow us to make
future changes that were not possible with direct printing.
Specifically, we need to expand macros within macro arguments before
performing argument substitution. And *that* expansion cannot result
in immediate printing.
Supporting embedded newlines in a macro invocation is going to be
tricky with our current approach to lexing and parsing. Since this
isn't really an important feature for us, we can defer this until more
important things are resolved.
With this test out of the way, tests 27 through 31 are passing.
This trailing whitespace was coming from macro definitions and from
macro arguments. We fix this with a little extra state in the
token_list. It now remembers the last non-space token added, so that
these can be trimmed off just before printing the list.
With this fix test 23 now passes. Tests 24 and 25 are also passing,
but they probbably would ahve before this fix---just that they weren't
being run earlier.
We weren't including this left parenthesis in the argument's token
list so the nested function invocation wasn not being recognized.
With this fix, tests 21 and 22 now pass.
This causes test 16 to pass. Tests 17-20 are also passing now, (though
they would probably have passed before this change and simply weren't
being run yet).
This is what gcc does, and it's actually less work to do
this. Previously we were having to save the contents of space tokens
as a string, but we don't need to do that now.
We extend test #0 to exercise this feature here.
This makes test 15 pass and also dramatically simplifies the lexer.
We were previously using a CONTROL state in the lexer to only emit
SPACE tokens when on text lines. But that's not actually what we
want. We need SPACE tokens in the replacement lists as well. Instead
of a lexer state for this, we now simply set a "space_tokens" flag
whenever we start constructing a pp_tokens list and clear the flag
whenever we see a '#' introducing a directive.
Much cleaner this way.
For this we add an "active" string_list_t to the parser. This makes
the current expansion_list_t in the parser obsolete, but we don't
remove that yet.
With this change we can now start passing some actual tests, so we
turn on real testing in the test suite again. I expect to implement
things more or less in the same order as before, so the test suite now
halts on first error.
With this change the first 8 tests in the suite pass, (object-like
macros with chaining and recursion).
With this change, we can recreate the original text-line input
exactly. Previously we were inserting a space between every pair of
tokens so our output had a lot more whitespace than our input.
With this change, we can drop the "-b" option to diff and match the
input exactly.
This is a fresh start with a much simpler approach for the flex/bison
portions of the preprocessor. This isn't functional yet, (produces no
output), but can at least read all of our test cases without any parse
errors.
The grammar here is based on the grammar provided for the preprocessor
in the C99 specification.
Flush means flush, i.e., all previous operations should be visible from
other contexts.
This does not imply unswizzling tiles, since unswizzling should be done on
a needed basis for any context.
This fixes a double-free() error when not using a shared memory XImage.
The XDestroyImage() function frees the ximage->data buffer if non-NULL.
If we free it ourselves, we also need to NULL-out the pointer.
Convert Z from a normalized value in the range [0, 1] to an
object-space Z coordinate in [-1, +1] so that drawing at the new Z
position with the default/identity ortho projection results in the
original Z value. Used by the meta-Clear, Draw/CopyPixels and Bitmap
functions where the Z value comes from the clear value or raster
position.
Fixes piglit tests fdo23670-depth_test, quad-invariance and
glsl-orangebook-ch06-bump as well as oglc zbfunc.c.
https://bugs.freedesktop.org/show_bug.cgi?id=23670
The check was disabled when FEATURE_OES_framebuffer_object was enabled,
since that used to mean we weren't implementing regular OpenGL semantics.
Now that we can compile in support for multiple APIs, change the #ifdef to
compile the check in when FEATURE_GL is enabled and enable the check for
contexts that implement OpenGL at runtime.
In addition to the decimal literals which we already support. Note
that we use strtoll here to get the large-width integers demanded by
the specification.
This is what the C99 specification demands. And the GLSL specification
says that we should follow the "standard C++" rules for #if condition
expressions rather than the GLSL rules, (which only support a 32-bit
integer).
The operator coverage here is quite complete. The one big thing
missing is that we are not yet doing macro expansion in #if
lines. This makes the whole support fairly useless, so we plan to fix
that shortcoming right away.
So far the only expression implemented is a single integer literal,
but obviously that's easy to extend. Various things including nesting
are tested here.
Previously, we were using the same lexing stack as we use for macro
expansion to also expand macro arguments. Instead, we now do this
earlier by simply recursing over the macro-invocations replacement
list and constructing a new expanded list, (and pushing only *that*
onto the stack).
This is simpler, and also allows us to more easily implement token
pasting in the future.
The last remaining thing here was that when a line ended with a macro,
and the parser looked ahead to the newline token, the lexer was
printing that newline before the parser printed the expansion of the
macro.
The fix is simple, just make the lexer tell the parser that a newline
is needed, and the parser can wait until reducing a production to
print that newline.
With this, we now pass the entire test suite with simply "diff -u", so
we no longer have any diff options hiding whitespace bugs from
us. Hurrah!
This fixes more differences compared to "gcc -E" so removes several
cases of erroneously failing test cases. The implementation isn't very
elegant, but it is functional.
We fix this by moving printing up to the top-level "input" action and
tracking whether a space is needed between one token and the next.
This fixes all actual bugs in test-suite output, but does leave some
tests failing due to differences in the amount of whitespace produced,
(which aren't actual bugs per se).
This whitespace was not part of anything being tested, and it
introduces differences (that we don't actually care about) between the
output of "gcc -E" and glcpp.
Just eliminate this extra whitespace to reduce spurious test-case
failures.
Sometime back the output of glcpp started differing from the output of
"gcc -E" in the amount of whitespace in emitted. At the time, I
switched the test suite to use "diff -w" to ignore this. This was a
mistake since it ignores whitespace entirely. (I meant to use "diff
-b" which ignores only changes in the amount of whitespace.)
So bugs have since been introduced that the test suite doesn't
notice. For example, glcpp is producing "twotokens" where it should be
producing "two tokens".
Let's stop ignoring whitespace in the test suite, which currently
introduces lots of failures---some real and some spurious.
The fix here is quite simple (and actually only deletes code). When
expanding a macro, we don't return a ',' as a unique token type, but
simply let it fall through to the generic case.
The specification says that commas within a parenthesized group,
(that's not a function-like macro invocation), are passed through
literally and not considered argument separators in any outer macro
invocation.
Add support and a test for this case. This support makes a third
occurrence of the same "FUNC_MACRO (" shift/reduce conflict appear, so
expect that.
This change does introduce a fairly large copy/paste block in the
grammar which is unfortunate. Perhaps if I were more clever I'd find a
way to share the common pieces between argument and argument_or_comma.
The specification of the preprocessor in C99 says that when we see a
macro name that we are already expanding that we refuse to expand it
now, (which we've done for a while), but also that we refuse to ever
expand it later if seen in other contexts at which it would be
legitimate to expand.
We add a test case for that here, and fix it to work. The fix takes
advantage of a new token_t value for tokens and argument words along
with the recently added IDENTIFIER_FINALIZED token type which
instructs the parser to not even look for another expansion.
There's not yet any change in functionality here, (at least according
to the test suite). But we now have the option of specifying a type
for each string in the token list. This will allow us to finalize an
unexpanded macro name so that it won't be subjected to excess
expansion later.
Previously, we would pass original strings back to the original lexer
whenever we needed to re-lex something, (such as an expanded macro or
a macro argument). Now, we instead parse the macro or argument
originally to a string list, and then re-lex by simply returning each
string from this list in turn.
We do this in the recently added glcpp_parser_lex function that sits
on top of the lower-level glcpp_lex that only deals with text.
This doesn't change any behavior (at least according to the existing
test suite which all still passes) but it brings us much closer to
being able to "finalize" an unexpanded macro as required by the
specification.
We rename the generated lexer from yylex to glcpp_lex. Then we
implement our own yylex function in glcpp-parse.y that calls
glcpp_lex. This doesn't change the behavior at all yet, but gives us a
place where we can do implement alternate lexing in the future.
(We want this because instead of re-lexing from strings for macro
expansion, we want to lex from pre-parsed token lists. We need this so
that when we terminate recursion due to an already active macro
expansion, we can ensure that that symbol never gets expanded again
later.)
The support for an object-like amcro within a macro-invocation
argument was also implemented at one level too high in the
grammar. Fortunately, this is a very simple fix.
The previous fix added FUNC_MACRO to a production one higher in teh
grammar than it should have. So it prevented a FUNC_MACRO from
appearing as part of a mutli-token argument rather than just alone as
an argument. Fix this (and add a test).
This adds a second shift/reduce conflict to our grammar. It's basically the
same conflict we had previously, (deciding to shift a '(' after a FUNC_MACRO)
but this time in the "argument" context rather than the "content" context.
It would be nice to not have these, but I think they are unavoidable
(withotu a lot of pain at least) given the preprocessor specification.
This case worked previously, but broke in the recent rewrite of
function- like macro expansion. The recursion was still terminated
correctly, but any parenthesized expression after the macro name was
still being swallowed even though the identifier was not being
expanded as a macro.
The fix is to notice earlier that the identifier is an
already-expanding macro. We let the lexer know this through the
classify_token function so that an already-expanding macro is lexed as
an identifier, not a FUNC_MACRO.
The rewrite her discards the functions that did direct, recursive
expansion of macro values. Instead, the parser now pushes the macro
definition string over to a stack of buffers for the lexer. This way,
macro expansion gets access to all parsing machinery.
This isn't a small change, but the result is simpler than before (I
think). It passes the entire test suite, including the four tests
added with the previous commit that were failing before.
Many of these look quite similar to existing tests that are handled
correctly, yet none of these work. For example, in test 30 we have a
simple non-function macro "foo" that is defined as "bar(baz(success))"
and obviously non-function macro expansion has been working for a long
time. Similarly, if we had text of "bar(baz(success))" it would be
expanded correctly as well.
But when this otherwise functioning text appears as the body of a
macro, things don't work at all.
This is pointing out a fundamental problem with the current
approach. The current code does a recursive expansion of a macro
definition, but this doesn't involve the parsing machinery, so it
can't actually handle things like an arbitrary nesting of parentheses.
The fix will require the parser to stuff macro values back into the
lexer to get at all of the existing machinery when expanding macros.
The test has a newline before the left parenthesis, and newlines to
separate the parentheses from the argument.
The fix involves more state in the lexer to only return a NEWLINE
token when termniating a directive. This is very similar to our
previous fix with extra lexer state to only return the SPACE token
when it would be significant for the parser.
With this change, the exact number and positioning of newlines in the
output is now different compared to "gcc -E" so we add a -B option to
diff when testing to ignore that.
The most recent fix to the parser introduced a shift/reduce
conflict. We document this conflict here, and tell bison that it need
not report it (since I verified that it's being resolved in the
direction desired).
For the record, I did write additional lexer code to eliminate this
conflict, but it was quite fragile, (would not accept a newline
between a function-like macro name and the left parenthesis, for
example).
This has the added advantage that it will stop traversing the tree as
soon as the first call is found.
The output of all test cases was verified to be the same using diff.
That is, when a function-like macro appears in the content without
parentheses it should be accepted and passed on through, (previously
the parser was regarding this as a syntax error).
The test case here is simply "#define foo foo" and "#define bar foo"
and then attempting to expand "bar".
Previously, our termination condition for the recursion was overly
simple---just looking for the single identifier that began the
expansion. We now fix this to maintain a stack of identifiers and
terminate when any one of them occurs in the replacement list.
The first bug was not allowing whitespace between '#' and the
directive name.
The second bug was swallowing a terminating newline along with any
trailing whitespace on a line.
With these two fixes, and the previous commit to stop emitting SPACE
tokens, the recently added extra-whitespace test now passes.
This reverts the unconditional return of SPACE tokens from the lexer
from commit 48b94da099 .
That commit seemed useful because it kept the lexer simpler, but the
presence of SPACE tokens is causing lots of extra complication for the
parser itself, (redundant productions other than whitespace
differences, several productions buggy in the case of extra
whitespace, etc.)
Of course, we'd prefer to never have any whitespace token, but that's
not possible with the need to distinguish between "#define foo()" and
"#define foo ()". So we'll accept a little bit of pain in the lexer,
(enough state to support this special-case token), in exchange for
keeping most of the parser blissffully ignorant of whether tokens are
separated by whitespace or not.
This change does mean that our output now differs from that of "gcc -E",
but only in whitespace. So we test with "diff -w now to ignore those
differences.
We were correctly parsing this already, but simply not returning any
value (for no good reason). Fortunately the fix is quite simple.
This makes the test added in the previous commit now pass.
The macro invocation is defined to consume all text between a set of
matched parentheses. We previously tested for inner parentheses from a
nested function-like macro invocation. Here we test for inner
parentheses occuring on their own, (not part of another macro
invocation).
We provide for this by changing the value of the argument-list
production from a list of strings (string_list_t) to a new
data-structure that holds a list of lists of strings
(argument_list_t).
Then we print the final string list up at the top-level content
production along with all other printing.
Additionally, having macro-expansion productions that create values
will make it easier to solve problems like composed function-like
macro invocations in the future.
Previously, printing was occurring all over the place. Here we
document that it should all be happening at the top-level content
production, and we move the printing of directive newlines.
The printing of expanded macros is still happening in lower-level
productions, but we plan to fix that soon.
Instead of "parameter_list" and "replacement_list" just use
"parameters" and "replacements". This is consistent with the existing
"arguments" and keeps the line length down in the face of the
now-longer "string_list_t" rather than "list_t".
It seems strange to always be returning SPACE tokens, but since we
were already needing to return a SPACE token in some cases, this
actually simplifies our lexer.
This also allows us to fix two whitespace-handling differences
compared to "gcc -E" so that now the recent modification to the test
suite passes once again.
This shows two minor failures in our current parsing (resulting in
whitespace-only changes, oso not that significant):
1. We are inserting extra whitespace between tokens not originally
separated by whitespace in the replacement list of a macro
definition.
2. We are swallowing whitespace separating tokens in the general
content.
Previously our parser was incorrectly treating this case as a
function-like macro. We fix this by conditionally passing a SPACE
token from the lexer, (but only immediately after the identifier
immediately after #define).
Our current parser sees "#define foo (" as an identifier token
followed by a '(' token and parses this as a function-like macro.
That would be correct for "#define foo(" but the preprocessor
specification treats this whitespace as significant here so this test
currently fails.
Previously, an empty argument could be parsed as either an "argument_list"
directly or first as an "argument" and then an "argument_list".
We fix this by removing the possibility of an empty "argument_list"
directly.
We accept the structure of arguments in both macro definition and
macro invocation, but we don't yet expand those arguments. This is
just enough code to pass the recently-added tests, but does not yet
provide any sort of useful function-like macro.
These test only the most basic aspect of parsing of function-like
macros. Specifically, none of the definitions of these function like
macros use the arguments of the function.
No function-like macros are implemented yet, so all of these fail for
now.
This is just a minor style improvement for now. But the same
mechanism, (having the lexer peek into the table of defined macros),
will be essential when we add function-like macros in addition to the
current object-like macros.
Previously we had two copies of all top-level actions, (once in a list
context and once in a non-list context). Much simpler to instead have
a single list-context production with no action and then only have the
actions in their own non-list contexts.
We are able to remove all state by simply passing NEWLINE through
as a token unconditionally (as opposed to only passing newline when
on a driective line as we did previously).
This isn't ideal for two reasons:
1. There's a bunch of stateful redundancy in the lexer that should be
cleaned up.
2. The hash table does not provide a mechanism to delete an entry, so
we waste memory to add a new NULL entry in front of the existing
entry with the same key.
But this does at least work, (it passes the recently added undef test
case).
The lexer was previously using strdup (expecting the parser to free),
but is now more consistent, easier to use, and slightly more efficent
by using talloc along with the parser.
Also, we add xtalloc and xtalloc_strdup wrappers around talloc and
talloc_strdup to put all of the out-of-memory-checking code in one
place.
We now store a list of tokens in our hash-table rather than a single
string. This lets us replace each macro in the value as necessary.
This code adds a link dependency on talloc which does exactly what we
want in terms of memory management for a parser.
The 3 tests added in the previous commit now pass.
These 3 new tests are modeled after 3 existing tests but made slightly
more complex since now instead of definining a new macro to be an
existing macro, we define it to be replaced with two tokens, (one a
literal, and one an existing macro).
These tests all fail currently because the replacement lookup is
currently happening on the basis of the entire replacement string
rather than on a list of tokens.
One with the chained defines in the opposite order, and one with the
potential to trigger an infinite-loop bug through mutual
recursion. Each of these tests pass already.
The fix is as simple as adding a loop to continue to lookup values
in the hash table until one of the following termination conditions:
1. The token we look up has no definition
2. We get back the original symbol we started with
This second termination condition prevents infinite iteration.
Validate desired test cases by ensuring the output of glcpp matches
the output of the gcc preprocessor, (ignoring any lines of the gcc
output beginning with '#').
Only one test case so far with a trivial #define.
Most of the current problems were (mostly) harmless things like
missing declarations, but there was at least one real error, (reversed
argument order for yyerrror).
This allows the final program to be 100% "valgrind clean", (freeing
all memory that it allocates). This will make it much easier to ensure
that any allocation that parser actions perform are also cleaned up.
The statement making up a loop body, a then-statement, or an
else-statement are single nodes. If the statement is a block, the
single node will be an ast_compound_statement. There is no need to
loop at the top level when processing these statements.
Previously the list of function call parameters was stored as a
circular list in ast_expression::subexpressions[1]. They are now
stored as a regular list in ast_expression::expressions.
This cleans up a bunch of junk code in some of the GLSL parser tests,
and could potentially help real-world too (particularly after copy
propagation has happened).
Debugging this took forever as I only looked at constructors in ir.cpp
to find who wasn't setting up ->type. I dislike hiding code (as
opposed to prototypes and definitions) in C++ header files, but in
this case I have only myself to blame.
This caused a nasty bug where the function inliner would create new
variables for each of the formal parameters, but the body would still
reference the old copies.
This was highly visible since the dead code eliminator (rightly) removed
the new declarations, leading to printed IR that referenced non-existent
variable names.
If you used all 4 color targets we currently support, we would see 0
and end up just writing the first output. Give enough bits that we
can do the maximum of 16.
Fixes piglit fbo-drawbuffers-maxtargets.
The remaining programs are ones I've had difficulty finding a build
environment for to make the build system or are unit tests that should
probably live next to their code instead. Hopefully people can bring
over the build for remaining pieces they care about.
The idea would be that you could have multiple send messages going on
if nothing depended on the previous message's results and you used a
different send message. The problem is that the later send requires
the VUE handle returned by the first send's allocate anyway.
The test for env['platform'] caused an exception since 'env' is not
defined at that point. Instead, determine the target platform by
scanning sys.argv[].
Before we would throttle in the flush callback prior to round-tripping
to the server to do copyregion or swapbuffer. Now, instead just note
that we need to throttle and do it in intel_prepare_render(), which
will be called after receiving the response from the server but before
we start rendering the next frame. Even if the server also throttles
us in swapbuffer, this just makes the throttling a no-op when we hit
intel_prepare_render(). With that we can drop the
using_dri2_swapbuffers hack and just always throttle.
prevents segfault when state trackers try to set default mask.
Other option would be to make this required only for drivers
supporting multisampling, but this seems more clean.
Only dummy implementations (for normal drivers) provided (no driver
supports multisampling yet neither).
compile tested only.
Should probably change the python surface_copy/fill functions
also into resource_copy/fill_region functions and adapt the code
using them.
The util blit functions change their interface (apart from some rename) too
(in particular util_blit_pixels now also takes a pipe_resource as the src blit
argument instead of a surface, as it might just call resource_copy_region).
Maybe the blit util code might need a bit more cleanup, it still doesn't feel
very clean. In particular it seems that util_blit_pixels_tex should probably
disappear, and I think it would be great if the code called by drivers for
blitting (u_blitter.c, which isn't really touched by this change) could somehow
be merged with the u_blit code.
Previously, surface_copy was said to allow overlapping blits, and it was
"optional". However, some state trackers actually assumed it is always present,
and quite some code (like in u_blit) assumed overlapping isn't allowed.
Hence, resource_copy_region (and in the same spirit, resource_fill_region) is
now mandatory, but overlapping blits are no longer allowed. A driver can plug
in the cpu fallback util_resource_copy_region if it does not want to provide its
own implementation, though this is not optimal.
due to popular request, use nr_samples parameter in is_format_supported()
instead of new is_msaa_supported() query.
This makes it easily possible to query if a format with a given sample count
is also supported not only as render target, but for sampler views (note that
texture sampling from multisampled resources isn't supported yet).
It is not quite how dx10 format msaa queries work, but we might need to revisit
format queries completely in the future anyway.
Use front/back instead of cw/ccw throughout.
Also, use offset_point/line/fill instead of offset_cw/ccw.
Brings gallium representation of this state into line with its main
user, and also what turns out to be the most common hardware
representation.
This fixes a long-standing bias in the interface towards the
architecture of the software rasterizer.
This extension adds a new function which provides an alternative to
eglSwapBuffers. eglSwapBuffersRegionNOK accepts two new parameters in
addition to those in eglSwapBuffers. The new parameters consist of a
pointer to a list of 4-integer blocks defining rectangles (x, y,
width, height) and an integer specifying the number of rectangles in
the list.
When there is no user driver or any matching display drivers we fall
back to the default driver. This patch lets us have a list of default
drivers instead of just one. The drivers are loaded in turn and we
attempt to initialize the display. If it fails we unload the driver
and move on to the next one.
Compared to the display driver mechanism, this avoids loading a number
of drivers and then only using one. Also, we call Initialize to see
if the driver will work instead of relying on Probe. To know for sure
that a driver will work, Probe really have to do a full Initialize, so
we will just use Initialize directly.
This is similar to the GL_QUAD_STRIP -> TRIANGLE_STRIP optimization --
the GS usage to split the quads into tris is a huge bottleneck, so a
quick check improves glean blendFunc time massively (width * height of
the window of single-pixel GL_QUADS, many many times). This may also
end up helping with cairo performance, which sometimes ends up drawing
a single quad.
It produces exactly the same machine code, but it cuts 5% of the
number of instructions generated for a typical shader.
Also, preserve the scalar when length is 1.
Also added calls to the create function in target helpers and in
tr_drm.c the latter being a hack and should be replaced with the
wrap screen target helper. But at least this way we don't regress.
This removes references to symbols in draw module for OpenGL ES build.
As OpenGL ES does not support feedback/selection mode, draw module is
used in pathes that will never be reached. However, if the symbols are
referenced, it will bloat the final shared libraries unnecessarily.
This is serious when LLVM is enabled.
With the omit list gone, there are not too many differences in building
core mesa and ES overlay. Remove the mesa/es and build both of them in
src/mesa/Makefile.
This allows atifragshader.h to be used without knowing if
FEATURE_ATI_fragment_shader is enabled. As a result, atifragshader.c is
removed from the omit list in ES overlay.
Remove sources that are feature-aware from the omit list. x86 -O0 build
is ~12KiB smaller afther making those sources feature-aware.
Also, remove get.c from the omit list as get_es[12].c have been merged
to it.
Make st_cb_feedback.h FEATURE_feedback aware and st_cb_rastpos.h
FEATURE_rastpos aware. Move creation of selection/feedback draw context
to st_init_draw.
This allows transformfeedback.h and st_cb_xformfb.h to be included and
used without knowing if FEATURE_EXT_transform_feedback is enabled. Fix
build of ES overlay.
We were allocating too much memory for linear layouts. The block_size
factor is already included in the row_stride and should not be used in
the img_stride calculation. This is typically a 4x savings!
If debug build, keep a linked list of all allocated resources (textures).
The llvmipe_print_resources() function can be called from a debugger to
print a list of all resources, their sizes, total size, etc.
When we have DRI2 protocol at least 2.3, we get an event from the
server when the back buffers get invalidated. When that's the case
let the driver know that it can rely on invalidate instead of the
glViewport polling.
The presence of this extension indicates to the DRI driver that the
loader will call invalidate in the __DRI2_FLUSH extension, whenever
the needs to query for new buffers. This means that the DRI driver
can drop the polling in glViewport().
Now that transfers are context operations it is the driver's
responsibility to ensure that transfers happen in order with all other
context operations, so flushes and finishes inside Mesa should be no
longer necessary. The attached patch implements that.
This should proportionate significant improvements for hardware drivers
which are able to stream transfers in the command buffers.
You can use the softpipe/llvmpipe_flush_resource() as reference
implementation of the worst case scenario, where the driver is not able
to streamline transfers. But the expectation is that driver
implementators will want to avoid flushing as much as possible.
Without this patch, any old intel_flush() call will cause a round trip to
the server and do a copy from fake to real front. We only actually
guarantee that frontbuffer results show up when glFlush() ia called, so
move the flushing to intel_glFlush().
We also need to flush fake to front before getting new buffers, but
we just handle that manually.
When we call intel_prepare_render() from intelReadPixels(), we'll mark
the front buffer dirty. That's silly, since we're only reading from it
and marking it dirty will cause us to copy from fake front to front
eventually.
Just clear the dirty flag after doing the read.
Now that we have intel_prepare_render() in place, we can use it to mark
the front buffer dirty if we're rendering to the front buffer once we
get there.
We only need this when the server may have swapped the buffers or
when we receive an invalidate event from the server. The default
behaviour is still that the DRI driver will invalidate its own buffers
when glViewport is called.
https://bugs.freedesktop.org/show_bug.cgi?id=27277
Seeing very weird crashes during std::cout initialization.
The fault probably lies in the way I build LLVM on MSVC, but disable for
now to allow more time to investigate.
When internal_format and tex->format differ, st_finailize_texture will
surface_copy between surfaces with different formats. This commit works
around the issue by ignoring internal_format. A sane fix is needed
here.
In particular, don't use the clamped lod to compute level + 1, or
lod in [-1, 0] range will actually interpolate with level 1.
This makes Mipfilter DCT pass 100%.
Also start axing the code duplication for scalar case. The olution is to
treat the scalar case specially in a few innermost functions, and leave
outer functions untouched.
It's quite a pain to remember the details after a while, and it is quite
likely we'll want to use this again, either for different polynomial
orders or different functions, so commit it here.
The advantage of range[-0.5, 0.5] is that it doesn't require floor (for
which intrinsics are only available in SSE4.1).
But the EXP opcode pretty much forces us to use floor, and there is a
good floor approximation around truncation available anyway.
This fixes EXP failures in VShader DCT.
This reduces redundant code by moving:
- CS space reservation
- buffer validation
- dirty state emission
- index bias emission
- AOS emission
into r300_prepare_for_rendering.
The cont_mask must be restored and exec mask recomputed in order to decide
whether to repeat the loop or not.
Unlike the continue mask, the break_mask must be preserved across loop
iterations.
Fixes several VShader DCT cases, and no regressions with glean.
Mainly for the recent work on mapi and gles. The latter adds some
sources to src/mesa/main that are generated on the fly. This makes
python a requirement for building Mesa. An alternative is to package
those sources into the tarballs. That may be done in another commit.
The new target installs client API modules to EGL_DRIVER_INSTALL_DIR.
They are used by st/egl.
The client APIs are built from OpenGL and OpenVG state trackers. For
this to work, st/vega is modified to produce a static library,
libvega.a, instead. st/es is also not needed any more. It is removed
and --with-state-trackers=es is replaced by --enable-gles-overlay.
As st/egl now has its own client API modules, this solves the ABI issue
between st/egl and client APIs, as long as the client API modules are
distributed with st/egl. Plus, this allows st/egl to support OpenGL
with non-Gallium libGL.so.
Define <API>_LIB, <API>_LIB_NAME, <API>_LIB_GLOB, and some other
variables in the configs. Fix a typo in glesv1_cm.pc.in where an
inexistent variable is used.
And lookup the GLX screen for the context. Otherwise we'll end up
jumping through a NULL-pointer once we try to look up the visual
or config for the shared context.
https://bugs.freedesktop.org/show_bug.cgi?id=14245
Rename vgFooBar to vegaFooBar and use vgapi as the dispatcher. This
makes sure there is always a current context when the internal functions
are called. And eglGetProcAddress is finally supported.
Specifically, move all or most of
glapi/glapi.c to mapi/u_current.c,
glapi/glapi_execmem.c to mapi/u_execmem.c,
glapi/glthread.[ch] to mapi/u_thread.[ch]
and remove their dependencies on core Mesa headers.
Note that this also requires X86 for llvm, if llvmpipe/draw_llvm works
on PPC then the condition should be extended to include && x86.
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
If a shader was bound to the fragment shader TGSI executor and it was
then deleted and a new shader was allocated at the same address as the
old shader, the new fragment shader would not get properly bound to
the TGSI machine and we'd wind up using the old one.
This would not have been a problem if shaders were refcounted.
Now the TGSI machine is owned by the context rather than the quad
pipeline's shader stage so that the softpipe_delete_fs_state()
function can access it.
Fixes sporadic failures of the piglit fp-long-alu test (fd.o bug 27989).
To properly execute an instruction such as "ADD tmp, tmp.wzyx, foo;"
with SOA we (sometimes) need to put the results into temporaries before
writing the results to the destination register.
This patch fixes the ADD instruction but this needs to be done for
many more instructions.
Helps to fix piglit fp-long-alu test (fd.o bug 27989).
This lets us unbind a shader from the tgsi_exec_machine. Since
shaders aren't ref counted we need this to properly clean up when
deleting shaders elsewhere.
When the mipmap level is smaller than the compression block width, height
we need to fill in / replicate pixels so that we don't get garbage values.
Fixes piglit gen-compressed-teximage test.
Saves some work and avoids potential issue with inconsistant mipmap
level sizes. As long as the mipmap levels from BaseLevel to MaxLevel
are consistant, we don't care about the other levels.
OpenGL occlusion queries work now. The Mesa demos, glean test and piglit
tests all pass. A few enhancements are possible in the future. -Brian
Signed-off-by: Brian Paul <brianp@vmware.com>
These files are built with make and removed with make clean, so it does not
seem necessary to track them.
Looking at the Makefile, it seems that the two u_indices_* files are handled
similarly to u_format_srgb.c u_format_table.c and u_half.c, and these 3
files are already untracked and in .gitignore
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
These files are built with make and removed with make clean, so it does not
seem necessary to track them.
Looking at the Makefile, it seems that the two u_indices_* files are handled
similarly to u_format_srgb.c u_format_table.c and u_half.c, and these 3
files are already untracked and in .gitignore
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
These files are built with make and removed with make clean, so it does not
seem necessary to track them.
Looking at the Makefile, it seems that the two u_indices_* files are handled
similarly to u_format_srgb.c u_format_table.c and u_half.c, and these 3
files are already untracked and in .gitignore
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Builds on commit ddb0e18f6c and fixes
regressions in glean clipFlat test.
We assume that Gallium drivers observe flatshade_first for all triangles
and that all the assorted per-triangle calls in the 'draw' module also
follow flatshade_first. Everything else builds on those rules.
Gallium does not use follow flatshade_first for GL quads, quad strips
and polygons; the "last" vertex is always the provoking vertex for those
prims. So now there are separate QUAD_FIRST_PV and QUAD_LAST_PV macros
in the draw primitive decomposition code instead of one QUAD macro.
This prevents memory usage explosion in blender due to the state cache
hanging on to old fake frontbuffer regions. Sigh at blender still
using frontbuffer rendering.
Bug #24119.
Remember the size of the level=0 mipmap image. Do not call
util_format_get_component_bits when st_context_teximage is called to
release a texture image.
ES overlay is built with FEATURE_ES1 or FEATURE_ES2, and is built
without FEATURE_GL. Fix the build by always building OpenGL ES sources,
but test for FEATURE_ES1 or FEATURE_ES2. Also, define symbols that are
missing because FEATURE_GL is not defined.
When the cube faces were stored in a compressed format, the img_stride
values were wrong and didn't match the per-face size computed in the
tex_image_face_size() function. This caused bad rendering or segfaults.
Before we looked at stObj->pt to see if we may have run out of memory,
but that's not a good indicator. This fixes the spurious GL_OUT_OF_MEMORY
errors that could arise before.
In the lack of more fine grained capabilities in Gallium, assume that if
the pipe driver supports GLSL then native limits match Mesa software
limits.
(cherry picked from commit 40a90cd11234a09c2477f5c9984dd6d9fac3f52c)
The slang_variable::declared field originated as a debug field but
can be promoted for use during sematic error checking.
Fixes fd.o bug 27921.
NOTE: this is a candidate for back-porting to the 7.8 stable branch.
This reverts commit 9446fd8f69.
It doesn't make sense to replace strcpy(a,b) with strncpy(a,b,strlen(b)).
The preceeding code effectively does bounds checking, btw.
Commit e648d4a1d1 changed the original
less-than test to a not-equal test. This was an effort to save some
memory by switching the texture layout to a non-mipmapped layout when
we mis-guessed about the original layout (thus saving some memory).
However, this causes us to hit a new (apparently broken) code path
when copying the old texture's data to the new texture. Simply
undo this change for the time being until the other/new bug is fixed.
Fixes fd.o bug 27933.
This gives a ~30% shader optimization time improvement on blender.
Tested by comparing the dumped LLVM modules.
Current ordering:
time ~/llvm-git/obj/Release-Asserts/bin/opt l.bc -constprop -instcombine
-mem2reg -gvn -simplifycfg
real 0m1.126s
user 0m1.108s
sys 0m0.012s
With this patch:
time ~/llvm-git/obj/Release-Asserts/bin/opt l.bc -mem2reg -constprop -instcombine -gvn -simplifycfg
real 0m0.885s
user 0m0.880s
sys 0m0.000s
The overall improvement in blender is ~15%.
Blender without the patch takes 1m13s:
edwin 5934 87.6 11.5 729440 458296 pts/5 SLl+ 17:35 1:13 blender
Blender with the patch takes 1m3s:
edwin 5726 94.2 11.2 716424 446168 pts/5 SLl+ 17:32 1:03 blender
It is still slow with the patch, but better (most of the optimization time is
taken up by GVN, see LLVM PR7023).
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Current code only invalidated if the texture was different, however we
store swizzled values in the cache, so we need to invalidate in that case
also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
we were resetting the mask on each new break/continue statement within
the same scope. we always need to and the current execution mask
with the current break/continue mask to get the correct result (the
masks are always ~1 initially)
Even though swrast defines its own __DriverAPIRec it still shares the
driCreateNewContext() implementation from dri_util.c. So the CreateContext
prototypes have to match in the two __DriverAPIRecs.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
These two should be tied together because what's set in VAP or stuffed in GA
should be rasterized in RS. Not doing so causes a hardlock.
The reason for the merge is that if stuffed texture coordinates (e.g. point
sprite texgen) happen to occupy the texcoord slot dedicated to fog or wpos,
the two must be relocated to other free slots, which needs remapping the vertex
shader outputs.
The rasterizer code is now literally a sequence read-rasterize-write.
If we're using a wrap mode in which border color sampling is possible
it means that texcoords may be outside of the texture image bounds.
Fetching the texel may result in a segfault.
Use the 'use_border' variable to catch such texcoords and replace
the texel offset with zero (which will be in bounds).
Fixes segfault in Lightsmark demo, fd.o bug 27877.
Most of the failure from using uninlined function calls ends up being
just bad rendering, but nested function calls in the VS currently hang
the GPU, so reject them and explain why.
In no particular order:
* Make list const
* Add function comments
* Clearly state that demo lists are not complete
* Fix whitespace
* Use __FUNCTION__ instead of __func__
* Add unimplemented check which always fail
Thanks Brian and Keith.
We were doubling up the offsets for the mipmap levels for CPU access.
Instead of reimplementing i945_miptree_layout_2d with 6 cube images
separated by qpitch, share that function and provide the level offsets
later.
Fixes piglit cubemap and fbo-cubemap.
The backwards compatibility code calls the DRI driver invalidate hook
on swap buffer and flush front buffer. This lets the DRI driver rely
on invalidate callbacks and drop the glViewport() hack, even if the
server doesn't send invalidate events. This is essentially a revert
of 2d00d16da7, except that we now also
pass the __DRI_USE_INVALIDATE extension even when the server doesn't
have DRI2 invalidate events.
This basically restores the previous state, where a vertex result slot
is set up for the texcoord to be replaced with point coord. Fixes
piglit point-sprite test.
Bug #27625
When linking with --as-needed libgallium.a can't find the dl* symbols from
-ldl since order matters more with --as-needed.
Thanks to Nirbheek Chauhan and Adam Jackson
This is trying to follow the spirit of the invariance rules, though
they're not specific on this point. Fixes quad-invariance piglit test
while retaining the 22s -> 18s win on glean blendFunc.
This was a regression in c67d9d84f5.
This extension allows a color buffer to be used for both rendering and
texturing. EGL allows the use of color buffers of pbuffer drawables
for texturing, this extension extends this to allow the use of color
buffers of pixmaps too.
The RGBX version isn't supported as a vertex input type, but since we
force the last channel's value anyway, this should be fine. The only
potential risk I see is in the limiter on VBO reads past the end of
the buffer forcing the whole vertex to 0 when the A channel lands past
the end.
Fixes piglit draw-vertices-half-float.
Note that we don't support arbitrary block size for compressed quite
yet -- block height of 4 is hard-coded all over the place.
Bug #27098 (srgb dxt1 producing a bytes per pixel of 0).
Check for the GLX_ARB_fbconfig_float and GLX_NV_float_buffer extensions
to determine if color bufs are floating point.
Check for the GLX_EXT_framebuffer_sRGB extension to determine if the
framebuffer is sRGB capable.
Increase field size for some attribs (visual ID and buffer size) to
accomodate today's larger values.
Also print missing caveats info in verbose mode.
Ideally scons should be able to work backwards from the list of
targets to figure out which drivers, state trackers and other
convenience libraries need to be built.
Since there is no track of which names are structure names during parsing,
TYPE_NAME cannot be produced by the lexer. Use IDENTIFIER and let the AST
processor sort it out.
All of these are currently emitted as part of the IR, so by initializing
them, we actually end up with two copies. For constructors, we may
eventually wish to avoid emitting them as part of the IR output.
This is necessary so _mesa_glsl_initialize_types can create appropriate
glsl_types and add them to the symbol table.
In the future, we'll want to set it to the max GLSL version supported by
the current driver.
Unfortunately, we still have two kinds of matching - one, with implicit
conversions (for use in calls) and another without them (for finding a
prototype to overwrite when processing a function body). This commit
does not attempt to coalesce the two.
It might be better to simply handle "true" in the reader, but since
booleans normally aren't printed as "true" or "false", we may as well go
for consistency.
A new static version takes an ir_expression_operation enum, and the
original non-static version now uses it. This will make it easier to
read operations (where the ir_expression doesn't yet exist).
This can be useful for debugging - it allows us to see that the inferred
type is what we think it should be. Furthermore, it will allow the IR
reader to avoid complex, operator-specific type inference.
add function to set sample mask, and state for alpha-to-coverage and
alpha-to-one. Also make it possible to query for supported sample count
with is_msaa_supported().
Use explicit resource_resolve() to resolve a resource. Note that it is illegal
to bind a unresolved resource as a sampler view, must be resolved first (as per
d3d10 and OGL APIs, binding unresolved resource would mean that special texture
fetch functions need to be used which give explicit control over what samples
to fetch, which isn't supported yet).
Also change surface_fill() and surface_copy() to operate directly on resources.
Blits should operate directly on resources, most often state trackers just used
get_tex_surface() then did a blit. Note this also means the blit bind flags are
gone, if a driver implements this functionality it is expected to handle it for
all resources having depth_stencil/render_target/sampler_view bind flags (might
even require it for all bind flags?).
Might want to introduce quality levels for MSAA later.
Might need to revisit this for hw which does instant resolve.
max_index must be observed to prevent crashes due to bad index data.
I've been using this patch for some time without regressions.
Some places, where we use internal vertex buffer, it is not entirely
clear what max_index should be, so passing just ~0 to avoid regressions
for now.
doc additions: shader export ARRAY_BASE for EXPORT_POS: 60 is position,
61 is misc vec(VS_OUT_MISC_VEC - used here),
62, 63 are clip distance vectors(VS_OUT_CCDIST#)
sorry for formating - there seem to be so many different styles in r600
Certain headers, such as GL/glew.h, are in both the Mesa include and the
default installed include directories. On recent distros the needed
symbols can be found in both places. On older distros the installed
headers could be lacking symbols, so for a header that exists in both
places, the local one should be found first.
The POSIX function pthread_cond_wait can have spurious wakeups when
waiting on a condition variable.
Add a 64-bit counter that is incremented whenever the barrier becomes
full. A woken thread checks the counter. If the counter has not changed
then it has been spuriously woken and goes back to sleep. If the counter
has changed then it was properly signaled and exits the barrier.
Tested on Mac OS X.
This patch was based on ideas from Luca Barbieri.
The struct st_module isn't needed as it is the same thing as the st_api
struct. That is they both represent the API. Instead just use a single
function entry point to the the API.
Everybody should respect max_index, specially llvm generated code, which
likes to eat vertices 4 at a time, so it may end up chew a bit a bit more
than actually exists.
Transfer, being now a context operation, should happen in order with
all other contexts operations. If there is rendering pending on the
resource then the driver must flush and potentially wait itself
internally.
Instead of avoiding using transfers internally (as done in llvmpipe) I've
opted to simply pass PIPE_TRANSFER_UNSYNCHRONIZED in all internal
transfers, to avoid infinite recursion.
bo_legacy->tobj cannot be NULL before the call to driUpdateTextureLRU.
There is a NULL check earlier in the routine, and if bo_legacy->tobj is
NULL, memory is allocated.
info cannot be NULL at the call to debug_printf. emit_instruction
dereferences info, so at debug_printf it is either not NULL or the
program has already crashed.
va_list is a mutable iterator. When passed to a function it will likely
point to somewhere else.
This fixes segmentation fault in glean vertProg1 on Ubuntu 9.10.
allows the swrastg_dri.so to be built with llvmpipe, also links llvm
to all dri drivers
use --enable-gallium-llvm to use it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Among other things, this ensures that all of the extension flags are
initially disabled.
This causes the following tests to pass:
glslparsertest/glsl2/draw_buffers-02.frag
The missing else-statements caused all of the extensions execpt
GL_ARB_texture_rectangle to be unsupported.
This causes the following tests to pass:
glslparsertest/glsl2/draw_buffers-04.frag
Try to specify render target bindings flags first. If that fails, try
again with just sampler view binding. Note that we try to create the
texture resource with render target binding flags later when we allocate
the texture. Then, in FBO validation, we check if we can actually render
to the textures. If that fails, we generate GL_FRAMEBUFFER_UNSUPPORTED_EXT.
Changes suggested by Jose.
__glXInitializeVisualConfigFromTags doesn't skip the payload of
unrecognized tags. Instead, it treats the value as if it were the
next tag, which can happen if the server's GLX extension is not
Mesa's. For example, this falls down when NVIDIA sends a
GLX_FLOAT_COMPONENTS_NV = 0 pair, causing
__glXInitializeVisualConfigFromTags to bail out early.
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
After continuously running regression tests on Ubuntu for 2 days, shmget
mysteriously starts to fail. Even when the X server is reset.
This allow rendering to proceed, albeit using a slower presentation path.
If there is no depth buffer bound to current context don't
enable depth test. GL states that if depth test is enabled
without depth buffer it's as if depth buffer always pass.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
We would run into trouble due to the hardware using inclusive numbers
and the subtraction to handle that producing negative (meaning large
positive) coordinates.
Bug #27643.
a bit more involved than indirect addressing over consts, but still
fairly reasonable. we allocate an array instead of individual alloca's,
and we do it only if the shader does indirect addressing.
implement indirect addressing (ARL and ARR instructions) when used
with CONST's. indirect addressing over other vars (temps, inputs, outputs)
is not supported yet.
Put PIPE_BIND_ and PIPE_TEXTURE_GEOM_ prefixes on token names so
that they can be found with grep. This needs to be done in more places.
Corrected/improved a lot of information and grammer.
I don't know how to properly format everything - someone else can take
care of that.
We now allocate the table from api_exec.c and dlist.c where we fill out
the table. This way, context.c doesn't need to know the actual contents
of struct _glapi_table.
Otherwise slightly difference order causes assertion failures.
Also remove mentions of PIPE_BIND_SCANOUT/PIPE_BIND_SHARED. They are not
propoer bind flags and will likely be deprecated. If surfaces should
be passed to the winsys then they should have the DISPLAY_TARGET flag
set, which is a proper bind flag.
It was missing PIPE_BIND_RENDER_TARGET, causing assertion failures for
pure render targets.
Also bind flags are too variable and complex for a good assessment for
whether the resource is a texture or not. Target is more concise.
The mesa state tracker is currently the only place where we create a
context and expect it to implement GLES1/2. Use the API-aware constructor
to communicate this to core mesa.
This introduces a new way to create or initialize a context:
_mesa_create_context_for_api and
_mesa_initialize_context_for_api
which in addition to the current arguments take an api enum to indicate
which OpenGL API the context should implement. At this point the
API field in GLcontext isn't used anywhere, but later commits will
key certain functionality off of it.
The _mesa_create_context and _mesa_initialize_context functions are
kept in place as wrappers around the *_for_api versions, passing in
API_OPENGL to get the same behavior as before.
Previously, the body of some vector constructors were added to the wrong
function signature, and the body of matrix constructors were just being
dumped in the main instruction stream.
Now, ir_function is emitted as part of the IR instructions, rather than
simply existing in the symbol table. Individual ir_function_signatures
are not emitted themselves, but only as part of ir_function.
As in tgsi_exec.c we don't actually rely on condition codes; we do
an unconditional kill. The only predication comes from the execution
mask which applies inside loops/conditionals.
New draw API function to indicate whether or not to convert points to
quads for sprite rasterization.
Fix point-to-quad conversion regression in the wide-point stage. We
need to check the pipe_rasterizer_state::point_quad_rasterization flag.
Rename old IGDNG to Ironlake, and set 'gen' number for
Ironlake as 5, so tracking the features with generation num
instead of special is_ironlake flag.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Previously, when we created a gallium texture for a corresponding Mesa
texture we'd only allocate space for mipmap levels >= BaseLevel.
This patch undoes that mechanism. This fixes a render-to-texture bug
when rendering to level 0 when BaseLevel=1.
Also, it makes sense to allocate the whole texture object memory when
BaseLevel > 0 since a common use of GL_TEXTURE_BASE_LEVEL is to
progressively load/render mipmaps. Eventually, the app almost always
fills in the level=0 mipmap image.
Finally, the texture image code is bit easier to understand now.
If both Z-test and stencil-test were enabled, we were mis-computing
the vector of updated Z buffer values.
Fixes Z testing bug in progs/demos/fbotexture.c
Fixes hang with "gst-launch-0.10 videotestsrc ! video/x-raw-rgb !
glupload ! gleffects effect=heat ! glimagesink" which uses 2 samplers
pointing at GL_TEXTURE1 and GL_TEXTURE2, and piglit
glsl-fs-sampler-numbering.
As we are rebasing to min_index + elt_bias, and the vertex buffer has no
elt_bias.
I still don't know how to exercise this code. I hope this is now right.
Commit 88be2171e7 fixed the egl demos on the stable branch, but now
they're spread out across multiple subdirectories.
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
eglplatform.h pulls in Xlib.h on X11 platforms. Likewise, the egl glx
driver and egl programs needs to link to libX11. Make sure we use the
locations the user told us about.
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
The Xlib demos were fixed to use $(X11_LIBS) so that configure could
detect the proper directory to link the library from, but this broke
the non-autoconf builds. Give X11_LIBS a default value to fallback on.
(cherry picked from commit e40fce13e1)
The variable X_LIBS from AC_PATH_XTRA contains only the -L searchdir
parameter and not the -lX11 to link to Xlib. Use X11 prefixed build vars
for linking with Xlib to avoid the conflict.
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
(cherry picked from commit e725ef171b)
This can break on systems that don't have a system X installation.
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
(cherry picked from commit de4ee20578)
Conflicts:
src/gallium/winsys/xlib/Makefile
This fixes broken 3D texture indexing when the height of the 3D texture
was less than 64 (the tile size). It's simpler to pass this as an array
(as we do with the row stride) than to compute it on the fly.
Treat cube faces and 3D texture slices in the same manner (they're layed
out out continuously in memory). Additional clean-ups and improvements
coming.
This pass only works on assignments where the variable is never
referenced. There is no code flow analysis, so it can't do a better
job of avoiding redundant assignments.
For now, the optimizer only does do_dead_code_unlinked(), so it won't
trim the builtin variable list or initializers outside of the scope of
functions. This is because we don't have the visibility into other
functions that might get linked in in order to eliminate work on
global variables.
This patch amends the error output string for the case where the
dri2 egl driver could not open the dri dev node.
Signed-off-by: Brian Paul <brianp@vmware.com>
When the size of the scene (binned data plus referenced resources/textures)
exceeds LP_MAX_SCENE_SIZE flush/render the scene. This could be improved
in various ways but is a good start.
Fixes piglit streaming-texture-leak test.
The removal of _glapi_noop_enable_warnings and _glapi_set_warning_func
in e4f168a6f4 prevents DRI drivers built
before the commit from loading. Add stub versions of the functions to
make them load again.
These compressed format switch cases shouldn't be hit if we don't
support the compressed texture extensions, but let's be safe and
ask the driver if they're supported as we do in other cases.
Runtime linking doesn't quite work.
Just comment then out for now to prevent crashes. These will go away in
the future because calling 4 times CRT's cosf()/sinf() is over-precise
and under-performing.
In short what the code did before:
__DRIscreen *psp = NULL;
if (pcp)
psp = pcp->psb;
assert(psp);
if (psp->stuff)
other_stuff();
return psb->even_more(pcp);
Remove all that stupid checking which still segfaults/asserts later on and
just do what we do in driUnbindContext. Also limited testing show libGL never
call driUnbindContext or driBindContext with cPriv == NULL.
libdrm-2.4.20 and earlier include the nouveau/nouveau_class.h header. A
later version of libdrm will not ship this header. Mesa also has this
header at src/gallium/drivers.
The symbol NV34TCL_VTXFMT_TYPE_HALF is needed by nvfx_vbo.c. This symbol
is not in the libdrm copy of the header but is in the Mesa copy of the
header. This patch moves src/gallium/drivers to the beginning of the
include paths such that when building on hosts with libdrm-2.4.20 or
ealier the build uses the copy in Mesa.
we were only running the llvm paths when the input elts were linear,
now we can handle abritrary fetch elts arrays. we do this by generating
two paths - linear and fetch_elts one and just selecting the right one
at run time.
This will be important to optimization passes. We don't want to
dead-code eliminate writes to out varyings, or propagate uninitialized
values of uniforms.
In the direct rendered case, we need to tell the server our initial swap
interval. If we don't, the local and server values will be out of sync,
since the server and client defaults may be different (as they were
before this patch).
Fixes failed assertion from fd.o bug 27713.
The assertion was added with the new resource/transfer changes.
This patch could apply to the 7.8 branch but it's not essential.
This should be drmCommandWriteRead to avoid an EINVAL error on systems
that strictly check ioctl args. This command has been r/w for ever.
Discussion with airlied agreed that this was the correct course.
Signed-off-by: Brian Paul <brianp@vmware.com>
When points or lines are decomposed into triangles, we need to be sure
to disable polygon culling, stippling, "un-filled" modes, etc.
This patch sets the rasterization state to disable those things prior to
drawing points/lines with triangles, then restores the previous state
afterward.
The new piglit point-no-line-cull test checks this problem & solution.
indexBias corresponds to:
- BaseVertexIndex parameter of D3D9's
IDirect3DDevice9::DrawIndexedPrimitive method
- BaseVertexLocation parameter of ID3D10Device::DrawIndexed
Although a positive indexBias can be easily be implemented in Gallium by
adding indexBias*stride to each vertex buffer base offset, a negative
indexBias cannot, as the final vertex buffer offset could be negative.
I'm not aware of this functionality being exposed to GL drivers, so for
now all hardware drivers will just assert(indexBias == 0).
See also:
- http://msdn.microsoft.com/en-us/library/bb174369.aspx (D3D9)
- http://msdn.microsoft.com/en-us/library/ff556126.aspx (D3D10 DDI)
In TGSI, front facing is +1 and back-facing is -1. We were computing
this attribute as +1 and 0 before. However, the value isn't actually
used anywhere because we machine->Face attribute overrides it in
tgsi_exec.c. That could be changed, removing some special-case code...
Currently r300g does not save vertex buffer on blitter calls.
It gets away with it because the current Mesa state tracker usually
resets vertex buffers on every draw calls.
However, this is wrong.
nvfx won't be lucky because it needs to use the blitter inside draw
calls.
Just set the max index to 1, this lets doom3 run and seems correct,
though it would be better to just emit a constant like SVGA does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
A register file value is unsigned so could never be -1. A
value of 0 also aliased to PROGRAM_TEMPORARY.
If an OPCODE_PRINT has no registers to print, set the register
file value to PROGRAM_UNDEFINED and check for that value when
executing this instruction.
This fixes problems with the draw module's aaline, aapoint and
pstipple stages where we change some driver state after the call
to the draw module's draw_arrays() function.
Some of the draw pipeline stages emit additional vertex attributes.
Without this change, we were getting stale vertex info that didn't
include the extra attributes.
This branch implemented dual representations of texture/drawing surfaces:
one in the conventional linear layout and the other the tiled layout which
is used by the fragment shader pipe. Per-tile flags indicate the layout
of each image tile. In many situations this lets us avoid converting
image data between the two layouts.
Squashed commit of the following:
commit 563a7e3cc552fdcfcaf9ac0d4b1683c3ba2ae732
Author: Brian Paul <brianp@vmware.com>
Date: Thu Apr 8 14:48:21 2010 -0600
llvmpipe: convert points/lines to triangles with draw module
This isn't the most efficient way to render points/lines but it allows us
to run more tests.
commit a8aa763e8a717533f2b13bb6ea53cbccbede68c9
Author: Brian Paul <brianp@vmware.com>
Date: Thu Apr 8 14:47:28 2010 -0600
llvmpipe: call llvmpipe_get_texture_tile() for depth/stencil
The returned pointer isn't used, but the tile status/layout info
gets updated. Helps to fix glReadPixels(DEPTH / STENCIL).
commit 463bc64af266194acbea71cd52e26a79b8c8a260
Author: Brian Paul <brianp@vmware.com>
Date: Thu Apr 8 10:58:48 2010 -0600
llvmpipe: add store_color to debug cmd_names list
commit 784cc73fb334a9d7b7c93cbd8a1445cdf742ff58
Author: Brian Paul <brianp@vmware.com>
Date: Thu Apr 8 10:57:43 2010 -0600
llvmpipe: fix debug build
commit 792c93171ec075664f55720ffed397ac2834a4fc
Author: Brian Paul <brianp@vmware.com>
Date: Thu Apr 8 10:49:01 2010 -0600
llvmpipe: fix cube mapping
commit 882b1035db88c3dd8aebe28dc971ac30a9ee39e3
Author: Brian Paul <brianp@vmware.com>
Date: Thu Apr 8 09:53:30 2010 -0600
llvmpipe: remove some older/unused code
commit b807d32b23145301e8842824664d9f06b9c5502e
Author: Brian Paul <brianp@vmware.com>
Date: Thu Apr 8 09:29:50 2010 -0600
llvmpipe: silence warning
commit 7b337e64fec92836ccdf9d96216289dd58418e35
Author: Brian Paul <brianp@vmware.com>
Date: Wed Apr 7 17:06:08 2010 -0600
llvmpipe: clean-up, comments in lp_surface_copy()
commit c52fa36f249cc652fa8d5fdd94d6574127c08c41
Author: Brian Paul <brianp@vmware.com>
Date: Wed Apr 7 16:51:42 2010 -0600
llvmpipe: overhaul tiled/linear memory management
Now we keep per-tile layout info (linear vs. tiled (or neither or both)
and convert from one layout to the other on demand.
commit 4a50ccfd470547c9be0704005818a87014e9c0e9
Author: Brian Paul <brianp@vmware.com>
Date: Wed Apr 7 16:51:27 2010 -0600
llvmpipe: added tile read/write counters
commit b7d0ea9c687ac8773b083791623826fa604adf21
Author: Brian Paul <brianp@vmware.com>
Date: Mon Apr 5 14:54:04 2010 -0600
llvmpipe: rename some functions
commit ee45c6e5b95cbd3c8cccc9aa4d45d8aef11e20c4
Author: Brian Paul <brianp@vmware.com>
Date: Mon Apr 5 14:42:15 2010 -0600
llvmpipe: re-org some get block/tile pointer code
commit 26ce97c16c0b6520ff1538803baa772d8c3b1280
Author: Brian Paul <brianp@vmware.com>
Date: Mon Apr 5 14:34:13 2010 -0600
llvmpipe: disable bad assertions
commit 5c670481248c4d46f87f13bf3af5655925e7002d
Author: Brian Paul <brianp@vmware.com>
Date: Fri Apr 2 16:36:11 2010 -0600
llvmpipe: add a special-case optimization to lp_surface_copy()
Be more efficient when copying tiled image to linear image.
Before, the fallback path was always converting the whole source image
to linear. Now we can convert just a sub region.
commit faa684645e64d6024b3a11e4e08da825e8220b2e
Author: Brian Paul <brianp@vmware.com>
Date: Fri Apr 2 16:15:16 2010 -0600
llvmpipe: assorted texture and tile/line conversion code change s
The tiled/linear conversion functions take x/y positions now to
allow converting only sub-regions.
More texture-related helper functions.
commit baad81ec5318d44bfac1e37c7643afc0836607bb
Author: Brian Paul <brianp@vmware.com>
Date: Tue Mar 30 13:18:40 2010 -0600
llvmpipe: convert tiled->linear upon PIPE_FLUSH_SWAPBUFFERS
If we know we're about to do a swapbuffers we should immediately
convert the tiled color tiles to linear instead of later in
llvmpipe_texture_unmap() since we can take advantage of threading/
parallelism here.
commit 928dd41256811daeddb7506a49a34dbad04beaf8
Author: Brian Paul <brianp@vmware.com>
Date: Tue Mar 30 09:16:58 2010 -0600
llvmpipe: polish-up the llvmpipe_flush() code
commit dd6014abcf86c517d159b8175e0eaeb167ea2ef6
Author: Brian Paul <brianp@vmware.com>
Date: Tue Mar 30 09:15:17 2010 -0600
llvmpipe: SETUP_x enum clean-up
commit 0b1ce6da2b28a41f3389685ab93e10b43c950f5d
Author: Brian Paul <brianp@vmware.com>
Date: Fri Mar 26 10:43:37 2010 -0600
llvmpipe: remove unused vars
commit 4562663480f88162ed4452cb05569eecb67f9f39
Author: Brian Paul <brianp@vmware.com>
Date: Fri Mar 26 10:31:55 2010 -0600
llvmpipe: cope with non-existant color/depth buffers
The fragment jit functions always grab these pointers, even if they're
not used.
commit df4329edbaf204ed501f1eac0698b8198178f9af
Author: Brian Paul <brianp@vmware.com>
Date: Thu Mar 25 15:20:15 2010 -0600
llvmpipe: do all render target surface mapping/unmapping in the rast code
commit 3d0c25d5ba8b8f61e8366d4c97324e45d526ff41
Author: Brian Paul <brianp@vmware.com>
Date: Thu Mar 25 14:31:21 2010 -0600
llvmpipe: map z/stencil buffer on demand like color buffers
Plus lots of code clean-up and loose ends taken care of.
commit c3b6fddd788aef09b4b84b843b7b1272231151e8
Author: Brian Paul <brianp@vmware.com>
Date: Thu Mar 25 13:15:03 2010 -0600
llvmpipe: remove unused write_zstencil field
commit 63374d97836926a6357e9d6dd24a509a8e155c56
Author: Brian Paul <brianp@vmware.com>
Date: Thu Mar 25 09:45:59 2010 -0600
llvmpipe: add missing lp_rast_end() call
Fixes crash on window resize when LP_NUM_THREADS=0.
commit 92fe9952161cc06f6edc58778e9e5a8b9ea447dc
Author: Brian Paul <brianp@vmware.com>
Date: Wed Mar 24 10:15:19 2010 -0600
llvmpipe: add tiled/linear conversion for 16-bit Z images
commit 6605fa28c147f30df351da0e4413cab33e4db5da
Author: Brian Paul <brianp@vmware.com>
Date: Tue Mar 23 16:06:41 2010 -0600
llvmpipe: implement tiled/linear conversion for Z/stencil images
commit 804528d84ffa292ef9d49d3666cdd3fa099ff3ff
Author: Brian Paul <brianp@vmware.com>
Date: Tue Mar 23 16:05:45 2010 -0600
llvmpipe: added texture stride comment
commit 66a88c012edf670c4ac887a912f02dcff93266dd
Author: Brian Paul <brianp@vmware.com>
Date: Tue Mar 23 16:04:07 2010 -0600
llvmpipe: remove unused vars
commit e2ca8d1328316dc8b36d5f688c16d109e49a6870
Author: Brian Paul <brianp@vmware.com>
Date: Mon Mar 22 18:53:11 2010 -0600
llvmpipe: checkpoint WIP: overhaul texture/surface mapping
Conversion between tiled and linear surfaces is working everywhere now.
The LP_TEXTURE_READ/READ_WRITE/WRITE_ALL flags let us avoid unnecessary
image layout conversions.
Still some loose ends, temporary/debug code, etc.
Need to implement tiled/linear conversion for depth/stencil images.
commit f2730a03839ee8984c1f537b7cbebba24961397a
Author: Brian Paul <brianp@vmware.com>
Date: Mon Mar 22 14:41:58 2010 -0600
llvmpipe: rename/repurpose lp_rast_store_color()
commit e192a47552c5d20d2caef452ca7697e2cd852c9b
Author: Brian Paul <brianp@vmware.com>
Date: Mon Mar 22 14:38:51 2010 -0600
llvmpipe: remove lp_rast_load_color()
commit 3cff0bde4b4ab980e1c3e812700419091527c76b
Author: Brian Paul <brianp@vmware.com>
Date: Mon Mar 22 14:11:38 2010 -0600
llvmpipe: remove/consolidate texture image code
commit 3a2f08b6a550c69ef5e874f482be30252cbf8bfa
Author: Brian Paul <brianp@vmware.com>
Date: Fri Mar 19 17:03:14 2010 -0600
llvmpipe: checkpoint WIP: directly render to tiled texture buffers
We're now directly writing colors into the tiled texture image buffers.
This is a checkpoint commit with lots of dead code and temporary hacks.
Everything will get cleaned up eventually.
commit c5ca987e03870849514d4e3c99af143722a09695
Author: Brian Paul <brianp@vmware.com>
Date: Fri Mar 19 16:41:14 2010 -0600
llvmpipe: refactor code, create tile_pixel_offset()
commit 2133e8273e937cbac09cd7264d6ce53af9764ddb
Author: Brian Paul <brianp@vmware.com>
Date: Fri Mar 19 14:55:11 2010 -0600
llvmpipe: pass LP_TEXTURE_LINEAR/TILED flags around
commit b9b9d4b82b01f4588721fdc8444740f859b4a021
Author: Brian Paul <brianp@vmware.com>
Date: Fri Mar 19 14:51:05 2010 -0600
llvmpipe: checkpoint WIP: hanlde co-existing tiled/linear texture data
Cube maps are temporarily broken, maybe other things.
commit 4cd322e6889940b5f155fcb69041b685b9ef9273
Author: Brian Paul <brianp@vmware.com>
Date: Fri Mar 19 11:34:43 2010 -0600
progs/demos: add other modes/patterns to dissolve demo
swzsurf doesn't support GART
Thanks to Marcin Kościelnicki <koriakin@0x04.net> for spotting that !
This fixes corruption in etracer and the following related errors :
[14381.551927] [drm] nouveau 0000:01:00.0: PGRAPH_ERROR - nSource:
PROTECTION_ERROR, nStatus: INVALID_STATE
[14381.551945] [drm] nouveau 0000:01:00.0: PGRAPH_ERROR - Ch 2/2 Class
0x039e Mthd 0x0184 Data 0x00001cd9:0x00001cd9
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Adapted by Luca Barbieri for mesa master.
Faster, simpler and more flexible.
Also, we set those flags properly on nv3x so that we don't allocate buffers in GART.
Since on AGP GART is uncached, OpenGL doesn't distinguish between vertex and index buffers, and we don't support hardware index buffers for now, this caused uncached reads.
Also check bind and not usage for PIPE_BIND_* flags, got broken in the gallium-resources transition.
texdepth stopped working when npot went in, this brings it back
to life.
< MostAwesomeDude> That looks like what I was going to do.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is relatively simple at the moment, recognizing only constant
values, and not (for example) values that are restricted to a range
that make the branching constant. However, it does remove 59 lines
from the printout of CorrectParse2.vert.
Currently we used a single buffer for each fragment programs, leading to
rendering synchronization. This patch uses a doubly linked list of BOs,
which is dynamically resized if all the BOs are busy.
Note that inline image transfers could be an alternative option: this
will be explored later.
This removes one of the big performance limitations of the current
driver.
We also stop using pipe_resource internally in favor of using nouveau_bo
directly.
Add vg_context_update_draw_buffer (and helpers) that duplicates the
logic of st_resize_framebuffer. Use the new function instead of
st_resize_framebuffer in vg_manager.c.
The fixed function vertex program shouldn't need to deal or touch tex coords
if stuffing is enabled.
Though I'm not 100% this won't break assumption made elsewhere it seems like
the correct thing to do, and makes r300g point sprites a lot easier to implement.
draw: fix point-sprite when vertex program is used.
This commit regressed draw, so fix it as well to help bisection.
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied -
Convert sprite coord index to a per-coord enable bit
set the rasteriser block up correctly for point sprites.
The inputs to the RS hw block change for sprite coords, so fix them up
properly - this fixes piglit point-sprite test.
]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Two reasons:
- progs will eventually have its own repository
- it is just to easy to forget updating the
code for interface changes when it is outside of src
Currently on nv30/nv40 an assert will be triggered once 32 queries are
outstanding.
This violates the OpenGL/Gallium interface, which requires support for
an unlimited number of fences.
This patch fixes the problem by putting queries in a linked list and
waiting on the oldest one if allocation fails.
nVidia seems to use a similar strategy, but with 1024 instead of 32 fences.
The next patch will improve this.
This patch adds support for two-sided vertex color to nv30/nv40.
When set, the COLOR0/1 fs inputs on back faces will be wired to vs outputs BCOLOR0/1.
This makes OpenGL two sided lighting work, which can be tested with progs/demos/projtex.
This is implemented in nvfx_state_fb and fragtex but was missing
in nvfx_screen.
This allows to avoid glCopyTexSubImage CPU fallbacks and makes Doom 3
much faster as a result.
This was proposed by Marek Olšák and no one objected, so just
pushing it.
The extension is currently not exposed, because the mechanism to
discover if the driver actually supports this is missing.
We probably should change is_format_supported to handle this too.
This will allow to test Gallium drivers anyway in the meantime.
Based on work by Dave Airlie.
Changes by me:
1. Fix assertion in st
2. Change to use unpadded Gallium formats
Autobinding creates additional pushbuffer usage which may not be
accounted in callers, and is also slow.
The next relocations patch depends on this for correctness.
Assert instead if the objects are not bound, which should happen at
screen creation time.
RING_3D creates a method start for subchannel 7.
Bind the 3D engine to a fixed subchannel to make it work
This is much faster than the old BEGIN_RING, since we don't need
to waste cycles trying to "autobind" stuff, when a fast static binding
is perfectly good.
Subchannel 7 is chosen because the kernel takes up the lowest ones.
Currently we miscalculate the space needed to push vertices, causing
flushes where they should not happen.
Use a much more conservative estimate to fix it.
It will be done better in the future (e.g. using the nv50 primitive
splitter).
CFLAGS needs to be passed, as you already know.
Commit 3e17a5b047 broke this by adding a new link
command without CFLAGS.
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Otherwise, we read from VRAM...
Yes, again, it should be fixed to tell whether the buffer is in
VRAM or not and behave appropriately.
But this should be in pipebuffer/a generic layer; revisit this later
too.
Currently we are relocating transfers to VRAM to use the blitter,
which is terrible.
Maybe for ->VRAM the blitter could be better, but we can't be
perfectly sure of that due to relocations.
In other words, just do the simple thing, and defer fine-tuning the
transfer hardware method to a later stage, while making it work
decently now.
The new address should go to TIC entries 1, 2 instead of entry 0.
Also, using PIPE_SHADER_* for the program type was wrong, they're
ordered like the tesla method now, sorry for the confusion.
These are our reference software rasterizers. They can build everywhere
and are a precious debugging tool.
Making them always present immensily simplifies the scons logic.
If people want to avoid building it is still possible to pass
direcotries and target names to scons to narrow the build.
I've been back and forth on this, but I believe it's worth to have debug
by default.
Most humans (developers, testers) will want to use the debug version by
default. Many build bots want release but they are bots, and humans >
bots, so I don't care that much.
This is part of my initiative of minimizing the scons option mess many
complain about.
Now that draw depends on llvm it is very difficult to correctly handle
broken llvm installations. Either the user requests LLVM and it needs to
supply a working installation. Or it doesn't, and it gets no LLVM
accelerate pipe drivers.
* Turn some assertions to error messages.
* At most 16 vertex elements can be set, others are ignored.
* Rasterize at most 8 vertex-shader generic outputs, others are ignored.
This includes fog and WPOS.
* Unknown shader semantic names are ignored.
The evolution of TX_FORMAT bits is as follows:
* When a texture is created, set bits independent of pipe_format.
* When a sampler view is created, add format-specific bits.
* When sampler states and views are getting merged, add min/max LOD.
This is most likely a bug in the mesa state tracker, but do the quick hack
for now to avoid the divide by 0.
reported and hack generated by almos on #radeon
Signed-off-by: Dave Airlie <airlied@redhat.com>
A native display has no interest in depth/stencil format. Remove it
from the interface and let the common code derive the supported
depth/stencil formats from the pipe screen.
commit 0189cb2fde9f5d7326fd4bfbc2e52db4cce73b3e
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Apr 10 12:48:43 2010 +0100
gallium: don't use generic get_transfer func for textures
It doesn't know and can't fill in the stride value.
commit 65bc6f88fd9ce8ff90175b250e580bef2739ea35
Author: Chia-I Wu <olv@lunarg.com>
Date: Sat Apr 10 13:49:34 2010 +0800
i915g: Initialize screen surface function.
commit eb56e64986790aa2fa35534ce652b78656b0c3c5
Merge: f8b0a7f e7f1e5c
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Apr 10 00:38:43 2010 +0100
Merge commit 'origin/master' into gallium-resources
Conflicts:
src/gallium/drivers/r300/r300_texture.c
commit f8b0a7f6a3a98fd36ce90a81073ec8c8f09b684c
Merge: a3c9980 f43c679
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Apr 10 00:35:09 2010 +0100
Merge commit 'origin/master' into gallium-resources
Conflicts:
src/gallium/drivers/r300/r300_texture.c
commit a3c99807de37dc2c072f1d75ed3a11da333bc9a1
Author: unknown <michal@.(none)>
Date: Fri Apr 9 18:51:39 2010 +0200
scons: Add missing sources.
commit 927cec79cedb457efa9e6f335727cfcb8e4908e2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 9 18:07:56 2010 +0200
gallium: fix another compile warning after merge. Hmpf.
commit 52953cd7b0e51deafecb812bdc40f9e45f9ac62a
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 9 18:02:11 2010 +0200
gallium: fix comment
commit 7c8763aa6cfc74adf1ea49c2bab25ca17b32575f
Author: unknown <michal@.(none)>
Date: Fri Apr 9 18:05:20 2010 +0200
util: Fix type cast.
commit 9d0086411a104b7cc9297aac0d1f82853118d7bf
Author: unknown <michal@.(none)>
Date: Fri Apr 9 18:04:33 2010 +0200
libgl-gdi: Use proper unwrap functions for resources.
commit 251a5cdd18ba31c690ef61f133dfc65cd4a45cf8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 9 17:51:23 2010 +0200
gallium: more comments fixup
commit 8f3f9d5e1e9c0de98a3dfb19e81250d2c32ee4e9
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 9 17:48:18 2010 +0200
gallium: another fix after merge
commit 41f00a32ee5be91512c048bacb89ede0e04bc08d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 9 17:44:30 2010 +0200
gallium: more pipe_texture/resource fixes after merge
commit faf53328d1154c51d8a59513f2bfcae62272b0bf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 9 17:44:24 2010 +0200
gallium: fix comments for changed USAGE flags
commit fdcb17bea4b0798d316b56deea69832f41142adf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Apr 9 16:40:07 2010 +0200
gallium/pb: pb uses PB_USAGE_ flags, not PIPE_TRANSFER_ (same value anyway)
commit c95f7278ecc6db417ec1053279f2a8172c47aee9
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Apr 9 13:44:35 2010 +0100
llvmpipe: fix merge glitches
commit 28f8b8683175149a381be5eff263d4c20568bce7
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Apr 9 13:41:39 2010 +0100
r300g: update after merge for pipe_resources
commit 248c93cbc066ba6e3fadd94c5fcf3bdbb373d8fd
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Apr 9 13:41:20 2010 +0100
st/mesa: fix old pipe_texture usages
commit a563b1c5c2cb57b3ef28a3654d9b477460d13ced
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Apr 9 13:40:56 2010 +0100
r300g: remove unused variable
commit 734500131d828c9dfd68c5fa26b3e6b07e086d2d
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Apr 9 13:40:36 2010 +0100
nv50: fix compiler warning
commit efd402e13037e5c3e29759fa5b1c754c6d65d0e2
Merge: fec8a1d 5452615
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Apr 9 13:33:57 2010 +0100
Merge commit 'origin/master' into gallium-resources
Conflicts:
src/gallium/drivers/cell/ppu/cell_screen.c
src/gallium/drivers/cell/ppu/cell_texture.c
src/gallium/drivers/llvmpipe/lp_screen.c
src/gallium/drivers/r300/r300_context.c
src/gallium/drivers/r300/r300_render.c
src/gallium/drivers/r300/r300_screen.c
src/gallium/drivers/r300/r300_state.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_transfer.c
src/gallium/state_trackers/egl/common/egl_g3d.h
src/gallium/state_trackers/egl/kms/native_kms.c
src/gallium/state_trackers/egl/x11/native_dri2.c
src/gallium/state_trackers/egl/x11/native_ximage.c
commit fec8a1db13fac04ef56f6ece799d1f20aa3011db
Author: Marek Olšák <maraeo@gmail.com>
Date: Sat Apr 3 07:58:34 2010 +0200
util: fix assertion failures in pipe_buffer_flush_mapped_range
commit 1ff3984c2edce9927744f3cce3e7b07778990170
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 8 17:44:54 2010 +0200
docs: fix transfer_map description
commit 20bf14be8ac6438cb1afa38212e306fc06a5ed40
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Apr 8 14:39:13 2010 +0100
util: fix up several uses of pipe_map_buffer_range
This function used to return a pointer to where the start of the
actual buffer would have been, even though only the requested range is
being mapped.
In the resources change, the function was modified to use a transfer
internally, and started returning the pointer to the beginning of the
transfer, ie the mapped range.
Some users of the function were changed to reflect this new behaviour,
some were not. Since then the function has reverted to its original
behaviour, matching master.
This change restores some of the users of the map_buffer_range helper
to expect the old/original behaviour.
commit 33179a86058b68b518f40971030db337dc26fe6e
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Apr 8 14:38:54 2010 +0100
mesa/st: fix up several uses of pipe_map_buffer_range
This function used to return a pointer to where the start of the
actual buffer would have been, even though only the requested range is
being mapped.
In the resources change, the function was modified to use a transfer
internally, and started returning the pointer to the beginning of the
transfer, ie the mapped range.
Some users of the function were changed to reflect this new behaviour,
some were not. Since then the function has reverted to its original
behaviour, matching master.
This change restores some of the users of the map_buffer_range helper
to expect the old/original behaviour.
commit 3f5363d4dc9d7ad48467ae82d58d5f3d9bd10698
Author: Keith Whitwell <keithw@vmware.com>
Date: Wed Apr 7 17:26:52 2010 +0100
util: map_range and flush_range have offsets relative to start of buffer
commit 7eb1bfb97a790c73188d6b616d54fb3849e69b1e
Author: Keith Whitwell <keithw@vmware.com>
Date: Wed Apr 7 17:26:08 2010 +0100
nv50: fix compiler warning
commit d040daff0642dd791ac38e9b353dc251b03fc873
Author: Keith Whitwell <keithw@vmware.com>
Date: Wed Apr 7 17:25:58 2010 +0100
nvfx: fix compiler warning
commit 49ec01dffb8e99ab3ff8f856287db7b4df3efed6
Author: Chia-I Wu <olv@lunarg.com>
Date: Mon Apr 5 11:58:53 2010 +0800
mesa/es: Fixes for gallium-resources.
commit 47c87ada452be45766928a01b6d69da63e3a5f5e
Author: Marek Olšák <maraeo@gmail.com>
Date: Sat Apr 3 05:19:20 2010 +0200
r300g: fix transfers for textures created from winsys handles
commit 5f2701fddaef9c18d85c049311c2819c49cc1ae0
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Sat Apr 3 03:52:38 2010 +0200
nouveau: don't use the staging usage
Maybe it could make sense, but for now dynamic is enough.
None of these avoid uncached reads from GART on AGP cards.
commit 0db20fa49e008f35911007fa7ed9be1d678a2161
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Sat Apr 3 03:27:19 2010 +0200
i965: add brw_resource.c to Makefile
commit b94f3e7389cbd1b6465de3c04e8059ce73f1ea1f
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Sat Apr 3 01:48:33 2010 +0200
nouveau: fix for gallium-resources
commit a01ff99a19986e6beb7903431e60a074945b09bc
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 1 19:26:35 2010 +0200
gallium: fix missing includes
commit 26aeded562ce947a6deeb867fe22bf8daf7b1a1a
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 1 19:19:18 2010 +0200
gallium: remove video interface and related stuff
These interfaces weren't quite was needed, and building disabled for a while.
Some code actually build since some branch merge, and were now not fully
converted to gallium-resources.
See http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg09619.html
for a discussion of this. Video related work is done in origin/pipe-video
branch.
commit c64285aea45997a276fb141d7badc8a04f617c7c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 1 18:45:54 2010 +0200
python: fixes for resource changes
doesn't look quite ok yet, but sort of compiles.
commit 03d4d5a41f5cf158a358fd705c695e1c987a328f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 1 18:34:46 2010 +0200
gallium: s/u_box_orgin_2d/u_box_origin_2d
commit 2444f023142bcaf7bd310b44794580f273254408
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Apr 1 03:26:50 2010 +0200
r300g: fix segfault when the transfers functions are used
Still broken.
commit 6f09bf4066ab651b323c131bb07978e700519805
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 1 00:05:12 2010 +0200
r300g: compile fixes
commit 76711ff40d2092f9ef03d452de7458c4e76d9246
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Apr 1 00:04:47 2010 +0200
nvfx: more compile fixes
commit c5d2e90c9cc119447a447dc04a4bce4ab91fc671
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 31 23:18:50 2010 +0200
gallium: more mostly merge fallout fixes...
commit fbc3722696790857f4adc936190406e74dffd969
Merge: 86d9225 d97f696
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 31 22:09:35 2010 +0200
Merge commit 'origin/master' into gallium-resources
Conflicts:
src/gallium/drivers/cell/ppu/cell_screen.c
src/gallium/drivers/i915/i915_buffer.c
src/gallium/drivers/i915/i915_context.h
src/gallium/drivers/i915/i915_resource_texture.c
src/gallium/drivers/i915/i915_screen.c
src/gallium/drivers/i915/i915_state_emit.c
src/gallium/drivers/i965/brw_resource_texture.c
src/gallium/drivers/llvmpipe/lp_screen.c
src/gallium/drivers/llvmpipe/lp_setup.c
src/gallium/drivers/nvfx/nv30_fragtex.c
src/gallium/drivers/nvfx/nv40_fragtex.c
src/gallium/drivers/nvfx/nvfx_miptree.c
src/gallium/drivers/nvfx/nvfx_screen.c
src/gallium/drivers/nvfx/nvfx_transfer.c
src/gallium/drivers/r300/r300_state.c
src/gallium/drivers/svga/svga_screen_texture.c
src/gallium/state_trackers/dri/common/dri_drawable.c
src/gallium/state_trackers/dri/common/dri_screen.c
src/gallium/state_trackers/dri/common/dri_st_api.h
src/gallium/state_trackers/dri/drm/dri1.c
src/gallium/state_trackers/dri/drm/dri1.h
src/gallium/state_trackers/dri/drm/dri2.c
src/gallium/state_trackers/python/st_device.c
src/gallium/state_trackers/python/st_sample.c
src/mesa/state_tracker/st_cb_clear.c
src/mesa/state_tracker/st_cb_drawpixels.c
src/mesa/state_tracker/st_cb_readpixels.c
src/mesa/state_tracker/st_cb_texture.c
src/mesa/state_tracker/st_extensions.c
commit 86d9225d19d194eebbbe95b059695697c3307d15
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 31 19:06:06 2010 +0200
gallium: more fixes for bind changes
commit a215ef0606347e34669a580ec8df93ede7e46399
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 31 18:48:36 2010 +0200
gallium/docs: some updates for bind changes
commit c6c7e6746cbc7af59f7972719ed76f43e8ac16fc
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 30 20:24:26 2010 +0200
gallium: more bind change compile fixes
commit a83fa1504b78180524a5eb454ae186741a27cdf8
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 30 17:37:13 2010 +0200
compile fixes
commit 30dc8afcd243d6a160571bac5f06d773e54a4196
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 30 16:56:28 2010 +0200
fix some merge issues
commit 30aa617fee11fe50c0a9c2f33fcd120a474f5e34
Merge: 1dde609 3a830bc
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 30 16:09:45 2010 +0200
Merge commit 'origin/gallium-buffer-usage-cleanup' into gallium-resources
Conflicts:
src/gallium/drivers/nouveau/nouveau_screen.c
src/gallium/drivers/nvfx/nvfx_transfer.c
src/gallium/winsys/drm/radeon/core/radeon_drm_buffer.c
commit 1dde609ad6c9d2dfa0a5f7167f3c5bcf023b7c4d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Mar 24 02:35:00 2010 +0100
docs: some updates for pipe_resource
commit f236f9660d31b936f54b64ae07e569f8637067bd
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Mar 24 01:31:28 2010 +0100
nvfx: fix for gallium-resources
It seems to work with basic applications but almost surely needs more work.
In particular, it probably shouldn't use PIPE_BUFFER_USAGE_* flags
and should use PIPE_TRANSFER_* in several places.
Also, we probably don't want the vtable indirect calls and that ought
to be replaced with something better instead.
commit 5a136ad7b63768cb9a753eff8686c44592e62325
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Mar 24 01:31:19 2010 +0100
nv50: fix build in gallium-resources
Not actually tested.
Also needs next patch tee to actually build, this is just the nv50 part
split from the rest.
commit 3a830bc4a3f0f60c925b9434845a6bcad9a913c5
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 23 14:00:52 2010 -0700
st/egl: fix up for binding flags
commit c6a80dc32ef17bc972d4137ce7444ebed4d28ebb
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 23 13:52:15 2010 -0700
r300: restore 4k alignment for oqbo buffers
commit e75a8d5ea9e0ffcf67bc858e08937e10b4fc74ba
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 23 13:00:07 2010 -0700
gallium: bind flags
commit 1f5b509543a7f399835fd9edf27c18e1643fab7d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 23 19:32:21 2010 +0100
i965g: scons compile fixes
commit 2c385f8f905ec794d9119c05c6293e0b1b9b565a
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 23 19:20:33 2010 +0100
nouveau: drm compile fix
commit b285086ebd5132b47c340897c4622cc9fbd286cb
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 23 18:36:19 2010 +0100
r300g: pipe_resource compile fixes
bring back mistakenly deleted radeon_buffer.h
plus some more
commit 7810606f423ef2f51f0a14b919640c2fd2c931aa
Author: Michal Krol <michal@vmware.com>
Date: Tue Mar 23 16:21:03 2010 +0100
softpipe: Map GS constants, too.
commit 366f1176fb89d2b1978da6cfe60000b76bbc7338
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 23 15:51:52 2010 +0100
failover: update for pipe_resources
commit 615f44d70d293704ed821bc0b21fcfe6e363895d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 23 15:51:02 2010 +0100
identity: remove double is_resource_reference assignment
commit 7008586020395905ddfff333d02b3893de369796
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Mar 23 15:50:32 2010 +0100
trace: compile fix
commit 058c5697bda4c9cf7b49d26ee27a34586544efaa
Merge: dd7ba13 b33fd3c
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 23 06:40:39 2010 -0700
Merge commit 'origin/gallium-resources' into gallium-buffer-usage-cleanup
Conflicts:
src/gallium/state_trackers/vega/api_filters.c
src/mesa/state_tracker/st_cb_drawpixels.c
commit b33fd3ce3daf2921a895367d0ed3fd9c718a8575
Author: Michal Krol <michal@vmware.com>
Date: Mon Mar 22 21:03:26 2010 +0100
gallium: Usage parameter of get_transfer/transfer_inline_write is a bitfield.
commit 9c1162d9d656062a490a529997def3f674cc61fc
Author: Michal Krol <michal@vmware.com>
Date: Mon Mar 22 20:50:49 2010 +0100
scons: Update file lists after gallium-resources changes.
commit af9793ab9e5386b150d6b25c0d1978fdc67172e4
Author: Michal Krol <michal@vmware.com>
Date: Mon Mar 22 20:04:39 2010 +0100
gallium: Do not use `template` for formal parameter names.
commit dc2e12d714c444af9ff1acdd5a7e91408b116c99
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:41:34 2010 +0000
ws/nouveau: remove pipe_texture reference
commit b94c72329f1be85887d40d49b0586979da469d77
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:40:41 2010 +0000
ws/xlib: remove pipe_buffer reference in comment
commit 0a2af3eeae7de1d1cb433f0a2c35136b115f9920
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:39:34 2010 +0000
st/vega: clean up reference to pipe_texture
commit 437ce98daae46be5d532fbb04c7cbf4a503c1623
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:39:02 2010 +0000
st/python: begin conversion to pipe_resources, much more to do
commit 1b02e1ee3e5e87774f0c9e5f0e1898b7f8de1b16
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:29:34 2010 +0000
st/xorg: update for pipe_resources
commit eb39977fe7a1d9f0c3f4f2d4303a93c2c613cc3b
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:23:51 2010 +0000
st/dri: update for pipe_resources
commit e447aeff597a4d8c0f5de25854c14c99f2cc138c
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:23:36 2010 +0000
st/egl: update for pipe_resources
commit e4cc48da8fdbd7d521257a6d7cd10e6fc5aa1a65
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:08:44 2010 +0000
r300: drop use of R300 DONT SYNC flag
commit 129a83ab4d32e44ded5faea3f86ae5e1e62cddb6
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:08:17 2010 +0000
pipebuffer: use transfer flag
commit 575b35ee6b683d77095ef21c573c1de207107e79
Merge: f29ac73 9fc6c8b
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 22:03:25 2010 +0000
Merge commit 'origin/master' into gallium-resources
Conflicts:
src/gallium/drivers/llvmpipe/lp_texture.c
src/gallium/drivers/r300/r300_context.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/winsys/drm/radeon/core/radeon_buffer.h
commit f29ac73f3f626d5779a627b7fa6fecdb60a35aab
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 18:37:25 2010 +0000
cell: attempt to convert to pipe_resources
Can't even compile test this driver.
commit 484b1947f4af81bab60b41f21c3c23ea6f67488c
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 17:25:50 2010 +0000
nvfx: restore usage of pipe_winsys
The interface that cannot be killed...
commit ac76ac6eb30f4f9aa9f5733d60358b357925953a
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 17:25:10 2010 +0000
nv50: fix warning
commit 9683f4423449fa5acf6c019c571223650473bd82
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 17:14:31 2010 +0000
util: restore u_simple_screen, nouveau still relies on it
commit 961cbcb62232689c959965384c6aa9b8eca697c1
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 16:51:54 2010 +0000
nouveau: convert nvfx and nv50 to pipe_resources
Compile tested only.
This was a deeper change than I was hoping for, due to the
layering of the pipe_texture implementation in each driver on
top of a shared pipe_buffer implementation in the shared code.
Have modified the shared code to act as a set of convenience
routines operating on nouveau_bo objects.
Each driver now uses the u_resource_vtbl technique to split the
implementation of pipe_resources between the existing miptree code
for textures and a new, minimal buffer implementation in each
driver.
Eventually these should be combined, not least because APIs are now
allowing things like binding buffer resources as textures and render
targets.
commit 18ba74016db13b23282f5033ee37b628a12ee566
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 10:02:54 2010 +0000
r300: fix compilation after merge
Also build r300 by default.
commit eb9c0175c8e4baca3fcb0b8364f83ceba9d74e0d
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 09:59:49 2010 +0000
st/vega: fix up after merge
commit ea8dd1d4ae7b58c9315c3491046ef3852ddd3377
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 09:59:44 2010 +0000
aux: remove unused piperesource helpers
commit be7af29d3ad1a10409b0ea689d882cf30a4e1d62
Merge: d22c2c6 12deb9e
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 21 09:54:53 2010 +0000
Merge commit 'origin/master' into gallium-resources
Conflicts:
src/gallium/auxiliary/cso_cache/cso_context.c
src/gallium/auxiliary/cso_cache/cso_context.h
src/gallium/drivers/r300/r300_context.c
src/gallium/drivers/r300/r300_render.c
src/gallium/drivers/r300/r300_state.c
src/gallium/drivers/r300/r300_state_derived.c
src/gallium/state_trackers/vega/api_filters.c
src/gallium/state_trackers/vega/image.c
src/gallium/state_trackers/vega/image.h
src/gallium/state_trackers/vega/mask.c
src/gallium/state_trackers/vega/mask.h
src/gallium/state_trackers/vega/paint.c
src/gallium/state_trackers/vega/paint.h
src/gallium/state_trackers/vega/renderer.c
src/gallium/state_trackers/vega/renderer.h
src/gallium/state_trackers/vega/shader.c
src/gallium/state_trackers/vega/vg_context.h
src/gallium/state_trackers/vega/vg_tracker.c
src/mesa/state_tracker/st_manager.c
commit d22c2c6cb23a063e3334a165d0c5c3d73f05d234
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Mar 20 11:48:54 2010 +0000
drm/r300: update for r300g pipe_resources conversion
Remove old files that related to pipe_buffers but weren't being
built. Hopefully this is correct.
commit f07b2c836958bee5796899123eca4ed05ac6242b
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Mar 20 11:47:03 2010 +0000
r300: convert to pipe_resources
Do a very shallow conversion - basically keeping the existing
buffer and texture code intact and using a vtbl struct
inside our resource struct to select between the two implementations.
The buffer and texture treatments could be further merged without
much effort, but try to keep the existing code working at this point.
commit feca9c3ca62daaf0d8745370106d4e3b22340c49
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 18 06:00:34 2010 +0000
gallium: update new merges to pipe_resource
commit 1cad983eac77a0c5333e6a3ce92b90ac87407714
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 18 06:00:19 2010 +0000
drm/sw: update new merges to pipe_resource
commit 191d39490ed792c569f98d42cf05891b264f71f8
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 18 06:00:01 2010 +0000
vg: update new merges to pipe_resource
commit b727c59bc44812ad503d9390505c92b738a5b8b0
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 18 05:59:38 2010 +0000
llvmpipe: update new merges to pipe_resource
commit 5f4b64b37fdcd70162c382b2ebbd494bef751dbd
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 18 05:59:23 2010 +0000
brw: pipe_resource fixes
commit d4aca209f531f1b65bf706ce1e5fc0375b587eb6
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 18 05:59:06 2010 +0000
util: update new merges to pipe_resource
commit cf6bef0afee10763c78509a3d17e9a6e49bcd3c8
Merge: 1997231 6de8e56
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 18 05:38:50 2010 +0000
Merge commit 'origin/master' into gallium-resources
commit 1997231916144485c3c4a36f53eda39fce460272
Merge: ad88ac7 e1ee3ea
Author: Keith Whitwell <keithw@vmware.com>
Date: Wed Mar 17 08:46:38 2010 +0000
Merge commit 'origin/master' into gallium-resources
Conflicts:
src/gallium/auxiliary/Makefile
src/gallium/auxiliary/util/u_blit.c
src/gallium/auxiliary/util/u_blit.h
src/gallium/auxiliary/util/u_gen_mipmap.c
src/gallium/auxiliary/util/u_gen_mipmap.h
src/mesa/state_tracker/st_cb_texture.c
src/mesa/state_tracker/st_gen_mipmap.c
commit ad88ac79034a91670940276e722bdd398d5c9023
Merge: 77bc770 8cdfd12
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 16 09:13:07 2010 +0000
Merge branch 'gallium-sampler-view' into gallium-resources
Conflicts:
src/gallium/auxiliary/cso_cache/cso_context.c
src/gallium/auxiliary/util/u_blit.c
src/gallium/drivers/llvmpipe/lp_texture.c
src/gallium/drivers/softpipe/sp_texture.c
src/mesa/state_tracker/st_cb_fbo.c
src/mesa/state_tracker/st_framebuffer.c
src/mesa/state_tracker/st_texture.c
commit 77bc770c991ea025c82eaa4e0e2390efd825d96d
Author: Keith Whitwell <keithw@vmware.com>
Date: Mon Mar 15 22:21:48 2010 +0000
util: missing file
commit f83c91db8ae63a3c3a34ff21492427a5663fb760
Merge: c1d4774 42910eb
Author: Keith Whitwell <keithw@vmware.com>
Date: Mon Mar 15 09:48:58 2010 +0000
Merge commit 'origin/gallium-sampler-view' into gallium-resources
Conflicts:
src/gallium/drivers/nv40/nv40_transfer.c
src/gallium/drivers/nvfx/nvfx_transfer.c
src/gallium/drivers/trace/tr_drm.c
commit dd7ba1378fc50710667724d30d6d4cf1125ad61e
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 23:54:36 2010 +0000
gallium: start a cleanup of buffer_usage
Remove fairly meaningless CPU/GPU READ/WRITE flags and
replace with proper usages.
commit c1d4774187189f4af8ff421b210824f3d53ceefb
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 23:05:45 2010 +0000
llvmpipe: don't FREE userbuffer data
commit 9bfa07afe179f8060e7beefb754a29c4d9c6e349
Merge: 65757a1 08cddfe
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 22:54:51 2010 +0000
Merge commit 'origin/master' into gallium-resources
Conflicts:
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/llvmpipe/lp_scene.c
src/gallium/drivers/llvmpipe/lp_texture.c
src/gallium/drivers/llvmpipe/lp_texture.h
src/gallium/drivers/softpipe/sp_texture.c
src/gallium/drivers/svga/svga_screen_texture.c
commit 65757a143f8e3fcd7afbc1ff92db44a823edf46c
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 22:41:17 2010 +0000
svga: build fixes
commit 2f5435220501d4b3050cab2bb1dce6174cd13ff6
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 22:39:25 2010 +0000
gallivm: build fix
commit 42642ec0984107d82b740711f2debbf38457a06e
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 22:38:33 2010 +0000
llvmpipe: convert to pipe_resources
commit 7bbcb21e20cb545ef8dd5fc61d67ed931c69e813
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 22:19:30 2010 +0000
gallivm: convert to pipe_resources
commit 88ae0d04610ca52649b42e32141a52af6d5a739b
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 21:01:22 2010 +0000
configs: build svga
commit 0e112bc69828e65085ebfaef895ecd78fe53f1c4
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 21:01:17 2010 +0000
gallium: restore PIPE_BUFFER_USAGE_CUSTOM
commit 102aca688b95c976b7178b84092fba7d041ff9d2
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 21:00:41 2010 +0000
util: more transfer helpers
commit a79f6a4a0836fc64c07f9aeec21d914474fe3649
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 20:59:36 2010 +0000
svga: convert to use pipe_resrource
As with others so far, a fairly shallow conversion.
commit 087fb54492fa5e3baf040c5efbf7dacd98a8849b
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 18:38:08 2010 +0000
brw: fix function name
commit cfc9dd707d16e06fd23b6926da3a6e2269f31dc8
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 18:19:06 2010 +0000
gallium: enable brw compile
commit 8a5b86d76bdd3c7de63322423f59940a4dc2ee25
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 18:18:50 2010 +0000
brw: compiles with pipe_resource
commit 563ca458b548c41ca4dca559354c16ca1a80d009
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 18:18:42 2010 +0000
i915: hook up userbuffer create
commit b5095b48247b6020e36cc942ac145c3fccbe9a19
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 17:20:51 2010 +0000
i915: use helpers for is_resource_referenced
commit d5392bdc6d70002acf9c5bac0fde14ba405c4d84
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 17:20:38 2010 +0000
util: helpers for is_resource_referenced
commit 2f3492a5aefbb2e745f6700d8e910ebb5cbb98cf
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 17:08:50 2010 +0000
i915: remove buffer.c again
commit 1373a35b65fcc25ec6cdfea2703bbb3417de2c6d
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 17:08:34 2010 +0000
i915: add new files to scons
commit 0251612d70e57fe38e10e75915b394631d224f2c
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 16:38:29 2010 +0000
i915: compiling with pipe_resources
commit 9a0235864252929a8eedd44dbd2fe30fe54c531d
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 13:51:16 2010 +0000
gallium: remove inline_read transfer
commit a6ba315e25793e0c228d3a4ae2f8201634dc9ff0
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 13:50:32 2010 +0000
trace: get running
Some dumping will be incorrect or disabled, but it runs without
crashing
commit 2133f1d90aa919662a8420a0cf3b4557e6ec1afd
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 13:49:42 2010 +0000
gallium: remove the inline_read transfer
There aren't enough users of this to justify it.
commit bccaf1fa30881f6b4fb189a9b74fc7af79c3b481
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 12:30:37 2010 +0000
identity: hook up inline transfer operations
commit e4c152a344f2f53c842b810724a2ae7cb4554f58
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 12:21:54 2010 +0000
gallium: build trace and identity
commit 0b5a311db78852fa9fd021e17b5968a1e0436b49
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 12:21:36 2010 +0000
gallium: add more of the transfer state to pipe_transfer
Not really sure if recording all the arguments to the
create_{transfer,texture,surface,etc} functions in the result of those
calls is a great idea, but it seems we're fairly dependent on it
throughout the code.
commit a23985c26eafe76b0a7dacc892e50cb589f211fe
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 12:19:46 2010 +0000
identity: compiles with pipe_resources
commit d0d630944304c208f6dade6ef8836763ee2bc7b4
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 12:13:02 2010 +0000
trace: compiles with pipe_resources
commit a4451ea459cc8bfc915fe6aed2891b90854b6c9d
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 11:39:50 2010 +0000
softpipe: give userbuffers a format other than NONE
Most mesa demos working
commit 32bb1bd4ba29884a4ecfa11c8441d33dfceabcef
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 11:39:21 2010 +0000
util: correct argument order in pipe_buffer_map
commit 7e2696c06445282feb781047277b260308760a33
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 11:32:55 2010 +0000
softpipe: transfer flush
commit a0543b13c042e3c1142522d9d136f16fd4cabf78
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 11:32:13 2010 +0000
util: noop implementation of transfer_flush_region
commit ce418533be752dbeb164e7ff82a99483048e482b
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 11:26:07 2010 +0000
gallium: softpipe runs gears with pipe_resources
commit bfda4f2eb34498e4b7f3c608d30fccff6bb9651b
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 11:25:48 2010 +0000
util: get clip_tile working again
commit f5ef219c3bed62b6a0da842e675fae16268e0fbe
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 09:43:20 2010 +0000
softpipe: use u_transfer helpers
commit 072957aab25affecf0702e925310e46c694a5ee4
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 09:42:46 2010 +0000
util: helpers for inline transfers
commit 9c45561fb0d7a52400093bcb2ce5f727fafd7777
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 09:42:25 2010 +0000
util: fix typo calculating transfer box
commit f3e98fd47f36804d019a684d49ff230df3ab0cf5
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 09:25:46 2010 +0000
st/vega: convert to pipe_resource
commit d1b7b00afc944f6499c83d676c7642115d62a62c
Author: Keith Whitwell <keithw@vmware.com>
Date: Sun Mar 14 08:37:56 2010 +0000
gallium: begin converting drivers to pipe_resource
Work in progress...
commit 51c25117f5d6da1926a2be5ecc66677952a8abf0
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Mar 13 20:16:27 2010 +0000
gallium: work in progress on layering resources on top of old textures
Helper code in an aux module to avoid rewriting all the drivers.
commit fb6764d3ce95c55aa78af2f1c8cbb17b79ce1ba2
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Mar 13 19:19:09 2010 +0000
heaps of wip
commit ee6b3bc730fcdaf8da3646d62f04578ec06d36a1
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Mar 13 16:38:02 2010 +0000
wip2
commit 1830880212445189fe267d615075239ed17c7cc0
Merge: 90b4045 47bfbd4
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Mar 13 15:14:03 2010 +0000
Merge branch 'gallium-sampler-view' into gallium-resources
Conflicts:
src/gallium/include/pipe/p_context.h
src/mesa/state_tracker/st_atom_texture.c
src/mesa/state_tracker/st_cb_bitmap.c
src/mesa/state_tracker/st_cb_drawpixels.c
src/mesa/state_tracker/st_cb_texture.c
src/mesa/state_tracker/st_context.c
src/mesa/state_tracker/st_context.h
src/mesa/state_tracker/st_texture.h
commit 90b4045fbc0a093fcd04efba7e045ec259c490b8
Author: Keith Whitwell <keithw@vmware.com>
Date: Sat Mar 13 14:52:43 2010 +0000
wip
We've supported indirect rendering pbuffers for a while, but not direct
rendering pbuffers. The way we do this is by creating a hidden pixmap
and wrap that in a GLX pbuffer. This only works when we have DRI2 on
the server, but if the server doesn't have DRI2, it won't expose configs
with pbuffer bits enabled.
When matching attributes using the 'mask' matching criteria, the spec
says that
"Only GLXFBConfigs for which the set bits of attribute include all
the bits that are set in the requested value are
considered. (Additional bits might be set in the attribute)."
The current test returns true if the two bit masks have bits in
common, specifically it matches even if the requested value has bits
set that are not set in the fbconfig attribute. For example, an
application asking for
GLX_DRAWABLE_TYPE, GLX_PIXMAP_BIT | GLX_PBUFFER_BIT,
as glxpbdemo does, will match fbconfigs that don't support pbuffer
rendering, as long as they support pixmap rendering.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Update the warranty disclaimer to use the more general "THE AUTHORS OR
COPYRIGHT HOLDERS". This is done manually on files created by me. Hope
that I do not miss anything.
we were only using the jited function in the linear case, now drawelts
correctly uses the same path. it results in a significant gain in
real world apps (openarena went from 23fps to 29fps)
Previously, generating inlined function bodies was going to be
difficult, as there was no mapping between the body's declaration of
variables where parameter values were supposed to live and the
parameter variables that a caller would use in paramater setup.
Presumably this also have been a problem for actual codegen.
Using '#extension foo: warn' instructs the compiler to generate a
warning when some feature of the extension 'foo' is used. This patch
adds some infrastructure needed to support that for variables.
Similar changes will be needed for types and built-in functions.
our code resets pipe_vertex_buffer's with different offsets when rendering
vbo, meaning that we kept creating insane number of shaders even for simple
apps e.g. geartrain had 54 shaders and it was taking almost 27 seconds just to
compile them. this patch passes pipe_vertex_buffer's to the jit function and lets
it to the stride/buffer_offset computation at run time. the slowdown at runtime
is largely unnoticable but the we go from 54 shaders to 3, and from 27 seconds to less
than 1.
Previously the same code was generated for a while loop and a do-while
loop. This pulls the code that generates the conditional break into a
separate method. This method is called either at the beginning or the
end depending on the loop type.
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Specifically, handle 'break' and 'continue' inside loops.
This causes the following tests to pass:
glslparsertest/shaders/break.frag
glslparsertest/shaders/continue.frag
This causes the following tests to pass:
glslparsertest/shaders/dowhile.frag
glslparsertest/shaders/while.frag
glslparsertest/shaders/while1.frag
glslparsertest/shaders/while2.frag
I'm not entirely convinced we want this behaviour (the underlying nouveau_bo
doesn't support it either), but since certain parts of the mesa state
tracker appear to require it lets make it work for now.
we were trying to store the outputs starting at the same offset we
were using for the input arrays, which was writing beyond the end of
the output array.
the loop was doing a NE comparison which we could have skipped if the prim
was triangles (3 verts) and our step was 4 verts. also fix offsets in conversion
to aos.
All public functions in the archives are either directly referenced or
indirectly referenced by _glapi_get_proc_address. There is no need for
--whole-archive.
When an unsized array is accessed with a constant extension index this
sets a lower bound on the allowable sizes. When the unsized array
gets a size set due to a whole-array assignment, this size must be at
least as large as the implied lower bound.
This causes the following tests to pass:
glslparsertest/glsl2/array-16.vert
This causes the following tests to pass:
glslparsertest/glsl2/matrix-11.vert
glslparsertest/glsl2/matrix-12.vert
glslparsertest/shaders/CorrectParse2.vert
glslparsertest/shaders/CorrectSwizzle2.frag
This splits rendering into two passes when front and back stencil
reference value, value mask, or write mask don't match.
The advantages of doing it in the driver instead of in st are:
* SWTCL is executed just once and the resulting vertex buffer is reused
in the second pass.
* Lower driver overhead due to the fallback being very close to
the actual draw emission with minimum state change.
As per Radeon 9700 Opengl Programming and Optimization Guide [1], there are
16 texture units even on the first r300 chipsets. If you think I am wrong,
feel free to propose a patch.
[1] Here's PDF: http://people.freedesktop.org/~mareko/
The tiling setup needs a bit of work, but this should be good enough for now,
when we get buffers from the kernel we need to store their tiling properties.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In GLSL 1.10 this was not allowed, but in GLSL 1.20 and later it is.
This causes the following tests to pass:
glslparsertest/glsl2/array-09.vert
glslparsertest/glsl2/array-13.vert
Whole arrays are assignable in GLSL 1.20 and later, but it's not clear
how to handle that within the IR because the IR is supposed to be
shading language version agnostic.
there's no good way of aligning the output's, and since the vertex_header
is variable sized in the first place we need to extract elements from a vector
and store them individually into an array. this gets the basic examples working
again
With an Intel 855GM handled by intel_drv, there's a crash with Gallium3D
enabled DRI driver for Intel i915 (--enable-gallium-intel).
The Gallium3D driver doesn't support the 855GM as expected by
intel_drv, it failed to open the screen and give an half
initialized screen structure to dri_destroy_option_cache():
optionCache.info is NULL, so it's crashing while trying
to free array content. This patch at least fix the crash in the function.
Here's some logs of the fixed version:
[ 16274.137] LoaderOpen(/opt/mesa/lib/xorg/modules/drivers/intel_drv.so)
[ 16274.139] (II) Loading /opt/mesa/lib/xorg/modules/drivers/intel_drv.so
[ 16274.183] (II) Module intel: vendor="X.Org Foundation"
[ 16274.183] compiled for 1.8.0, module version = 2.11.0
[ 16274.183] Module class: X.Org Video Driver
[ 16274.183] ABI class: X.Org Video Driver, version 7.0
[ 16274.183] (II) intel: Driver for Intel Integrated Graphics Chipsets: i810,
i810-dc100, i810e, i815, i830M, 845G, 852GM/855GM, 865G, 915G,
E7221 (i915), 915GM, 945G, 945GM, 945GME, Pineview GM, Pineview G,
965G, G35, 965Q, 946GZ, 965GM, 965GME/GLE, G33, Q35, Q33, GM45,
4 Series, G45/G43, Q45/Q43, G41, B43, Clarkdale, Arrandale
[ 16274.382] (II) intel(0): Integrated Graphics Chipset: Intel(R) 855GME
[ 16274.382] (--) intel(0): Chipset: "852GM/855GM"
[ 16276.675] (II) intel(0): [DRI2] Setup complete
[ 16276.675] (II) intel(0): [DRI2] DRI driver: i915
debug_get_option: GALLIUM_TRACE = (null)
debug_get_bool_option: GALLIUM_RBUG = FALSE
debug_get_bool_option: INTEL_DUMP_CMD = FALSE
i915_create_screen: unknown pci id 0x3582, cannot create screen
dri_init_screen_helper: failed to create pipe_screen
[ 16276.794] (EE) AIGLX error: Calling driver entry point failed
[ 16276.794] (EE) AIGLX: reverting to software rendering
[ 16276.794] (II) AIGLX: Screen 0 is not DRI capable
[ 16276.796] (II) AIGLX: Loaded and initialized /opt/mesa/lib/dri/swrast_dri.so
[ 16276.796] (II) GLX: Initialized DRISWRAST GL provider for screen 0
Signed-off-by: Yann Droneaud <yann@droneaud.fr>
Reviewed-by: Corbin Simpson <MostAwesomeDude@gmail.com>
MAXWIDTH/HEIGHT were 2048 but the max texture size was 4096.
This caused a crash if a 4Kx4K texture was created and rendered to.
See comment about max framebuffer size in lp_scene.h.
Also added assertions to catch this inconsistancy in the future.
Putting calls to util_format_init all over the codebase is infeasible.
Instead, half float tables are pregenerated, and the s3tc library is
loaded on demand.
I believe this is a solution that combines performance, cleanliness,
flexibility and portability.
This changes the S3TC function pointers to be initialized to stubs
that load the S3TC library and then delegate to the real functions.
If the S3TC library fails to load, the function pointers are replaced
with a "nop" function.
The code is also changed to attempt to load the library only one time.c
Note that unlike checking for a flag, this method has no performance
cost at all.
The use of the "nop" functions also allows to avoid most checks, that
are only preserved when the function does non-trivial work.
This solution avoids the issue of how to run the initializers and
also allows those pages (and the parts of them in processor caches)
to be shared between multiple processes.
The drawback is slightly higher library size.
For unsized arrays, we can't flag out-of-bounds accesses until the
array is redeclared with a size. Track the maximum accessed element
and generate an error if the declaration specifies a size that would
cause that access to be out-of-bounds.
This causes the following tests to pass:
glslparsertest/shaders/array10.frag
Switch from auto-init to explicit init for util_half per Brian Paul's
indication.
NOTE: this is probably broken because not enough things call util_format_init.
Will be fixed shortly
They apparently both declare the section, but #pragma data_seg
also puts all subsequent definitions in the section, which is
undesirable.
This should be the correct solution, and is actually used by the
reference I cited (but I forgot to do it in my code).
Untested, let me know if it doesn't work.
The Xlib demos were fixed to use $(X11_LIBS) so that configure could
detect the proper directory to link the library from, but this broke
the non-autoconf builds. Give X11_LIBS a default value to fallback on.
NOTE: this commit will cause Gallium to fail to build on any compiler
except GCC, the Microsoft C compiler and compatible compilers that
claim to be one of those.
This commit removes the u_gctors.cpp mechanism, in favor of using
compiler-specific syntax to add global constructors from C files.
This solves the problem of u_gctors.o not being pulled from static
libraries and avoids using C++.
However, it needs compiler-specific support for every compiler.
The Microsoft C compiler support has not been tested.
Reflect the recent addtion of eglut and reorganization of the EGL demos.
This helps remove ~2k lines of duplicated code
$ git diff --shortstat -M 57cc1db87b
18 files changed, 298 insertions(+), 2178 deletions(-)
This adds a fast half float conversion facility to Gallium.
Mesa already contains such a facility, but using a much worse algorithm.
This one is an implementation of
www.fox-toolkit.org/ftp/fasthalffloatconversion.pdf
and uses a branch-less algorithm with some lookup tables small enough
to fit in the L1 cache.
Ideally, Mesa should start using these functions too, but I'm not sure
how to arrange that with the current build system.
A new "u_gctors.cpp" is added that defines a global C++ constructor
allowing to initialize to conversion lookup tables at library init.
I haven't verified that these are all correct, but it's still a lot
better than not having anything.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
ast_function::hir consists of bits pulled out of
ast_function_definition::hir. In fact, the later uses the former to
do a lot of its processing. Several class private data fields were
added to ast_function to facilitate communicate between the two.
This causes the following tests to pass:
glslparsertest/shaders/CorrectModule.frag
This causes the following tests to fail. These shaders were
previously failing to compile, but they were all failing for the wrong
reasons.
glslparsertest/shaders/function9.frag
glslparsertest/shaders/function10.frag
Several other code movements were also done. This partitions this
function into two halves. The first half processes the prototype
part, and the second have processes the actual function definition.
The coming patch series will parition ast_function_definition::hir
into (at least) two separate functions.
hash_table_insert needs to keep the key so that it compare keys on a
following hash_table_find call. Since key was allocated on the stack,
it disappeared out from under the hash table.
This causes the following tests to pass:
glslparsertest/shaders/array4.frag
glslparsertest/shaders/array5.frag
This causes the following tests to fail. These shaders were
previously failing to compile, but they were all failing for the wrong
reasons.
glslparsertest/shaders/array3.frag
Add FEATURE_GL, FEATURE_ES1, and FEATURE_ES2 for OpenGL, OpenGL ES 1.x,
and OpenGL ES 2.x respectively. Define individual features through the
new umbrella features. There is no real change introduced by this
commit.
There are X visuals that Gallium or the code does not support. We could
not assert the color format to be supported. Return PIPE_FORMAT_NONE in
such cases and let the caller handle it.
This causes the following tests to pass:
glslparsertest/shaders/attribute.vert
glslparsertest/shaders/attribute1.vert
glslparsertest/shaders/attribute2.vert
they won't generate any useful conversion code for some of the new formats
but at least don't assert. Also needed some more hacks so they don't generate
code for some of the new formats, as gcc was not impressed.
Also declare unused channels as void, and change the scripts to not fail if
the first channel happened to be unused.
Needs serious fixing.
I'm not sure if this is a win or not. It makes the code in
apply_implicit_conversion more clear, but it obscures the fact that it
may change the pointers.
This causes the following tests to pass:
glslparsertest/shaders/CorrectVersion.V110.frag
shaders/glsl-vs-sqrt-zero.frag
shaders/glsl-vs-sqrt-zero.vert
This causes the following tests to fail. These shaders were
previously failing to compile, but they were all failing for the wrong
reasons.
glslparsertest/shaders/attribute1.vert
glslparsertest/shaders/attribute2.vert
glslparsertest/shaders/main2.vert
The following tests now pass:
glslparsertest/shaders/if1.frag
glslparsertest/shaders/if2.frag
The following tests that used to pass now fail. It appears that most
of these fail because ast_nequal and ast_equal are not converted to HIR.
shaders/glsl-unused-varying.frag
shaders/glsl-fs-sqrt-branch.frag
fsraytrace and vsraytrace were crashing on Mac OS X 10.6.3 in the Apple
GLSL compiler function TPPStreamCompiler::assignOperands. Removing some
const qualifers made the crashes go away.
Making the base e functions IR operations is not a clear win. i965
doesn't support it, it doesn't look like r600 supports it, but r500
does. It should be easily supportable as a lowering pass, though.
Following a discussion in #dri-devel, I think this makes more sense
than implementing it as RSQ RCP CMP as Mesa did. The i965 has a
hardware sqrt that should work, and AMD is suppposed to be able to
implement it as RSQ RCP with an alternate floating point mode so that
the 0.0 case is handled like we want.
These are needed for DX10 and/or OGL3.3.
This just adds the formats nothing handles them yet.
PIPE_FORMAT_R1_UNORM can't be used currently as it requires special filter.
Need to reclassify compressed formats at some point.
Per a patch from Marek Olšák, we can simply multiply the incoming
value by 1/sqrt(x) instead of using rcp.
We're keeping the x==0 check to avoid generating NaN for sqrt(0).
Commit db5c2235d1 broke the default SCons
build.
NameError: name 'graw' is not defined:
This patch allows the default SCons build to work again until a proper
fix is available.
Provides basic window system integration behind a simple interface,
allowing tests to be written without dependency on either the driver
or window system.
With a lot of work, could turn into something like glut for gallium.
As a result, the following tests pass:
glslparsertest/array3.frag
glslparsertest/CGStandardLibrary.frag
glslparsertest/ConstantConversions.frag
glslparsertest/constructor1.frag
glslparsertest/constructor2.frag
glslparsertest/constructor3.V110.frag
glslparsertest/dataType4.frag
glslparsertest/dataType5.frag
glslparsertest/dataType13.frag
glslparsertest/dataType19.frag
glslparsertest/matrix.V110.frag
glslparsertest/parser7.frag
glslparsertest/swizzle3.frag
The following tests also pass, but it is just by dumb luck. In these
cases the shader fails to compile, but it fails for the wrong reason:
glslparsertest/array6.frag
glslparsertest/comma2.frag
glslparsertest/conditional1.frag
glslparsertest/conditional2.frag
glslparsertest/conditional3.frag
glslparsertest/constFunc.frag
glslparsertest/ParseTest3.frag
glslparsertest/ParseTest4.frag
glslparsertest/varying3.frag
glslparsertest/parser8.frag (also segfaults)
glslparsertest/parser9.frag (also segfaults)
The following tests now fail. As far as I can tell, these are all
cases where the shader was failing to compile, but it was failing for
the wrong reason.
glslparsertest/CorrectMatComma.frag
glslparsertest/CorrectModule.frag
glslparsertest/CorrectSwizzle2.vert
glslparsertest/shaders/glsl-fs-bug25902.frag
There is no effective changes given how the function is called. It is
still not trivial, but it should be more readable and resemble
_eglBindContextToThread a lot.
When a newly bound context is the same as the previously bound one,
_eglBindContextToThread should still return the context instead of NULL.
This gives the driver a chance to flush the context.
ximage backend calls gallium_wrap_screen, which requires libidentity.a
and libtrace.a. There are also some missing symbols in i965 due to the
use of sw wrapper.
All of the scalar, vector, and matrix constructors *except* "from
bool" constructors should be handled. Array and structure
constructors are also not yet handled.
So other directory can share it.
Also remove the libllvmpipe.a dependency from test
programs. It is not needed any more.
Signed-Off-By: Christopher Li <chrisl@vmware.com>
For numeric types, vector_elements and matrix_columns must be at least
1. Previously matrix_columns was 0 for vectors, and both were 0 for
scalars. This change simplifies things in some places.
Also turn generate_swizzle into a static "create" method of the new
class; we'll want to use it for the IR reader as well.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This function should be put in targets/common or winsys/sw/common and shared
with targers/libgl-xlib and winsys/sw/drm.
For targets/common, you get layering violations in the build system unless
all of drm_api's are moved under targets.
Only a few state vars require state validation before querying them.
This potentially speeds up state queries.
Encode that info into the state tuple table.
Also, use the new tuple field to indicate when FLUSH_CURRENT() must
be called to validate other state vars.
Based on a patch submitted by Robert Bragg on Feb 12, 2010.
When building more then one dri driver we would get warnings because
we where defining the same build target multiple times.
Also move all the dri scons targets related code into its own file.
Also make a slight change to ir_variable. The ir_dereference tracks
the number of nested dereferences. If an ir_variable is visited and
the count is non-zero, just print the name of the variable.
Basically, replace everything different from operator_assign other
than the creation of the rhs value from the lvalue and rvalue with the
contents of operator_assign. Fixes a segfault in
CorrectSwizzle1.frag, and fixes parser10.frag.
We shouldn't try to clear a non-existant z/stencil buffer, so there's
probably a bug elsewhere. Disable the assertion for now to allow things
to at least run.
Newb GL mistake: matrices in GL are column-major. This means that
vector_elements is the number of rows. Making these changes causes
matrix-08.glsl to pass.
This simplifies the process of matching function parameter types.
More simplifications are probably possible here, but arrays and
structures need to be implemented first.
The only way the specified type fields can match is if the types are the
same. Previous tests (and assertions) have filtered away all other possible
cases.
We weren't saving the per-scene texture references at the right point.
Fixes piglit cubemap segfault. The segfault resulted from referencing
texture memory which was prematurely freed because of a missed reference
count.
Fixes fd.o bug 27276.
Since texture's can be shared by many contexts, the texture's sampler
view's context pointer might be invalid by time we delete the texture.
Prevent crashes/etc by setting the sampler view's context to be the
calling context before deleting it. This should be safe as long as
all contexts which share the texture are using the same gallium driver.
That's a reasonable assumption since pipe_texture objects aren't
compatible between different drivers anyway.
We still don't do proper min/mag filtering but this is better than just
sampling the base mipmap level all the time.
Fixes piglit depth-level-clamp test. Fixes fd.o bug 27256.
This is a quick & dirty solution, but it works. See comments in
the code for other ideas.
Fixes regressions/breakage seen in progs/xdemos/glxheads, etc. from commit
6632915e95.
Currently the test link uses -lGL to define the glapi symbols.
This makes it impossible to build DRI drivers on systems without
Mesa installed and without building the libGL from the Mesa tree
first.
Some automated build systems trigger this problem.
This commit removes -lGL and instead adds a dummy implementation of
glapi to dri_test.c
This, along with Kristian's commit, should fix all known regressions
due to the addition of unresolved symbol checking.
The variable X_LIBS from AC_PATH_XTRA contains only the -L searchdir
parameter and not the -lX11 to link to Xlib. Use X11 prefixed build vars
for linking with Xlib to avoid the conflict.
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
The two places that were still passing NULL had a state pointer to
pass. Not passing it in these places prevented termination of
compilation of erroneous programs.
We're still abusing the flags by putting them where our driver stores
ctx->NewState changes. Making them into more restricted state change
flags would be a project for later.
Fixes a failure where calling intel_draw_buffer() too often would trip
up Mesa assertions about when Mesa state could get changed, when it hadn't.
Bug #27034.
This will make void-01.glsl test fail, so I may regret this later.
However, this will make supporting functions that return void or
functions that have a void parameter list easier to handle.
On some systems, putting vertex and index buffers in VRAM instead of GART
memory eliminates massive graphics corruption which is otherwise present,
due to unclear causes.
This patch adds an environment variable that does that, along with helpful
messages.
It turns it on by default on G7x, as it is what I am seeing corruption
on and some other reports also seemed to pinpoint these cards.
Currently we allocate buffers in GART or VRAM at creation time.
However, when using swtnl, this results in reads from uncached
memory, which drastically impair performance.
So, for now, cause nouveau_screen.c to not pass any placement flags
to buffer creation, so that the buffers are moved later.
Previously libdrm itself did this, but was changed to not to do it.
This may introduce an extra copy in normal usage, but this currently
does not seem to introduce significant performance degradation.
This will be revisited when pipebuffer is integrated.
Note that for AGP systems, properly solving this may be complex
since currently there is no fast way of reading from GART/VRAM.
We will probably need to try mapping AGP as writethrough and, in
addition, make buffer creation more aware of future buffer usage.
Currently we are continuously spewing messages about these variables
since we call debug_get_bool_option everytime we want to check their value.
This is annoying, slows things down due to terminal rerendering
and obscures useful messages.
This patch only calls debug_get_bool_option once and caches the result in a
static variable.
This patch re-emits the viewport state on framebuffer or rasterizer
change.
This seems to be necessary on nv3x, but the reason is not fully
understood.
It is quite likely that this isn't really the correct fix, but seems
to work, and makes nv3x much better.
This is a different approach to solving this problem that the patch
I previously posted, and unlike that, should not cause any problems.
Right now undefined symbols in DRI drivers will still allow the
build to succeed.
As a result, people modifying drivers they cannot test risk creating
unloadable drivers with no easy way of automatically avoiding it.
For instance, the modifications to nv50 for context transfers caused
such an issue recently.
Unfortunately, just adding -Wl,--no-undefined doesn't work, because
the DRI drivers depend on glapi symbols, but do not depend on
libGL.so.1
Adding -lGL is not the correct solution since DRI drivers are not loaded
just by libGL, but also by X and possibly by other clients.
So, this patch simply tries to build an executable linked to the DRI
driver and to libGL.
If the DRI driver contains any undefined symbols not satisfied by its
dependencies or by libGL, this will fail.
This solution does not alter the built drivers, and does not significantly
slow down the build process.
All classic DRI drivers as well as all the Gallium drivers with configure
options compiled successfully with this change.
Thanks to Xavier Chantry <chantry.xavier@gmail.com> and
Michel Daenzer <michel@daenzer.net> for helping with this.
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Acked-by: Brian Paul <brian.e.paul@gmail.com>
This can happen when an X window is destroyed behind our back. We use
DRI2CopyRegion behind the scenes in many places (like flushing the fake
front to the real front) so we have to ignore X errors triggered in that
case.
The glean test cases trigger this consistently as they don't destroy the
GLX drawable nicely, they just destroy the X window.
This change passes a remainder of 1 to the server with the
DRI2SwapBuffers request, causing it to honor the OML semantics for the
swap rather than falling through to glXSwapBuffers behavior. The
remainder actually ends up ignored since the divisor is 0, but we need
to differentiate the OML and standard behavior somehow.
Reported-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
It appears that the thing that was killing VS threads was the
gratuitous NOP that replaced the gratuitous jump from OPCODE_END to
the nearby OPCODE_END implementation. With that gone, we can move on
to the rest of the pipeline.
Just emit the URB write at END time. Subroutine code that sits after
OPCODE_END won't be executed since we've ended the thread at the point
that the URB write is done.
The hope is to later take advantage of the reduced constant usage to
free up regs. This only covers the GLSL path at the moment, because
the brw_wm_emit path doesn't get the information as to whether a float
value is a constant or a uniform.
This fixes a pretty big performance regression caused by commit
3475e88442.
When the user does not request a stencil buffer it's important that we
don't use a depth/stencil format (or at least make it our last choice).
If the user calls glClear(GL_DEPTH_BUFFER_BIT) when we have a combined
depth/stencil buffer, that causes us to hit the clear_with_quad() path
which can be much, much slower than calling pipe_context::clear().
Also, try to use a shallower depth format before a deeper one.
Relocations of per-screen buffers are now emitted directly,
and include the necessary method to get changes in constbuf
addresses committed to the hw.
It should also be a bit cheaper than the way stateobjs emit
relocation markers, use a little less pushbuf space.
__NOT_HAVE_DRM_H is a like a feature, defined by default on specific platforms
and allows to be defined externally as well.
__NOT_HAVE_DRM_H should only be used by xserver and mesa swrast_dri drivers
Okay need to revist the whole OQ stuff anyways, glean test asserts
which is never good.
I'm liking the cached bufmgr restrictions less and less, I think I'll
probably play with the fence and/or busy stuff ASAP and try and clean it up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Suggested by Jose on the list, probably not perfect but will let me get
past this for now, testing with a fenced bufmgr on top of this, was slower,
Also this doesn't let you do the busy early exit optimisation either from
what I can see.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we change the tiling on a buffer we need to flush it, the old
radeon_buffer.c code had this but it crossed streams when I ported to
radeon_drm_buffer.c and I missed it. Should fix some piglit regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This will be used to prevent a variable and a function with the same
name from being declared. As a side effect, the calls to
add_{type,name,function} should never fail.
This will allow types and their constructors to be easily stored in
the same symbol table. This does add a potential problem that a
shader could declare a variable and a function with the same name.
This appears to be forbidden by the GLSL spec.
This implementation was wrong anyway because it did not respect
scoping rules. This will need to be revisited soon. The most likely
result is that the grammar is going to need some significant re-work
to be able to use a IDENTIFIER in all the places where a TYPE_NAME is
currently used.
Refactor the code into two helper functions which compute the bit mask
and shift terms for Z and stencil. Plus add a bunch of new comments to
explain everything.
use cso fragment sampler views instead of sampler textures.
since we don't really change views, try to store sampler views instead
of the textures to avoid having to recreate views most of the time.
The triangle rasterizer sets this field to indicate front/back-facing.
It gets passed into the generated fragment code as another parameter.
Used now for stencil front/back selection but will also be used for
fragment shaders in general (see TGSI_SEMANTIC_FACE).
With this commit two-sided stenciling mostly works but there's
still a bug or two...
Swizzling needs the destination surface in VRAM, but the subsequent
rendering operations making use of it are likely to not care. Fire the
ring after validation to leave the memory manager more room for
maneuvering.
nouveau reallocated the mipmap tree on every MIN_FILTER call to account
for mipmap change. We only need to do this if the texture does not fit
in the existing mipmap tree. This gives a big performance boost for a
game like bzflag which changes MIN_FILTER all the time for its font
rendering.
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
The default viewport is the window rectangle, which is set up by
_mesa_make_current(). To be able to do that we need to get the
window dimension (and buffers) first, so we have to call
intel_prepare_render() before we can call into _mesa_make_current().
Fixes#26676 and #26678.
Most stencil demos look OK (modulo some unrelated rendering glitches).
Only single-sided stencil test works at this point.
There are probably some bugs to be found...
The pitch is not really an inherent part of the miptree, since it's
not part of any of the layout calculations, and it's dictated by the
libdrm-allocated region pitch now.
The primary consumer of this (miptree relayout) already has this code
for handling failure, and the other paths want to know if failure
actually occurs and do something appropriate, which may not include
memcpy.
If the test image was larger than the window, nothing was drawn because
of invalid raster position. Use glWindowPos instead of glRasterPos.
Also, use integer src/dst coordinates to avoid grabbing black pixels
outside of the src image region.
Fixes corrupted fonts in bzFlag, where we've been silently
failing to copy I8 mipmaps to a new miptree.
Print an error message on unsupported format now, since we
can't return failure.
The llvm wrapper wasn't really an OS thing.
Use lp_bld.h for now but we eventually should rename/re-prefix all the
files/functions in the gallivm/ directory.
Add a constructor that uses an ast_struct_specifier and one that uses
a type name. This saves a (trivial) bit of code, but it also ensures
some of the class invariants (i.e., type_name != NULL) are met.
For built-in types, type_name would be NULL. This ensures that
type_name is set even for the built-in types. This simplifies code in
a few places and centralizes the name setting code.
When min_lod==max_lod we don't need to go through all the work of
computing the lod from partial derivatives. This is hit by the mipmap
generation utility code.
LLVMBuildFPTrunc() should be used for double->float conversion, not
float->int conversion.
There should be a better way to compute floor(), ceil(), etc that doesn't
involve float->int->float conversion.
This creates a cleaner winsys and drop the simple screen stuff.
It makes r300g use pb_bufmgr structs.
It also tries to avoid overheads from mapping too often.
v5: clean warnings
v6: break out of cache check on first buffer - since most likely
the first busy one implies all after it are busy.
v7: cleanup a bit
v8-merged: drop cman for now to just get all the interface changes in first.
rework to changes that happened upstream
Signed-off-by: Dave Airlie <airlied@redhat.com>
src_native_swz was used to translate 0/1 swizzles back when Gallium
supported them.
That support was later removed from Gallium, and the function currently
always returns true.
Remove it.
Currently the behavior of shader.h depends on some constants that
are defined differently in vertex and fragment programs.
This patch cleans that up by splitting the relevant symbols in
vertex program and fragment program variants
We must divide everything in the position by w, and emit position as
a 4-component vector.
Not sure why we must divide, but it works (see progs/redbook/checker).
The adjustment of nv30/nv40 after the removal of bypass incorrectly
removed the hardware viewport bypass code, which we still need for
swtnl and also forgot to remove NVFX_NEW_RAST from pipe.
This is the last nvfx unification patch.
nv[34]0_fragtex.c are moved to the common directory
nv[34]0_shader.h are renamed to nv[34]0_vertprog.h and moved to
the common directory
The separate nv30 and nv40 directories are removed from the build
system
vertprog.c is similar but has substantial differences:
1. nv40 supports clip planes
2. nv40 uses a more advanced register allocator
3. Some register setup is different
4. Constants with the same name have different values
This patch unifies the two files.
nv30 gains clip plane support and the nv40 register allocator.
A new NVFX_VP(x) macro is introduced that at runtime resolved to
either the nv30 or the nv40 constant value.
nv30 clip planes are not tested and might not work
state.c is identical except for:
1. Sampler state creation is different
2. nv40 swtnl support
3. Separate blend equations on nv40
This patch unifies nv[34]0_state.c, except the sampler state creation code.
nv30_draw.c is a stub.
This patch makes both nv30 and nv40 use the nv40 swtnl path.
Note that this doesn't actually work on nv30 because the vertex program is
encoded in the nv40-only layout.
However, swtnl was unimplemented before on nv30, so this is not a regression.
Furthermore, a patch to fix this is available near the end of the patchset.
The files are mostly the same except:
1. On NV40, some TGSI instructions are emulated with several hardware ones
2. Some instructions such as DDX/DDY, and STR were missing from nv30
3. NV40 has more sophisticated register management
nv30 now supports all instructions and uses the nv40 register management.
shader.h is similar, except for the following differences:
1. The instruction sets are not exactly the same, but mostly similar
2. Vertex program fields are in different bit positions
This patch unifies all parts of nv[34]0_shader.h except the vertex
program fields.
Vertex opcodes are also changed so that the constant names includes
SCA if it is a scalar opcode and VEC if it is a vector opcode.
The files are significantly different due to:
1. nv30 support 2 render targets, nv40 4
2. z-buffer pitch is set differently
3. nv30 has a limitation of colour_bits >= zeta_bits. This may not
actually exist in the driver though
4. nv30 points color0 at depth in the depth-only case
5. nv30 sets NV34TCL_VIEWPORT_TX_ORIGIN to 0. This is probably
unnecessary
This patch attempts to unify the two files and preserve the existing
behavior.
The only difference between nv30 and nv40 is that nv30 allowed swizzling
for more texture types.
This patch preserves the existing behavior, using conditional code.
Note however that this does not make sense, since all texture types can
be swizzled on nv40 and probably on nv30 too.
However, the handling of swizzled surfaces in the current 2D code is
partially broken, so it's best not to touch this.
A whole rewrite of the 2D code will be submitted, which will solve this
problem.
The files are the same except for swtnl support on nv40 and for
texture cache flushing on nv40.
Unify them, and use a macro to define 4 versions of render_states,
for all combinations of nvfx and hwtnl/swtnl.
Will be used to hold source files unified between nv30 and nv40.
Eventually all nv30 and nv40 code will be moved there and the
nv30 and nv40 directories will be removed.
This patch unifies nv[34]0_screen.h, nv[34]0_context.h and
nv[34]0_state.h
The unified files are put in a new "nvfx" directory.
nv30_context.h and nv40_context.h still exist to hold the function
prototypes and include nvfx_context.h
nv[34]0_screen.h and nv[34]0_state.h are deleted, replaced by the
unified versions.
nv40 includes some extra fields for swtnl and user clip planes
support.
These fields will be unused on nv30 until that functionality gets
added to it too (by unification with nv40).
It was decided to just use the NV34TCL_ constants for constants
common between nv30 and nv40, and deprecate the NV40TCL_ versions.
This patch changes the nv40 driver to use NV34TCL_ constants for
common functionality.
This reduces differences between nv30 and nv40 to ease further
unification.
It seems that x86-64 with tls will fail to compile or load due to a missining
gl_dispatch_functions_start symbol. Not changing though, since this is how it
used to be and cannot test.
Consider this rendering sequence
* render to the back buffer
* swap buffers
* read from the front buffer
The front buffer is expected to have the contents of the back buffer.
This basically adds a static xmesa_display to collect per-display static
variables in xm_api.c. Multiple display support is still missing, but
this is a step forward.
E.g. when mapping buffers could flush CS or cause waiting
for a busy buffer.
The side effect of this is it also fixes progs/demos/arbocclude however
a separate fix should be proposed to address this issue in other cases
it might occur.
Now that transfers are context objects their sideeffects must happen in
order when used by the state tracker, but that synchronization must be
bypassed when used inside the driver, or it would cause infinite
recursion.
The front renderbuffer of a framebuffer is usually added as needed when
glReadBuffer(GL_FRONT) is called. When the call is followed by
glReadPixels, we should validate the state before reading from the
renderbuffer.
When xmesa_st_framebuffer_validate was called twice with different sets
of attachments, the second call was ignored. Add a texture_mask to
remember which textures have been requested to make sure the missing
ones get created.
There are two conditions that a validation is required. One is when the
the framebuffer becomes invalid. The other is when we request for
textures that we did not request before.
The mapping for vertex_array_bgra:
(gl -> st -> translate)
GL_RGBA -> PIPE_FORMAT_R8G8B8A8 (RGBA) -> no swizzle (XYZW)
GL_BGRA -> PIPE_FORMAT_A8R8G8B8 (ARGB) -> ZYXW (BGRA again??)
Iẗ́'s pretty clear that PIPE_FORMAT_A8R8G8B8 here is wrong. This commit
fixes the pipe format and removes obvious workarounds in util/translate.
Tested with: softpipe, llvmpipe, r300g.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
These should always be sanitized before heading towards the pipe driver,
and if the calling function explicitly marked them as invalid, we need
to regenerate them.
Allows r300g to properly pass a bit more of Wine's d3d9 testing without
dropping stuff on the floor.
There is currently no user of this new interface. As the inteface can
coexist with st_public.h, everthing should work as before.
ST_TEXTURE_2D is both defined by st_public.h and st_api.h. Reorder the
headers in st/dri to avoid conflicts.
When the implementation of function_call_header and
function_identifier were changed a few commits ago, the types of the
production changed. This just updates the types specified for the
productions to match reality.
The stride depends on the mipmap level. Rename to row_stride to
distinguish from img_stride for 3D textures.
Fixes incorrect texel addressing in small mipmap levels.
don't enable APPLE_client_storage, TDFX_texture_compression_FXT1,
EXT_cull_vertex, NV_vertex_program, NV_vertex_program1_1 -
the latter two might work somewhat with some luck.
Also don't enable ARB_imaging.
Fixed the state array sizes at 3 (instead of PIPE_SHADER_TYPES)
because we'll never have domain and hull shaders on nv50; also
the numbering doesn't correspond to the hw numbering.
There was very little use for this beyond permitting the
pipe_context::tex_transfer_destroy() function to omit the pipe_context
argument.
This change adds the pipe_context argument into tex_transfer_destroy()
so that it looks like other pipe_context functions, and removes the
pipe_context pointer from pipe_transfer.
We could use single 1 bit conditions for scalar masks, but a lot of code
expects masks. The compiler easily optimzes away masks
extensions/truncations so consistency is preferable.
We can revisit this when LLVM backends have more support for vector
conditions.
commit 7a2ee04629681e59ea147b440856c4f9a33ae9f8
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 11 14:19:17 2010 +0000
nv: convert to context transfers
commit 188a3f5331c8e5966729fd59d02975afb7324adc
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 11 14:11:10 2010 +0000
nouveau: remove unused variable
commit 5c8e880ab4dc020358c08728b8adb1637d2dc5bc
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 11 12:31:21 2010 +0000
mesa/st: fix compilation after merge
commit c552595333f860c2a4807e195596acdf5d6a5ef8
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 11 12:31:06 2010 +0000
util: fix compilation after merge
commit e80836878a3617b0e350d2a8f92311832a1476cb
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 11 12:30:47 2010 +0000
r300g: fix compilation after merge
commit 0e4883e9511b9db4e75a4dbc78d7bb970badc15d
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 11 12:18:45 2010 +0000
i965g: fix incorrect merge
commit 17d74133d8168eebf93bf1390de79930fc8da231
Merge: cb81c79 aa311ae
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 11 12:08:32 2010 +0000
Merge commit 'origin/master' into gallium-context-transfers
Conflicts:
src/gallium/drivers/i965/brw_screen_texture.c
src/gallium/drivers/r300/r300_screen.c
src/gallium/drivers/softpipe/sp_texture.c
src/gallium/drivers/svga/svga_screen_texture.c
src/gallium/state_trackers/egl/x11/native_ximage.c
commit cb81c79098bc3a92a4d2a3dcc0edc972dfb407be
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 2 16:04:01 2010 +0000
egl/x11: hack for context transfers
There is a better approach to this in the winsys-handle branch, but
for now avoid using transfers at all by always allocating our own
backing store directly.
commit f44a24e1d4ad7563f3eedd6b3a5688f53a36265c
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 2 16:03:16 2010 +0000
llvmpipe: context transfers
commit 4d7475ef8104b3b478c7c6ce77cd3506c57e25d1
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 2 16:02:50 2010 +0000
llvmpipe: disable testprogs build
Not working.
commit a9bf98c4d36bd92a76f81e83747eb9b8f0a0515f
Merge: ee0f97e 0c616da
Author: Keith Whitwell <keithw@vmware.com>
Date: Tue Mar 2 15:28:25 2010 +0000
Merge commit 'origin/master' into gallium-context-transfers
Conflicts:
src/mesa/state_tracker/st_cb_accum.c
src/mesa/state_tracker/st_cb_bitmap.c
commit ee0f97e8d9fd5ef57211a8e1268f505c9829e246
Merge: a7f078e 828f545
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Feb 19 13:00:29 2010 +0000
Merge commit 'origin/master' into gallium-context-transfers
Conflicts:
src/gallium/auxiliary/util/u_debug.h
src/gallium/drivers/i915/i915_context.h
src/gallium/drivers/llvmpipe/lp_flush.c
src/gallium/drivers/nv30/nv30_screen.h
src/gallium/drivers/nv40/nv40_context.h
src/gallium/drivers/nv40/nv40_screen.h
src/gallium/drivers/nv50/nv50_context.h
src/gallium/drivers/r300/r300_screen.c
src/gallium/drivers/r300/r300_winsys.h
src/gallium/drivers/softpipe/sp_context.c
src/gallium/drivers/trace/tr_context.c
src/gallium/state_trackers/dri/dri_context.c
src/gallium/state_trackers/egl/common/egl_g3d.c
src/gallium/state_trackers/python/st_device.c
src/gallium/winsys/drm/radeon/core/radeon_drm.c
commit a7f078e16d851b53ef316066dcced46eb39ebe24
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Feb 5 14:16:11 2010 +0000
gallium: move texture transfers to pipe_context
commit 7b2ffc2019d72e833afea7eebf3e80121187375d
Merge: 51e190e c036d13
Author: Keith Whitwell <keithw@vmware.com>
Date: Fri Feb 5 09:55:02 2010 +0000
Merge commit 'origin/master' into gallium-screen-context
Conflicts:
src/gallium/winsys/drm/nouveau/drm/nouveau_drm_api.c
This branch has got a pretty tortured history now, I expect
a squash merge will be appropriate when it is done.
commit 51e190e95acf120f72768fafb29e9721e358df1b
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 17:58:02 2010 +0000
gallium: fix some build issues
commit f524bdaa723fb181637ad30c6ad708aeedabe25b
Merge: f906212 3aba0a2
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 17:51:32 2010 +0000
Merge commit 'origin/master' into gallium-screen-context
commit f9062126883199eabf045714c717cd35f4b7b313
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 17:17:12 2010 +0000
gallium/docs: small description of screen::create_context
commit efcb37bd3d5ed37f06c6105bd2d750b374ec0927
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:42:42 2010 +0000
drm/radeon: more dead create_context wrapper removal
commit 6badc0dd9e06cf2ec936940bcf12b9ef5324b301
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:42:30 2010 +0000
drm/i965: more dead create_context wrapper removal
commit cf04ebd5a54b18b2d894cfdab2b0f2fd55626ffc
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:42:05 2010 +0000
st/python: more dead create_context wrapper removal
commit 444f114c3516abf71c430e6e9d0d2ae3b80679d3
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:37:58 2010 +0000
idenity: wrapped context creation
commit 5a6d09cb9e468d1ee6c8d54b887618819d8d94f2
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:28:47 2010 +0000
ws/gdi: remove dead context_create wrapper
commit 132b55f4bec39386ac625f09aaa11f609664024c
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:27:52 2010 +0000
ws/gdi: remove dead context_create wrapper
commit 56d2d21a0cdcb197a364049d354c2f15a4fc026a
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:25:38 2010 +0000
st/xorg: use screen::context_create
commit 838c5cfe56b2af6c644909bed3c5e7cdd64c336a
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:23:20 2010 +0000
glx/xlib: simplify creation of trace-wrapped contexts
Trace screen knows how to properly wrap context creation in the
wrapped screen, so nothing special to do here.
commit c99404c03ebaec4175f08a2f363e43c9085f2635
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:18:24 2010 +0000
st/python: no need to special case context creation for trace
commit 193a527a682b6877bb1faecd8092df4dfd055a18
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:15:30 2010 +0000
drm/radeon: remove dead create_context declaration
commit bb984eecc25cf23bc77e1c818b81165ba1a07c9a
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:14:58 2010 +0000
nv/drm: remove dead create_context ref
commit e809313a44287dc4e16c28e9e74ef3b2b5271aa1
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:12:51 2010 +0000
st/egl: remove a layer of wrappers around screen::create_context
commit 39caa6672a04122f185c22e17aab86d1c40938bf
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:05:28 2010 +0000
r300g: fill in screen::context_create
commit 407f12556d16ba0656774d8a1d9ebda22f82f473
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 16:04:04 2010 +0000
cell: adapt for screen::create_context, untested
commit d02b0c6ce321a04c76fdabb09e0e4380ce1c1376
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 15:50:24 2010 +0000
drm/nv: adapt for screen::create_context
All contexts now created directly through the screen, so remove
equivalent code here.
Remove apparently un-needed array of contexts in the winsys.
commit 53eec5b1349aa1b6892a75a7bff7e7530957aeae
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 15:50:08 2010 +0000
stw: adapt for screen::create_context, untested
commit c6a64de3eb381bc9a88e9fbdecbf87d77925aaf5
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 15:49:20 2010 +0000
trace: expose the wrapped context's priv data
If we are going to keep this priv idea, really want an accessor
function for it so that trace and other drivers can wrap that.
commit 75d6104e11d86ec2b0749627ed58e35f856ee6eb
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 15:47:55 2010 +0000
nv30: adapt to screen::context_create
commit 12f5deb6ed9723e9b5d34577052b8365813ca14e
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 15:44:47 2010 +0000
nv40: adapt to screen::context_create
commit 14baccaa3b6bbb3b91056126f6521828e786dc62
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 15:35:27 2010 +0000
nv50: adapt to screen::create_context
Not build tested. Need to figure out how to build nouveau.
commit a0e94505ccd2d7f3e604465a2ac302f1286b73b6
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 15:22:27 2010 +0000
llvmpipe: update for screen::create_context, untested
commit 0eae17107c950346030e4f7e0ec232f868d3893d
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 15:16:57 2010 +0000
xlib/llvmpipe: remove dead winsys context creation path
commit 2f69f9ffaa7e2a01d2483277246ed13051ae4ca3
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 14:58:27 2010 +0000
gallium: convert most code to use screen::create_context
I wish I could build all of gallium at once to find breakages.
commit d7b57f4061b82322cbcae176125913d9f0dea6c1
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 12:46:21 2010 +0000
glx: permit building with older protocol headers
I'd like to be able to build mesa on current distro releases without
having to upgrade from the standard dri2proto and glproto headers. With
this change I'm able to build on ancient releases such as Ubuntu 9-10...
In general, it would be nice to be able to build-test mesa to check for
unintended breakages without having to follow the external dependencies
of every group working on the codebase.
commit 57adedd6fb06c98572ed8d4aef19203df4c4eea2
Merge: da71847 e1906ae
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Feb 4 11:38:15 2010 +0000
Merge commit 'origin/master' into gallium-screen-context
Conflicts:
src/gallium/drivers/softpipe/sp_video_context.h
src/gallium/drivers/trace/tr_context.c
src/gallium/state_trackers/wgl/shared/stw_context.c
src/gallium/winsys/gdi/gdi_softpipe_winsys.c
commit da71847ea6414d7e352c6094f8963bb4eda344dc
Author: José Fonseca <jfonseca@vmware.com>
Date: Sat May 2 08:57:39 2009 +0100
wgl: Use pipe_screen::context_create.
commit 2595a188f93fd903600ef5d8517737ee0592035d
Author: José Fonseca <jfonseca@vmware.com>
Date: Sat May 2 08:56:47 2009 +0100
trace: Implement pipe_screen::context_create.
commit f3640e4ae37a5260cbfba999d079f827de0a313a
Author: José Fonseca <jfonseca@vmware.com>
Date: Sat May 2 08:56:17 2009 +0100
softpipe: Implement pipe_screen::context_create.
commit 347266bddc8bd39c711bacb2193793759d0f3696
Author: José Fonseca <jfonseca@vmware.com>
Date: Sat May 2 08:55:31 2009 +0100
gallium: New pipe_screen::context_create callback.
Extensions were enabled in both st/mesa and st/dri, with st/dri completely
overriding the decisions of st/mesa and exposing even the extensions claimed
to be unsupported by a pipe driver.
This commit moves the differences between the two to st/mesa and removes
the responsibilty of advertising extensions from st/dri.
The new lp_build_sample_general() function will handle all sampling
modes for all texture types. Still incomplete, but a few additional
sampling modes are now supported.
1D textures should work and most of the code for 3D textures is in place.
No support for cube maps yet. No support for different min/mag filters.
Reemitting dirty states on flush causes problems if the GL context
isn't fully consistent when we get to it. It didn't serve any specific
purpose, so, use nouveau_bo_state_emit instead.
Saves an instruction in PINTERP, LINTERP, and PIXEL_W from
brw_wm_glsl.c For non-GLSL it isn't used yet because the deltas have
to be laid out differently.
Otherwise we use glRead/DrawPixels to copy the off-screen FBO image
into the window.
Looks like NVIDIA's broken when using -c (the image is upside down),
but OK with -c -t.
NVIDIA's driver requires that the texture that we're going to render into
be complete. Need to set min/mag filters to non-mipmap modes.
Plus added other error/debug checks.
The bad response length would hang the GPU with a masked sample in a
shader using control flow. For 8-wide, the response length is always
4, and masked slots are just not written to. brw_wm_glsl.c already
allocates registers in the right locations.
Fixes piglit glsl-fs-bug25902 (fd.o bug #25902).
If the user calls glRenderBufferTexture(texture=N) but texture N
doesn't name an existing texture, raise GL_INVALID_ENUM.
Plus, add a comment about some questionable error checking code in
framebuffer_texture(). Ian?
This fixes a problem in glReadPixels when reading from an FBO's texture
attachment. We have a better chance at hitting a fast path for
glReadPixels now.
add_dispatch (driver) and maybe get_proc_address (client) may be called before
set_dispatch is called, which results in generate_entrypoint using an unreloced
function template.
drivers need to handle NULL cso vertex elements (and others) objects.
It is possible the cso code saves/restores NULL objects (if no normal
cso object was bound before this was invoked).
This led to segfaults (for example demos/cubemap) for apps which were using
things like creating mipmaps before drawing anything.
This was a good idea, but ended up tying the build systems in knots.
We can revisit this later, in particular if we can put in place dummy
implementations of cell_create_screen(), llvmpipe_create_screen()
which just return NULL if the driver isn't available.
In the meantime, just duplicate this smallish function in the two
places it was being called.
Components such as state trackers, drivers, etc, should be free to be
recombined in arbtrary ways to build driver stacks. They should not
be reaching out and trying to build the stack themselves - this is now
expected to be handled by the "target" abstraction.
Add a helper gallium_wrap_screen() for injecting the commonly
used extra layers into a gallium stack. Currently that's just the
trace module and identity layer, but there could be more in the
future, eg. a validation layer.
CSO can often be null.
For example:
1. at initialization
2. using an util module (u_blit) right after initialization (it will push
state and pop the previous null state)
3. at shutdown time (state shouldn't be bound when being destroyed)
Glean was hitting 2.
This makes draw_arrays()/draw_arrays_instanced() do the right thing and
not require the (probably broken anyway) flush_notify() usage.
It also fixes a potential bug in the behaviour of reading InstanceID from
shaders, where 0 should be read for non-instanced drawing, previously it
was possible to read non-0 ids if mixing instanced/non-instanced.
Really though, using this at all is just not a good idea in the 3D driver.
I'm almost certain the hardware will not like a reloc appearing between
begin()/end().. Anyways, this is still better than before, more fixes
to come..
Also allows the nv50_state_validate() caller to request a minimum amount
of space that itself requires, not all callers accurately use this yet
but the simple cases are now accounted for.
Rendering will also be dropped on the floor if validate fails now.
The actual assignment is a side-effect of the assignment expression.
Add it to the instruction stream and return the LHS of the assignment
as its rvalue.
The ir_visitor class is the abstract base class for all visitors.
ir_print_visitor contains the beginnings of a concrete visitor class
that will print out an IR sequence in a Lisp / Scheme-like syntax.
We were doing it ad-hoc before, as instructions with potential
aliasing problems were identified. But thanks to swizzling basically
anything can have aliasing, so just do it generally at source reg
setup time. This is somewhat inefficient, because sometimes an
operation doesn't need unaliasing protection if the swizzling is safe,
but the unaliasing before didn't cover those cases either.
Fixes piglit glsl-fs-loop.
We were patching up all the break and continues between the start of
our loop and the end of our loop, even if they were breaks/continues
for an inner loop. Avoiding patching already patched breaks/continues
fixes piglit glsl-vs-loop-nested.
one variable is a bitfield where the rest is never written to, which caused
valgrind to complain. Might have caused cso to not recognize an already stored
state. Reported by Christoph Bumiller.
The LOD is computed from texcoord partial derivatives and used to
select a mipmap level. Still some bugs in texel fetching. Lots of
rough edges and unfinished parts but the basics are in place.
Lots of changes to the lp_bld_arit.c code to support non-vector/scalar
datatypes.
Several software rasterizers can make use of this winsys, but there
isn't any reason why the winsys itself should know about them.
This change moves that information into the libgl-xlib target.
Need to fix up other targets making use of this winsys.
Don't make the shared software winsys rely on internal knowledge about
the cell driver's texture twiddling.
This is just a sketch and hasn't even been compile tested.
While this c99 feature should work with most compilers, valgrind doesn't
really like it, and this only really saves some memory, we don't do this
in similar occasions (like the blend state) neither.
commit f90b3f01af82b9522067b1824e21709a6fb2d3af
Author: Keith Whitwell <keithw@vmware.com>
Date: Mon Mar 8 14:39:44 2010 +0000
gallium: remove p_screen::surface_buffer_create
This isn't very useful without texture_blanket(), which has also been
removed.
Note that this function hasn't been removed from the old pipe_winsys
(u_simple_screen) still used internally by some drivers (eg softpipe).
commit 6c462de39a4b9980a5f034a95e580efdfcb8173b
Author: Keith Whitwell <keithw@vmware.com>
Date: Mon Mar 8 14:27:40 2010 +0000
egl/x11: disable texture_blanket usage
commit b42da9160df9f47224e5b3291b972f41767aa6e5
Merge: 4be2436 3ca9336
Author: Keith Whitwell <keithw@vmware.com>
Date: Mon Mar 8 14:27:24 2010 +0000
Merge commit 'origin/master' into gallium-no-texture-blanket
Conflicts:
src/gallium/drivers/svga/svga_screen_texture.c
commit 4be2436316929e3dfc55bc34d810920c06556b66
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 4 14:59:26 2010 +0000
gallium: remove texture blanket call
No longer needed, except for nouveau and egl/xll/native_ximage.c.
Fix for nouveau is to keep the call, but move it to an internal
function within nouveau.
Fix for that egl/x11 relies on gallium-sw-api branch or its successor.
commit 69b6764330367d63c237d0bde9fb96435d0e0257
Author: Keith Whitwell <keithw@vmware.com>
Date: Thu Mar 4 13:35:16 2010 +0000
drm_api: wrap comment
The use of macros to access existing linked list type makes it
unsuitable for its current use as a base class. Since this type and
the accompanying macros are used all over the place in Mesa, we can't
really change them.
Change the texture data_ptr from just a single image pointer to an
array of image pointers, indexed by mipmap level.
We'll use this for mipmap filtering.
For now, the mipmap level is hard-coded to zero.
The code was walking over the regs of pairs of attributes and checking
whether the attribute with a given reg index had point sprite enabled.
So the point sprite setup code was rarely even getting executed.
Instead, we need to determine which channels of a reg need point
sprite coordinate replacement. In addition, it was multiplying the
attribute by 1/w, when it's supposed to cover (0, 1) in each direction
regardless of w, and it wasn't filling in the Z and W components of
the texcoord as specified.
Fixes piglit point-sprite and the spriteblast demo. Bug #24431, #22245.
Just use flush_frontbuffer directly. The flush_frontbuffer routine has
been somewhat devalued recently, but it is actually just the right
interface for our needs.
It is in pipe_screen, meaning that any wrapping (eg trace module)
will get properly unwrapped before we try and use the pipe_surface
argument for real.
If a particular co-state-tracker needs to implement this itself, it
should organize a way to allow the winsys to call back up to its
level, rather than hijacking the driver-supplied implementation.
Currently there are still at least two functions bundled up inside the
winsys concept:
a) that of a backend resource manager, sometimes capable of performing
present() operations,
b) the initialization code/routine for the whole driver stack.
The inclusion of (b) makes it difficult to share implementations of
(a) between different drivers. For instance, a clean xlib winsys
could be of use for software-rasterized VG, GLES, EGL, etc, stacks.
But that is only true as long as there is no dependency from the
winsys to higher level code, as would be the case when we include (b)
in this component.
This change creates a new gallium/targets subtree, specifically for
implementing the glue needed to build individual driver stacks, and
moves that code out of a single example winsys, namely xlib.
Other drivers continue to build unchanged, but hopefully can migrate
to this structure over time.
Recover some logic to make state canonical, although it is admittedly very
shy compared with what could be done.
We really need an helper module to make state canonical.
Move the initialization of ext_list_first_time from all of the DRI loader's
CreateScreen routines, to where the storage for the screen config is
allocated.
It needs to get set in the screen-config even if DRI is forced off
using LIBGL_ALWAYS_INDIRECT, so that psc->direct_support is initialized
correctly, otherwise __glXExtensionBitIsEnabled() always returns FALSE
Specifically, this causes a problem with an X server which advertises
GLX<=1.2, and the GLX_SGIX_fbconfig extension.
glXGetFBConfigFromVisualSGIX() uses __glXExtensionBitIsEnabled() to
check if the GLX_SGIX_fbconfig extension is available, but that function
won't return correct information because that data has never been
initialized, because ext_list_first_time was never set...
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Promote the llvmpipe winsys more or less unchanged to
state_trackers/sw_winsys.h.
Some minor breakages:
- softpipe::texture_blanket is broken, but scheduled for removal anyway.
- haven't fixed up g3vdl yet.
The interface of util_draw_vertex_buffer looks a bit odd (calling code has to
set vertex elements but not vertex buffers) but due to the way cso state
handling generally works (can't re-bind original vertex element state easily
there) I guess that's ok for now.
Introduce a new shared usage and rename primary to scanout.
The display target usage is more of a windows concept and
doesn't mean the same thing as shared. Display target means
that the surface should be presentable, for softpipe this
means that it should be backed by a hardware buffer.
Instead of having these functions on a side interface like on
drm_api create a opaque winsys_handle that is to be passed down
into the winsys.
Currently the only thing ported to this new interface is drm_api,
and of that only the components that builds by default is ported.
All the drivers and any extra state trackers needs to be ported
before this can go into master.
The ast_expression_bin subclass is used for all binary expressions
such as addition, subtraction, and comparisons. Several other
subclasses are soon to follow.
Known issues in the ARB_color_buffer_float implementation:
- Rendering to multiple render targets, some fixed-point, some floating-point, with FIXED_ONLY fragment clamping and polygon smooth enabled may write incorrect values to the fixed point buffers (depends on spec interpretation)
- For fragment programs with ARB_fog_* options, colors are clamped before fog application regardless of the fragment clamping setting (this depends on spec interpretation)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.