Set the step_rate value when drawing to implement
ARB_instanced_arrays for gen >= 4.
v2:
* leave (total_size < 2048) check where it was to only make
this check once rather than once for each array.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch causes the fragment shader to be configured correctly (and
the correct code to be generated) for centroid interpolation. This
required two changes: brw_compute_barycentric_interp_modes() needs to
determine when centroid barycentric coordinates need to be included in
the pixel shader thread payload, and
fs_visitor::emit_general_interpolation() needs to interpolate using
the correct set of barycentric coordinates.
Fixes piglit tests "EXT_framebuffer_multisample/interpolation {2,4}
centroid-edges" on i965.
Reviewed-by: Eric Anholt <eric@anholt.net>
To save time, we only instruct the clip stage of the pipeline to
compute noperspective barycentric coordinates if those coordinates are
needed by the fragment shader. Previously, we would determine whether
the coordinates were needed by seeing whether the fragment shader used
the BRW_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC interpolation mode.
However, with MSAA, it's possible that the fragment shader might use
BRW_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC instead. In the future,
when we support ARB_sample_shading, it might use
BRW_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC.
This patch modifies the upload_clip_state() functions to check for all
three possible noperspective interpolation modes.
Reviewed-by: Eric Anholt <eric@anholt.net>
This bitfield tells the back-ends which of a fragment shader's inputs
require centroid interpolation. It is only set for GLSL fragment
shaders, since assembly fragment shaders don't support centroid
interpolation.
Reviewed-by: Eric Anholt <eric@anholt.net>
It was only no-oping the clear() function, not actual triangle
rasterization. Move the no_rast field from lp_context down into
lp_rasterizer so it's accessible where it's needed.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes this build failure on Solaris.
Compiling build/sunos-debug/glsl/glcpp/glcpp-lex.c ...
"src/glsl/glcpp/glcpp-lex.l", line 30: cannot find include file: "glcpp-parse.h"
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
$CLANG_RESOURCE_DIR is the directory that contains all resources
needed by clang to compile programs. When clover uses clang to
compile kernels it needs to specify a resource dir, so that clang
can find its internal headers (e.g. stddef.h).
clang defines $CLANG_RESOURCE_DIR as $CLANG_LIBDIR/clang/$CLANG_VERSION
This patch adds the --with-clang-libdir option in order to accommodate
clang intalls to non-standard locations, and it also adds a check
to the configure script to verify that $CLANG_RESOURCE_DIR/include
contains the necessary header files.
On i965, dFdx() and dFdy() are computed by taking advantage of the
fact that each consecutive set of 4 pixels dispatched to the fragment
shader always constitutes a contiguous 2x2 block of pixels in a fixed
arrangement known as a "sub-span". So we calculate dFdx() by taking
the difference between the values computed for the left and right
halves of the sub-span, and we calculate dFdy() by taking the
difference between the values computed for the top and bottom halves
of the sub-span.
However, there's a subtlety when FBOs are in use: since FBOs use a
coordinate system where the origin is at the upper left, and window
system framebuffers use a coordinate system where the origin is at the
lower left, the computation of dFdy() needs to be negated for FBOs.
This patch modifies the fragment shader back-ends to negate the value
of dFdy() when an FBO is in use. It also modifies the code that
populates the program key (brw_wm_populate_key() and
brw_fs_precompile()) so that they always record in the program key
whether we are rendering to an FBO or to a window system framebuffer;
this ensures that the fragment shader will get recompiled when
switching between FBO and non-FBO use.
This will result in unnecessary recompiles of fragment shaders that
don't use dFdy(). To fix that, we will need to adapt the GLSL and
NV_fragment_program front-ends to record whether or not a given shader
uses dFdy(). I plan to implement this in a future patch series; I've
left FIXME comments in the code as a reminder.
Fixes Piglit test "fbo-deriv".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not optimal, but it's better than the register pressure scheduler
that was previously being used. The VLIW scheduler currently ignores
all the complicated instruction groups restrictions and just tries to
fill the instruction groups with as many instructions as possible.
Though, it does know enough not to put two trans only instructions in
the same group.
We are able to ignore the instruction group restrictions in the LLVM
backend, because the finalizer in r600_asm.c will fix any illegal
instruction groups the backend generates.
Enabling the VLIW scheduler improved the run time for a sha1 compute
shader by about 50%. I'm not sure what the impact will be for graphics
shaders. I tested Lightsmark with the VLIW scheduler enabled and the
framerate was about the same, but it might help apps that use really
big shaders.
The rest of the TFB implementation remains in transformfeedback.c, and
this will be shared with UBOs.
v2: Move the size/offset checks shared with UBOs to common code as
well. (Kenneth's review)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix a typo spotted by Eric Anholt.
v3: Fix missing "GL" on types, fix style, fix Studly_Caps extension name,
drop commented code duplicated with GL3x.xml [anholt]
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Our intention is still that it's not abi stable, so make the package
version number get included in the library name. Now you can parallel
install dricore-using drivers from multiple mesa versions. We can put
it into lib now that we're following library versioning rules
(assuming that ABIs don't change within a single Mesa point release).
LD_LIBRARY_PATH still doesn't work with a non-/, non-/usr prefix
because libtool uses rpath instead of runpath for nonstandard
prefixes.
The weird versioning of the libGL where the package version was sort
of expressed as a big integer is dropped. libtool didn't like the 0
prefix, and it didn't really make sense anyway -- if you interpret it
as an integer version number, old Mesa 071200 was bigger than current
Mesa 08100. Instead, just bump the minor version and drop the
patchlevel.
Except for the deleted linux-cell target, these were just the target
cc/cflags. The only usage was for gen_matypes, which wants the
target's structure packing, not the host, anyway.
Every place that uses ASM_FLAGS already uses DEFINES. Not including
it in DEFINES is just a way to screw up potential users, as I've done
several times while working on the build system.
Even pre-automake, we rely on gmake features for pattern
substitutions, and replacing those with reams more make code is not
interesting. This will let us turn the old Makefiles using pattern
substitutions into automake without spewing warnings.
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
1) We need to insert a barrier between consecutive transform feedback calls.
2) VBO cache needs to be flushed when TFB output is used as VBO draw input.
Fixes Piglit test EXT_transform_feedback/immediate-reuse.
Thanks to Christoph Bumiller for pointing out bugs in previous versions
of this patch.
gl_ClipDistance needs special treatment in form of lowering pass
which transforms gl_ClipDistance representation from float[] to
vec4[]. There are 2 implementations - at glsl linker level (enabled
by LowerClipDistance option) and at glsl_to_tgsi level (enabled
unconditionally for gallium drivers). Second implementation is
incomplete - it does not take into account transform feedback (see
commit 642e5b413e "mesa: Fix transform
feedback of unsubscripted gl_ClipDistance array" for details).
There are 2 possible fixes:
- adding transform feedback support into glsl_to_tgsi version
- ripping gl_ClipDistance support from glsl_to_tgsi and enabling
gl_ClipDistance lowering on glsl linker side
This patch implements 2nd option. All it does is:
- reverts most of the commit 59be691638
"st/mesa: add support for gl_ClipDistance"
- changes LowerClipDistance to true
Fixes Piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{2,3,4,5,6,7,8}]-no-subscript" at least on nv50
and evergreen cards.
From the GL 3.0 spec (p.116):
"Multisample rasterization is enabled or disabled by calling
Enable or Disable with the symbolic constant MULTISAMPLE."
Elsewhere in the spec, where multisample rasterization is described
(sections 3.4.3, 3.5.4, and 3.6.6), the following text is consistently
used:
"If MULTISAMPLE is enabled, and the value of SAMPLE_BUFFERS is
one, then..."
So, in other words, disabling GL_MULTISAMPLE should prevent
multisample rasterization from occurring, even if the draw framebuffer
is multisampled. This patch implements that behaviour by setting the
WM and SF stage's "multisample rasterization mode" to
MSRAST_ON_PATTERN only when the draw framebuffer is multisampled *and*
GL_MULTISAMPLE is enabled.
Fixes piglit test spec/EXT_framebuffer_multisample/enable-flag.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to hardware limitations, MSAA is unsupported on Gen6 for formats
containing >64 bits of data per pixel. From the Sandy Bridge PRM,
vol4 part1, p72 ("Surface Format"):
If Number of Multisamples is set to a value other than
MULTISAMPLECOUNT_1, this field cannot be set to the following
formats:
- any format with greater than 64 bits per element
- any compressed texture format (BC*)
- any YCRCB* format
Gen7 has a similar, but less stringent limitation: formats with >64
bits of data per pixel only support 4x MSAA.
This patch causes the unsupported formats to report
GL_FRAMEBUFFER_UNSUPPORTED.
Fixes piglit "multisample-formats" tests on Gen6.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Sandy Bridge and later don't use this field, so there's no point in
setting it. It can only cause harmful state-based recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The system array values concept doesn't really because it expects the
system values to be fixed per call, which is wrong for gl_VertexID and
iffy for gl_SampleID. So this patch does two things:
- kill the array, have emit_fetch_system_value directly pick the
values it needs (only gl_InstanceID for now, as the previous code)
- correctly handle the expected type in emit_fetch_system_value
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This includes:
- picking up correctly which attributes are flatshaded and which are
noperspective
- copying the flatshaded attributes when needed, including the
non-built-in ones
- correctly interpolating the noperspective attributes in screen-space
instead than in a 3d-correct fashion.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
z or stencil texture should not be created with the z/stencil
flags for surface creation as they are intended to be bound
as texture.
v2: remove broken code
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Solaris Studio C compiler does not support anonymous structs and
anonymous unions.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The idea here is to rewrite comparisons like 2 >= x with x <= 2; we want
to simply exchange arguments, not negate the condition. If equality was
part of the original comparison, it should remain part of the swapped
version.
This is the true cause of bug #50298. It didn't manifest itself on
Sandybridge because we embed the conditional modifier in the IF
instruction rather than emitting a CMP. All other platforms use CMP.
It also didn't manifest itself on the master branch because commit
be5f27a84d ("glsl: Refine the loop instruction counting.") papered over
the problem.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50298
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes build error on Cygwin and Solaris. _R, _G, and _B are used in
ctype.h on those platforms.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a bug where a sampler view was using stale texture/resource
data when the texture was modified through a surface (render to texture).
Bumping the texture and layer ages triggers sampler view revalidation.
Fixes piglit fbo-blit failure.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This lets us select the front buffer for reading under GLES2.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extra condition checks the API not the version of the API, so rename
to reflect that.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is failing sometimes, probably because TargetData keeps a structure layout
cache, which can becomes bogus, ever since the InvalidateStructLayoutInfo API
was removed in LLVM r135245.
This change merely makes the problem easier to diagnose (an assertion
failure instead of a random crash).
instead of failing to allocate a renderbuffer.
This also fixes piglit/get-renderbuffer-internalformat with non-renderable
formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows drivers not to do any allocation in AllocStorage if the storage
cannot be allocated because of an unsupported internalformat + samples combo.
The little ugliness is that AllocStorage is expected to return TRUE in this
case.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires the latest streamout kernel patches.
Streamout is disabled by default on r7xx, so this patch is safe for regular
users.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Note: for the moment TGSI_OPCODE_F2U is implemented using
lp_build_itrunc() (the same function used to implement
TGSI_OPCODE_F2I). In the long run, we should create an
lp_build_utrunc() function to do the proper conversion. But this
should allow us to limp along with mostly correct behaviour for now.
Previously, we performed conversions from float->uint by a two step
process: float->int->uint. However, on platforms that use saturating
conversions (e.g. i965), this didn't work, because if the source value
was larger than the maximum representable int (0x7fffffff), then
converting it to an int would clamp it to 0x7fffffff.
This patch just adds the new opcode; further patches will adapt
optimization passes and back-ends to use it, and then finally the
ast_to_hir logic will be modified to emit the new opcode.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies blorp blits (which are used for MSAA) to properly
account for clipping of source coordinates. Previously, if we
detected the possibility of source clipping, we would fall back to the
blit meta-op, which doesn't support MSAA and is very slow for depth
and stencil buffers.
Fixes piglit tests
"EXT_framebuffer_multisample/clip-and-scissor-blit" on i965/Gen6+.
Also substantially speeds up the Humble Bundle V game "Psychonauts" on
Gen6+ (without this patch, the game's depth buffer blits use the slow
blit meta-op).
Reviewed-by: Carl Worth <cworth@cworth.org>
This allows to submit things to the compute only
rings on cayman+
v2: rebased on current master and actually make use
of the new flag in evergreen_compute.c
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
When drawing a depth image the fragment shader also needs to emit the
current raster color.
The new piglit drawpix-z test exercises this.
NOTE: This is a candiate for the 8.0 branch.
This patch updates .gitignore files to account for the new build
artifacts introduced by the following commits:
ae376f0 glx/tests: Rename test as glx-test
8fecdcc mesa/tests: Add tests for _mesa_lookup_enum_by_{name,nr} functions
a29ad2b mesa/tests: Add tests for the generated dispatch table
Haiku targets the Pentium or higher processor.
To ensure compatibility we can do march 586 and
mtune 686. Mesa will still use sse however if
the cpu supports it (and the stack is properly
aligned). These flags only effect the internal
compiler optimizations.
Previously, rbug_*.c would fail to compile with incomplete prototype
errors when make was run from the command line on my machine. My IDE
always built fine, and still does after this patch (Netbeans 7.1.2).
Most of the includes from files in gallium/auxiliary/rbug/* were
assuming an rbug/ subdirectory, while the headers are actually in the
same directory as the .c files.
The build error was also previously a problem for me on Ubuntu 11.10
and Mint 12.
Fixes build for the following configuration: ./autogen.sh
--enable-debug --enable-texture-float --with-gallium-drivers=r600
--with-dri-drivers=radeon --enable-r600-llvm-compiler
Signed-off-by: Brian Paul <brianp@vmware.com>
In single precision, 1.5707963 becomes 1.5707962513 which is too
small. However, 1.5707964 becomes 1.5707963705 which is just right.
The value 1.5707964 is already used in asin.ir.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There is no GLX protocol for these functions. Open-source Linux
driver have not supported this extension for many years, and it seems
unlikely at this point that this support will return. There's no
reason to have slots for these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions. No open-source Linux
driver has ever supported this extension, and it seems unlikely at
this point that one ever will. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions. No open-source Linux
driver has ever supported this extension, and it seems unlikely at
this point that one ever will. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions, and no Linux driver has
ever supported this extension. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for this function. Open-source Linux driver
have not supported this extension for many years, and it seems
unlikely at this point that this support will return. There's no
reason to have slots for this function in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no GLX protocol for these functions, and no Linux driver has
ever supported this extension. There's no reason to have slots for
these functions in the dispatch table.
The unit tests (GetProcAddress::TableDidntShrink and others) are also updated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are from OpenGL 3.1 and ARB_uniform_buffer_object. I only added
them to 3.1 because that required the least work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These are from OpenGL 3.3, ARB_texture_swizzle, and
EXT_texture_swizzle (with different names). I only added them to 3.3
because that required the least work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Determines whether it's a basis vector, i.e., a vector with one element
equal to 1 and all other elements equal to 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When a value was replaced, the new key was strdup'd and leaked.
To fix this, we modify the hash table implementation to return
whether the value was replaced and free() the (now useless)
duplicate string.
When we have multiple shared contexts, and one of them is
long-running, this will lead to never freeing those resources
since they are shared. Instead, free them right away on context
destruction since we know the other context isn't using them.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
NOTE: This is a candidate for the 8.0 branch.
From the GL_NV_primitive_restart spec:
"PrimitiveRestartIndexNV is not compiled into display lists, but is
executed immediately."
Prior to this patch, calls to glPrimitiveRestartIndex would hit the noop
dispatch stub.
+2 oglconforms.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the GL_ARB_copy_buffer spec:
"An INVALID_VALUE error is generated if any of readoffset, writeoffset,
or size are negative [...]"
Fixes oglconform's copybuffer/negative.CNNegativeValues test.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The warnings appear to occur with newer automake (probably 1.12).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These allow one to mangle the library names, without also mangling the
symbol names, to make them distinct from other GL libraries on the
system.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Because these classes are used entirely from their own source files
and not from separate DSOs, the linker gets to produce massively less
code. This cuts about 13k of text in the libdricore case. In the
non-libdricore case, the additional linkage information allows the
compiler to inline some code, so libglsl.a size actually increases by
about 300 bytes.
For a dricore build, improves shader_runner runtime on
glsl-fs-copy-propagation-texcoords-1 by 0.21% +/- 0.03% (n=353574,
outliers removed). No statistically significant difference with n=322
on glslparsertest on a yofrankie shader intended to test compiler
performance.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now we have just one library of "all of Mesa core" instead of both
libdricore and libglsl that drivers link against.
I did this change in a sort of nonrecursive make fashion: the
generated files are still produced in the non-automake build, like the
rest of dricore, but the GLSL files are stuffed into libdricore
without building a convenience library in src/glsl (even though we
could now). This would make a bit more sense if glsl was just another
dir under src/mesa, because right now I had to contort the prefix
variable name to look another ../ level up.
This is part of a series to fix our build issues in the automake case
by hooking up the automatic Makefile regeneration support. The
extract_git_sha1 is moved into src/mesa/Makefile so that we get
correct dependency generation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I tried to update all the old Makefiles that included the default
config to be sure they had a default target if they didn't previously
have one, since this new all target will always point at it. Almost
everything had one.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some more of the files are now autogenerated, this caused build breakage,
patch adds generation of these missing files. Patch also changes existing
make so that the files are created to be part of the local source
(not intermediate directory, this causes several problems).
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
This patch fixes a copy/paste error and masking of depth/stencil (stencil
is in the top 8 bits), and makes glean/readPixSanity happy.
Both the stencil and the depth buffer piglit test also pass if
glClear(DEPTH | STENCIL) is executed instead of
glClear(DEPTH)/glClear(STENCIL).
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Christopher Egert <cme3000@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
remove archaic .cvsignore
*.pyo is already in toplevel .gitignore
*.pyc is already in toplevel .gitignore
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, blits using the "blorp" mechanism only worked for 8-bit
RGBA color buffers, 24-bit depth buffers, and 8 bit stencil buffers.
This was not enough, because the blorp mechanism must be used for
blitting whenever MSAA is in use. This patch allows all formats to be
used, provided the source and destination formats match.
So far I have confirmed that the following formats work properly with
MSAA:
- GL_RGB
- GL_RGBA
- GL_ALPHA
- GL_ALPHA4
- GL_ALPHA8
- GL_R3_G3_B2
- GL_RGB4
- GL_RGB5
- GL_RGB8
- GL_RGB10
- GL_RGB12
- GL_RGB16
- GL_RGBA2
- GL_RGBA4
- GL_RGB5_A1
- GL_RGBA8
- GL_RGB10_A2
- GL_RGBA12
- GL_RGBA16
Fixes piglit tests "EXT_framebuffer_multisample/formats {2,4}" on
Sandy Bridge and Ivy Bridge.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the blorp engine only supported RGBA8 color buffers and
24-bit depth buffers. This patch adds support for any color buffer
format that is supported as a render target, and for 16-bit and 32-bit
depth buffers.
This required threading the brw_context struct through into
brw_blorp_surface_info::set() so that it can consult the
brw->render_target_format array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Even though brw_blorp_surface_info is derived from brw_blorp_mip_info,
this function doesn't need to be virtual, because it is never accessed
through a base class pointer. Making the function non-virtual will
allow it to take additional parameters in the brw_blorp_surface_info
case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch moves the responsibility for deciding on the format of the
source and destination surfaces from the
gen{6,7}_blorp_emit_surface_state() functions to
brw_blorp_surface_info::set(), which is shared between Gen6 and Gen7.
This will make it possible to add support for more surface formats
without code duplication.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TGSI doesn't need an opcode, since registers are untyped (but beware
once doubles come into the scene). Mesa IR doesn't handle native
integers, so trying to handle them there is worthless, the case
entries are only added for warning reasons.
It was only tested with softpipe, since llvmpipe doesn't support glsl
1.3 yet.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
That adds support for activating the extension. It doesn't actually
*do* anything yet, of course.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the issues section of the GL_ARB_texture_compression_rgtc extension:
15) What should glGetTexLevelParameter return for
GL_TEXTURE_GREEN_SIZE and GL_TEXTURE_BLUE_SIZE for the RGTC1
formats? What should glGetTexLevelParameter return for
GL_TEXTURE_BLUE_SIZE for the RGTC2 formats?
RESOLVED: Zero bits.
These formats always return 0.0 for these respective components
and have no bits devoted to these components.
Returning 8 bits for red size of RGTC1 and the red and green
sizes of RGTC2 makes sense because that's the maximum potential
precision for the uncompressed texels.
Thus, we need to return 8 bits for GL_TEXTURE_RED_SIZE on all RGTC formats
and 8 bits for GL_TEXTURE_GREEN_SIZE on RGTC2 formats. BLUE should be 0.
Fixes oglconform/rgtc/advanced.texture_fetch.tex_param.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
While ~loop_state() is already freeing the loop_variable_state objects
via ralloc_free(this->mem_ctx), the ~loop_variable_state() destructor
was never getting called, so the hash table inside loop_variable_state
was never getting destroyed.
Fixes a memory leak in any shader with loops.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The functions for handling 1D, 2D and 3D texture images were nearly
identical. This folds them all together.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can't remove this pass yet, because we need it to convert AMDIL
registers in BRANCH* instructions, but we don't need it for
instruction conversion any more.
OpenGL allows you to declare user-defined fragment shader outputs with
less than four components:
out ivec2 color;
This makes sense if you're rendering to an RG format render target.
Previously, we assumed that all color outputs had four components (like
the built-in gl_FragColor/gl_FragData variables). This caused us to
call emit_color_write for invalid indices, incrementing the output
virtual GRF's reg_offset beyond the size of the register.
This caused cascading failures: split_virtual_grfs would allocate new
size-1 registers based on the virtual GRF size, but then proceed to
rewrite the out-of-bounds accesses assuming that it had allocated enough
new (contiguously numbered) registers. This resulted in instructions
that accessed size-1 GRFs which register numbers beyond
virtual_grf_next (i.e. registers that were never allocated).
Finally, this manifested as live variable analysis and instruction
scheduling accessing their temporary array with an out of bounds index
(as they're all sized based on virtual_grf_next), and the program would
segfault.
It looks like the hardware's Render Target Write message requires you to
send four components, even for RT formats such as RG or RGB. This patch
continues to use all four MRFs, but doesn't bother to fill any data for
the last few, which should be unused.
+2 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit 4650aea7a5 fixed texelFetchOffset()
on Ivybridge, but didn't update the Ironlake/Sandybridge code.
+18 piglits on Sandybridge.
NOTE: This and 4650aea7a5 are both candidates for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit f41ecade7b fixed texelFetchOffset()
on Ivybridge, but didn't update the Ironlake/Sandybridge code.
+15 piglits on Sandybridge.
NOTE: This and f41ecade7b are both candidates for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't saved/restored by _mesa_meta_begin, so we need to do it
manually (like we do for the read/draw framebuffers). Additionally,
we neglected to re-bind before the glRenderbufferStorage call.
+13 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
DeleteBuffer needs to unbind from these binding points as well, based on
the same rationale as the previous patch.
+51 oglconforms (together with the last patch).
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_lookup_bufferobj returns NULL for 0, which caused us to say
"there's no such buffer object" and raise an error, rather than
correctly binding the shared NullBufferObj.
Now you can unbind your buffers.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
According to the GL 3.1 spec, section 2.9 ("Buffer Objects"):
"If a buffer object is deleted while it is bound, all bindings to that
object in the current context (i.e. in the thread that called
DeleteBuffers) are reset to zero."
The code already checked for a number of cases, but neglected these
newer binding points.
+21 oglconforms.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We were incorrectly assuming that the coordinate's dimensionality is
equal to the gradient's dimensionality. For array types, the coordinate
has one more component.
Fixes 12 subcases of oglconform's glsl-bif-tex-grad test.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Currently, if you pass --with-egl-platforms=x11 but xcb-dri2 isn't available
we just silently fail and disables building the EGL DRI2 driver.
This commit cleans up the EGL platfrom checking and fails if a selected
platform can't find its required dependencies.
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit a07cf3397e added support for TBOs
on Gen7, but missed Gen6.
Passes piglit -t texture_buffer and oglconform's buffermapping
basic.read.texture tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to Table 6.17 in the GL 2.1 specification, DEPTH_TEXTURE_MODE,
TEXTURE_COMPARE_MODE, and TEXTURE_COMPARE_FUNC need to be restored on
glPopAttrib(GL_TEXTURE_BIT).
Makes a number of oglconform tests happier.
v2: Make restoration conditional on the ARB_shadow and ARB_depth_texture
extensions, as suggested by Brian. I'm not sure that any
implementations still remain that don't support those, but why not?
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
this fixes libdricore directory build with --enable-32-bit on a x86_64 system
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The VTX_READ instructions were using the ADDRParam ComplexPattern which
allows a load instruction's offset to be a register, but VTX_READ
instructions can only handle an immediate offset.
Also, the load_param pattern fragment had an erroneous return true;
statement that was causing it to match the wrong load instructions.
Tungsten Graphics has not existed for several years, and the majority of
ongoing development and support is done by Intel. I chose to include
"Open Source Technology Center" to distinguish it from, say, the closed
source Windows OpenGL driver.
The one downside to this patch is that applications that pattern match
against "Intel" may start applying workarounds meant for the Windows
driver. However, it does seem like the right thing to do.
This does change oglconform behavior.
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
These look like debug messages from the switch-statement development.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tom Stellard:
- Updated for gallium interface changes
- Fixed a few bugs:
+ Set the loop counter
+ Calculate the correct number of pipes
- Added hooks into the LLVM compiler
v2:
-Separate IR type and LLVM triple
-Do the OpenCL C->LLVM IR and linking steps for all PIPE_SHADER_IR
types.
v3:
- Coding style fixes
- Removed compatibility code for LLVM < 3.1
- Split build_module_llvm() into three functions:
compile(), link(), and build_module_llvm()
v4:
- Use struct pipe_compute_program
v5:
- Don't malloc memory for struct pipe_llvm_program
v6:
- Fix serialization of llvm bytecode
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This structure is used as a header that precedes LLVM bytecode programs
that are passed to the drivers.
v2:
- s/pipe_compute_program/pipe_llvm_program/
v3:
- Rename to struct pipe_llvm_program_header
- Drop the char * prog member
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is for the llvm code that can't use extended initializers.
v2:
- Use const references for vector arguments
- Move constructor defs before data members
- Initialize all values in the default constructors
v3:
- Fix typo
A device now has two function for getting information about the IR
it needs to return.
ir_format() => returns the preferred IR
ir_target() => returns the triple for the target that is understood by
clang/llvm.
v2:
- renamed ir_target() to ir_format()
- renamed llvm_triple() to ir_target()
v3:
- Remove unnecessary include
- Do proper conversion from std::vector<char> to std::string
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2: Tom Stellard
- Update CAP description
v3: Tom Stellard
- TGSI targets should pass an empty string for this CAP.
v4: Tom Stellard
- TGSI targets can ignore this CAP.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
TEX instructions can't do saturation. Do the TEX into a temp reg w/out
saturation, then do a MOV_SAT.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Some distributions (like Arch Linux) make /usr/bin/python Python 3,
rather than Python 2. Since compare_ir uses /usr/bin/env python,
such systems will fail to run optimization-test, causing 'make check' to
always fail.
Automake's TESTS_ENVIRONMENT variable provides a mechanism to run
programs or set environment variables in the test environment.
Ideally, I think we would want to use AM_TESTS_ENVIRONMENT, since
TESTS_ENVIRONMENT is supposed to be user-overridable. However, it isn't
supported using the default/serial test runner.
Fixes 'make check' on Arch Linux and Gentoo.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
I started writing unit tests for a new piece of code, and discovered
they all failed due to a bug in ralloc. Clearly it needs a test suite.
v2: Rename to 'ralloc-test' and fix copyright date. (idr review)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If an object is allocated out of the NULL context, info->parent will be
NULL. Using the PTR_FROM_HEADER macro would be incorrect: it would say
that ralloc_parent(ralloc_context(NULL)) == sizeof(ralloc_header).
Fixes the new "null_parent" unit test.
NOTE: This is a candidate for the 7.9, 7.10, 7.11, and 8.0 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Discovered while running the Khronos conformance test suite and
receiving "implementation error: meta program compile failed."
This bug was recently introduced by the i965 clear patch set and would
only be detected while using the ES2 API and only on gen6+ hardware.
Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is performed in a subdirectory to avoid needing to convert all of
src/mesa/Makefile in one go.
I can now cherry-pick a commit containing glapi XML changes, do "(cd
src/mapi/glapi/gen && make) && make", and get a working driver.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In order to do the minimal change for libdricore conversion to
automake, I need to put its Makefile.am in a subdirectory. Automake
gets whiny/broken if you use GNU make features like "addprefix" or
"$(FILES:%=../%)" to munge your *_SOURCES. So, use a plain old
variable to be able to substitute in that "../"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
*_SOURCES is reserved for files lists for particular automake targets.
Also, "-" in the variable names is not allowed.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This variable won't be set when called from non-automake makefiles,
but it cleans up shared-glapi's output.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa already always depends on python to build. The checked in
changes are not reviewed (because any trivial change rewrites the
world). We also have been pushing commits between xml change and
regen where at-build-time xml-generated code disagrees with committed
xml-generated code. And worst of all, sometimes we ("I") check in
*stale* xml-generated code.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
commit 87f12bb2d9 tried to fix rb->mt
being NULL, but change this case wrong.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kurt Roeckx <kurt@roeckx.be>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now model loading uses sgpr values with LLVM IR load instructions that
use the USER_SGPR address space.
The definition of the sgpr parameter to the use_sgpr() helper function
in radeonsi_shader.c has changed so that you can pass raw sgpr values
rather than having to divide the sgpr value you want to use by the dword
width of the type you want to load.
This function was causing compile errors in the tablegen'd code for
some intrinsic definitions. I don't think we really need this function,
so I'm removing the function body just as a temporary solution. I'll
look into removing the entire AMDILIntrinsicInfo class later.
v2: use a define for the maximum sample count
v3: also test odd sample counts (r300 supports MS3)
While multisample renderbuffers are supported by mesa, MS visuals
are not, so we need a way to tell dri/st not to advertise them even
if the gallium driver does support multisampled surfaces.
Otherwise applications selecting these non-functional visuals would
run into trouble ...
Reviewed-by: Brian Paul <brianp@vmware.com>
The code which scans the index buffer for restart indexes wasn't adding
the index buffer offset so we were always starting at offset=0. The
offset is usually zero so it wasn't noticed before.
Fixes a failure in the piglit primitive-restart test when testing
vertex data + index data in a single VBO.
NOTE: This is a candidate for the 8.0 branch.
Basic 4x MSAA support now works on Gen7. This patch enables it.
As with Gen6, MSAA support is still fairly preliminary. In
particular, the following are not yet supported:
- 8x oversampling (Gen7 has hardware support for this, but we do not
yet expose it).
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centrold interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen6, the blending necessary to blit an MSAA surface to a non-MSAA
surface could be accomplished with a single texturing operation. On
Gen7, the WM program must fetch each sample and blend them together
manually. From the Bspec (Shared Functions/Messages/Initiating
Message/Message Types/sample):
[DevIVB+]:Number of Multisamples on the associated surface must be
MULTISAMPLECOUNT_1.
This patch implements the manual blend operation.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Since blorp uses color textures and render targets to do all its work
(even when blitting stencil and depth data), it always has to
configure the Gen7 GPU to use the new "sliced" MSAA layout. However,
when blitting stencil or depth data, the actual MSAA layout is
interleaved (as in Gen6). Therefore, blorp has to do extra coordinate
transformation work to account for the interleaving manually.
This patch causes blorp to perform the necessary extra coordinate
transformations.
It also modifies the blorp SURFACE_STATE setup code for Gen7, so that
it does not try to correct the surface width and height to account for
MSAA, since "sliced" MSAA layout doesn't affect the surface width or
height.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
When a Gen7 SURFACE_STATE is configured for MSAA, a number of
additional constaints come in to play. This patch adds a function
gen7_check_surface_setup() which verifies that all of those
constraints are met.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Starting in Gen7, there are two possible layouts for MSAA surfaces:
- Interleaved, in which additional samples are accommodated by scaling
up the width and height of the surface. This is the only layout
available in Gen6. On Gen7 it is used for depth and stencil
surfaces only.
- Sliced, in which the surface is stored as a 2D array, with array
slice n containing all pixel data for sample n. On Gen7 this layout
is used for color surfaces.
The "Sliced" layout has an additional requirement: it must be used in
ARYSPC_LOD0 mode, which means that the surface doesn't leave any extra
room between array slices for miplevels other than 0.
This patch modifies the surface allocation functions to use the
correct layout when allocating MSAA surfaces in Gen7, and to set the
array offsets properly when using ARYSPC_LOD0 mode. It also modifies
the code that populates SURFACE_STATE structures to ensure that
ARYSPC_LOD0 mode is selected in the appropriate circumstances.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 support for blorp (blits using the render bath) now works for
non-MSAA purposes. This patch enables it.
Since blorp operations re-use the logic for HiZ ops, this required
adding a case to the switch statement in gen7_blorp_emit_wm_config(),
to allow for the case where no HiZ op is being performed.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Gen6, texel fetch is always accomplished using the SAMPLE_LD
message, which accepts arguments (u, v, r, lod, si). On Gen7, there
are two* texel fetch messages: SAMPLE_LD for non-MSAA surfaces, taking
arguments (u, lod, v), and SAMPLE_LD2DSS for MSAA surfaces, taking
arguments (si, u, v).
*Technically, there are other texel fetch messages, but they are used
for "compressed" MSAA surfaces, which we don't yet support.
This patch adds the proper message types and argument orderings for
Gen7.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen7 hardware requires us to enable at least one WM dispatch mode,
even if there is no program being dispatched to. When this code was
only used for HiZ operations (which don't use a WM program), we used
32-pixel dispatch, because it didn't matter. But blit programs are
compiled for 16-pixel dispatch. So just enable 16-wide dispatch
unconditionally.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Enable 16-wide dispatch unconditionally rather than add the
unnecessary complication of using 32-wide dispatch when there is no WM
program.
On Gen7, push constants for shader programs are stored in the URB, so
blorp code needs to set aside space for them. This was previously
unnecessary because blorp code was based on HiZ operations, which
don't require any shaders.
This patch adds a call from gen7_blorp_exec() to
gen7_allocate_push_constants(), to ensure that push constants are
assigned the correct location in the URB. It also extracts a new
function gen7_emit_urb_state() from gen7_upload_urb(), which is
re-used by gen7_blorp_emit_urb_config() to ensure that the URB regions
used by all the pipeline stages leave room for the push constants.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We know from previous bug fixes (commits
c25e5300cb and
b2ace06cbb) that texture border color
doesn't work if the dynamic state upper bound is set to 0. Although
the blorp engine doesn't make use of texture borders, it seems like we
ought to err on the safe side and set this value properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch separates out the portions of gen6_blorp_emit_batch_head()
that emit 3DSTATE_MULTISAMPLE, 3DSTATE_SAMPLE_MASK, and
STATE_BASE_ADDRESS. This paves the way for making the blorp code work
on Gen7, where additional command packets
(3DSTATE_PUSH_CONSTANT_ALLOC_VS and 3DSTATE_PUSH_CONSTANT_ALLOC_PS)
need to be emitted before 3DSTATE_MULTISAMPLE.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies the "blorp" WM program so that it can be run in
MSDISPMODE_PERSAMPLE (which means that every single sample of a
multisampled render target is dispatched to the WM program, not just
every pixel).
Previously we were using the ugly hack of configuring multisampled
destination surfaces as single-sampled, and generating sample indices
other than zero by swizzling the pixel coordinates in the WM program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch modifies the function brw_blorp_blit_program::texel_fetch()
to emit the SI (sample index) argument to the SAMPLE_LD message when
reading from a sample index other than zero.
Previously we were using the ugly hack of configuring multisampled
source surfaces as single-sampled, and accessing sample indices other
than zero by swizzling the texture coordinates in the WM program.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch generalizes the function
brw_blorp_blit_program::texture_lookup() so that it prepares the
arguments to the sampler message based on a caller-provided array
rather than assuming the argument order is always (u, v).
This paves the way for the messages we will need to use in Gen7, which
use argument orders (u, lod, v) and (si, u, v) (si=sample index).
It will also will allow us to read from arbitrary sample indices on
Gen6, by supplying the arguments (u, v, r, lod, si) to the SAMPLE_LD
message instead of just (u, v).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen6 MSAA buffers (and Gen7 MSAA depth/stencil buffers) interleave
MSAA samples in a complex pattern that repeats every 2x2 pixel block.
Therefore, when allocating an MSAA buffer, we need to make sure to
allocate an integer number of 2x2 blocks; if we don't, then some of
the samples in the last row and column will be cut off.
Fixes piglit tests "EXT_framebuffer_multisample/unaligned-blit {2,4}
color msaa" on i965/Gen6.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Without passing the -ldflags parameter before $(LDFLAGS) in some cases
flags will be passed to MKLIB which it does not understand.
This might be -m64, -m32 or similar.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Thomas Gstädtner <thomas@gstaedtner.net>
Signed-off-by: Brian Paul <brianp@vmware.com>
This patch gets the FreeBSD SCons build working again. The build still
fails though.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to return immediately after inserting instructions that require
S_WAITCNT so that the parent class' custom inserter won't try to insert
them again.
Fix uninitialized scalar variable defects report by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We should just set the bits of functionality that we support; the
GL/ES1/ES2 flags in extensions.c will take care of advertising the
appropriate extensions for the current API.
This enables the GL_EXT_texture_compression_dxt1 extension on ES1/ES2
when libtxc_dxtn is installed or the force_s3tc driconf option is set.
The main extension code set this up properly, but the ES-specific code
failed to do so.
Otherwise, the extension strings reported by es1_info, es2_info, and
glxinfo all remain the same.
This patch manually disables the ARB_framebuffer_object bit on ES
to preserve the behavior of 1c0f5d8324.
v2: Rebase, fix the i915 Makefile, and unconditionally set the
OES_draw_texture bit as core Mesa will only apply it to ES1 now.
Tested-by: Daniel Charles <daniel.charles@intel.com> [v1]
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> [v1]
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
If the primitive restart index and the primitive type can
be handled by the cut index feature, then use the hardware
to handle the primitive restart feature.
The VBO module's software handling of primitive restart is
used as a fall back.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For newer hardware we disable the VBO module's software handling
of primitive restart. We now handle primitive restarts in
brw_handle_primitive_restart.
The initial version of brw_handle_primitive_restart simply calls
vbo_sw_primitive_restart, and therefore still uses the VBO
module software primitive restart support.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When considering which components of a variable were killed by an
assignment, constant propagation would previously just use the write
mask of the assignment. This worked if the LHS of the assignment was
simple, e.g.:
v.xy = ...; // (assign (xy) (var_ref v) ...)
But it did the wrong thing if the LHS of the assignment involved an
array indexing operator, since in this case the write mask is always
(x):
v[i] = ...; // (assign (x) (deref_array (var_ref v) (var_ref i)) ...)
In general, we can't predict which vector component will be selected
by array indexing, so the only safe thing to do in this case is to
kill the entire variable.
Fixes piglit tests {fs,vs}-vector-indexing-kills-all-channels.shader_test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that the linker handles initializers of samplers just like any
other uniform, a bunch of this annoying code is unnecessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The linker may have set initial values for uniforms. Propagate these
values to the driver's backing storage when it is first associated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Fix handling of arrays-of-structure. Thanks to Eric Anholt for
pointing this out.
v3: Minor comment change based on feedback from Ken.
Fixes piglit glsl-1.20/execution/uniform-initializer/fs-structure-array
and glsl-1.20/execution/uniform-initializer/vs-structure-array.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add support for gen6, and don't turn it on if blending is
disabled. (fixes GPU hang), and note it in docs/GL3.txt
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The i965 driver needed this as well for hardware setup, so instead of
duplicating the logic, just save it off.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
While it doesn't have the same warning in the simulator as in gen7,
let's emit it out of paranoia. We wouldn't want our resolves of some
previous clear to get clamped to some current clamping value.
Suggested-by: pretty much everyone
When doing fast clears, a fulsim warning said that the batch was being
emitted without the viewport set up. While the fast clear pass I was
looking at doesn't use the clear value, the later resolves which also
didn't set up the vieport would trigger the same. It's not obvious
from the error message whether it meant "fast clear value gets clamped
to something you haven't defined" or "fast clear value doesn't get
clamped, and I saw it was out of the current (uninitialized) range,
and you probably wanted it clamped to that (uninitialized) range". Be
paranoid and assume the first case.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Having this enum separate caused us to need a bunch of helper
functions to translate to the op to be executed.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
The GLSL clear path doesn't need any buffer presence checks, since
those are already handled in the normal drawing path code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Our understanding is that the 3D engine is supposed to be faster
anyway. We used to have more overhead in our tri clear path than we
do today, which would have led to this choice. But given that we
almost always see a depth clear along with a color clear, the path was
hardly exercised anyway.
Also, the color mask logic was broken in the presence of
GL_EXT_draw_buffers2's per-buffer colormask.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previously, when the environment variable INTEL_DEBUG=aub was set,
mesa would simply instruct DRM to start dumping data to an .aub file,
but we would not provide DRM with any information about the format of
the data in various buffers. As a result, a lot of the data in the
generate .aub file would be unannotated, making further data analysis
difficult.
This patch causes the entire contents of each batch buffer to be
annotated using the data in brw->state_batch_list (which was
previously used only to annotate the output of INTEL_DEBUG=bat). This
includes data that was allocated by brw_state_batch, such as binding
tables, surface and sampler states, depth/stencil state, and so on.
The new annotation mechanism requires DRM version 2.4.34.
Reviewed-by: Eric Anholt <eric@anholt.net>
When we are generating an AUB dump, we make a final call to
aub_dump_bmp() as the context is being destroyed, to ensure that any
rendering performed before the application exits can be seen during a
simulation run. However, we were doing this before flushing the batch
buffer; as a result simulation runs would not always see the effect of
all rendering commands.
This patch flushes the batch buffer just before making the final call
to aub_dump_bmp(), to ensure that all rendering is properly captured
in the final bitmap.
This is a long standing problem, that recently surfaced with the change
to enable perspective correct color interpolation.
A fix for all possible formats is left to the future.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Previously assumed normalised was 0 to 1, but it can be -1 to 1
if type is signed.
Tested with lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixing a /*FIXME*/ to remove errors in integer conversion in lp_build_conv.
Tested using lp_test_conv and lp_test_format, reduced errors.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This patch removes two Clang warnings in GLU:
The first one seems to be an actual bug in mapdesc.cc: Clang complains
that sizeof(dest) will return the size of REAL*[MAXCOORDS], instead of
the intended REAL[MAXCOORDS][MAXCOORDS]. The second one is just
cosmetic because Clang doesn't like extra parentheses.
NOTE: This is a candidate for the 8.0 branch
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes another case of sampler views being created by one context,
shared by another, then deleted by the first, leaving a dangling
pipe context pointer.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Use it where performance matters more and the exact method of float->int
conversion/rounding isn't terribly important. There should no net change
here since F_TO_I() is the new name of the old IROUND() function.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The different implementations of IROUND() behaved differently and in
the case of fistp, depended on the current x86 FPU rounding mode.
This caused some tests like piglit roundmode-pixelstore and
roundmode-getintegerv to fail on 32-bit x86 but pass on 64-bit x86.
Now IROUND() always rounds to the nearest integer (away from zero).
The new F_TO_I function converts a float to an int by whatever means
is fastest. We'll use this where we're more concerned with performance
and not too worried to how the conversion is done.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The IROUND converted all arguments to 0 or 1. That's not what we wanted.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For zero-stride vertex arrays, the svga driver copies the value into
the constant value and uses that value in the shader. The recent
gallium-userbuf changes caused a regression in this. An example
symptom was per-primitive glColor3f() calls getting ignored.
Where we copied the vertex value from the vertex buffer to the
constant buffer we neglected to take into account the
pipe_vertex_buffer::buffer_offset field. Adding that value to the
source offset fixes the problem. Actually, it looks like we should
have been doing this all along, but it never was an issue before for
some reason.
If the MESA_GLSL env var contains "errors", GLSL compilation and
link errors will be reported to stderr.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fix uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Piglits test for fragment shaders pass, vertex shaders fail. The
actual failure seems to be in the interpolators, and not the
textureSize query.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: José Fonseca <jose.r.fonseca@gmail.com>
Fixes a bunch of piglit tests related to flat interpolation of floats.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
The VBO module now can handle primitive restart in software
if required. Therefore this support is no londer required.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the PIPE_CAP_PRIMITIVE_RESTART screen param is not set, then enable
PrimitiveRestartInSoftware to enable software primitive restart
support in the VBO module.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When PrimitiveRestartInSoftware is set, the VBO module will handle
primitive restart scenarios before calling the vbo->draw_prims
drawing function.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If set, then the VBO module will handle all primitive
restart scenarios before calling the driver draw_prims.
Software primitive restart support is disabled by default.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
vbo_sw_primitive_restart implements primitive restart in software
by splitting primitive draws apart.
This is based on similar support in mesa/state_tracker/st_draw.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's an implied argument, and I don't think being explicit about it
helps.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The comment quotes spec saying that only scalar integers are allowed,
but we only checked for integer.
Fixes piglit switch-expression-const-ivec2.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Total instructions: 261582 -> 261316
135/2147 programs affected (6.3%)
36752 -> 36486 instructions in affected programs (0.7% reduction)
This excludes a tropics shader that now gets 16-wide mode and throws
off the numbers. 5 shaders are hurt: two extra MOVs in 4 tropics
shaders it looks like because we don't split register names according
to independent webs, and one gstreamer shader where it looks like
try_rewrite_rhs_to_dst() is falling on its face.
This should also help avoid a regression in VSes from idr's ARB
programs to GLSL work.
By using the live variables code for determining interference, we can
handle coalescing in the presence of control flow, which the other
register coalescing path couldn't.
Total instructions: 207184 -> 206990
74/1246 programs affected (5.9%)
33993 -> 33799 instructions in affected programs (0.6% reduction)
There is a newerth shader that loses out, because of some extra MOVs
that now get their dead-code nature obscured by coalescing. This
should be fixed by doing better at dead code elimination.
Starting with LLVM 3.0, named structures are meant not for debugging, but
for recursive data types, previously also known as opaque types.
The recursive nature of these types leads to several memory management
difficulties. Given that we don't actually need recursive types, avoid
them altogether.
This is an attempt to address fdo bugs 41791 and 44466. The issue is
somewhat random so there's no easy way to check how effective this is.
No functional change. This patch replaces the
brw_blorp_params::exec() method with a global function
brw_blorp_exec() that performs the operation described by the params
data structure.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
This patch expands the "blorp" component to be able to perform blits
as well as HiZ resolves. The new blitting code is located in
brw_blorp_blit.cpp. This includes the necessary fragment shader code
to look up pixels in the source buffer (which is configured as a
texture) and output them to the destination buffer (which is
configured as the render target).
Most of the time the fragment shader code is simple and
straightforward, since it merely has to apply a coordinate offset,
read from the texture, and write to the render target. However, in
the case of blitting stencil buffers, things are more complicated,
since the GPU stores stencil data using W tiling, and W tiling is not
supported for textures or render targets. So, we set up the stencil
buffers as Y tiled, and emit fragment shader code that adjusts the
coordinates to account for the difference between W and Y tiling.
Furthermore, since a rectangular region in W tiling does not
necessarily correspond to a rectangular region in Y tiling, we widen
the rectangle primitive to the nearest tile boundary and have the
fragment shader "kill" any pixels that don't fall inside the actual
desired destination rectangle.
All of this is a necessary prerequisite for implementing MSAA, since
we'll need to be able to blit between multisample color, depth, and
stencil buffers and their non-multisampled counterparts, and none of
the existing blitting mechanisms support multisampling.
In addition, the new blitting code should speed up operations where we
previously fell back to software rasterization, such as blitting of
stencil buffers. The current fallback sequence is: first we try to do
a blit using the hardware blitting engine. If that fails we try to do
a blit using the render path. If that also fails then we do the blit
using a meta-op (which may or may not fall back to software
rasterization).
Note that blitting using the render path has some limitations at the
moment: it only supports a few formats, and it doesn't support
clipping or scissoring. These limitations will be addressed in future
patch series.
v2:
- Add the code that configures the WM program to
gen{6,7}_emit_wm_config() and gen7_emit_ps_config() rather than
creating separate ...enable() functions.
- Call intel_prepare_render before determining which miptrees we are
blitting from/to, because it may cause miptrees to be reallocated.
- Allow the blit to mirror X and/or Y coordinates.
- Disable blorp blits on Gen7 for now, since they aren't working yet.
This patch exposes the functions brw_get_surface_tiling_bits and
gen7_set_surface_tiling, so that they can be re-used when setting up
surface states in gen6_blorp.cpp and gen7_blorp.cpp.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch splits up the gen6_blorp_exec and gen7_blorp_exec
functions, which were very long, into simple component functions.
With a few exceptions, there is one function per state packet.
This will allow blit functionality to be added without significantly
complicating the code.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Rename the functions gen{6,7}_emit_wm_disable() to
gen{6,7}_emit_wm_config() (since the WM is not actually disabled
during HiZ ops; it simply doesn't have a program). Also, on gen7,
split out the configration of 3DSTATE_PS to a separate function
gen7_emit_ps_config().
This patch groups together the parameters used by the HiZ functions
into a new data structure, brw_hiz_resolve_params, rather than passing
each parameter individually between the HiZ functions. This data
structure is a subclass of brw_blorp_params, which represents the
parameters of a general-purpose blit or resolve operation. A future
patch will add another subclass for blits.
In addition, this patch generalizes the (width, height) parameters to
a full rect (x0, y0, x1, y1), since blitting operations will need to
be able to operate on arbitrary rectangles. Also, it renames several
of the HiZ functions to reflect the expanded role they will serve.
v2: Rename brw_hiz_resolve_params to brw_hiz_op_params. Move
gen{6,7}_blorp_exec() functions back into gen{6,7}_blorp.h.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When drawing to a FBO, the viewport wasn't always set correctly. It
was fine in the usual case of the viewport dims matching the surface
dims but broken otherwise. In particular, this was happening because
the viewport scale is negative for FBO rendering.
The piglit fbo-viewport test exercises this.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We can save one instruction by lowering it to:
SUB_INT tmp, 0, src
MAX_INT dst, src, tmp
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This patch adds .gitignore files to ignore the makefiles generated by
the gallium pipe loader and the clover OpenCL state tracker.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously, I tried implementing this in the i965 driver, but did so
in a way that violated the intent of the spec, and broke Tropics.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be convenient when I want to comment out optimization code
to see the raw program being optimized, but more importantly will let
the interference check be used during optimization.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We could do more by handling abs/negate and non-GRF sources, but this is
a good start. Improves tropics performance 0.30% +/- .17% (n=43).
shader-db results:
Total instructions: 208032 -> 207184
60/1246 programs affected (4.8%)
23286 -> 22438 instructions in affected programs (3.6% reduction)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When I had a bug causing the backend to never finish optimizing, it
also sent me deep into swap. This avoids extra memory allocation per
trip through optimization, and thus may reduce the peak memory
allocation of the driver even in the success case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Total instructions: 18210 -> 17836
49/163 programs affected (30.1%)
12888 -> 12514 instructions in affected programs (2.9% reduction)
This reduces Lightsmark's "Scale down filter" shader from 395
instructions to 283, a whopping 28%. It also reduces register pressure
significantly: the SIMD8 program now uses 29 registers instead of 101,
giving us more than enough room for a SIMD16 program.
v2: Add && !inst->conditional_mod to the "skip some instructions" check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets you omit some ampersands and is more idiomatic C++. Using
const also marks the function as not altering either register (which
was obvious, but nice to enforce).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows us to calculate the triangle's area using fixed point,
previously it was cacluated in floating point space. It was possible
that a triangle which had negative area in floating point space had
a positive area in fixed point space.
Fixes fdo 40920.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
As pointed out by Marek, if we have only one cb, we may as well add this
single register write here rather than adding it in the draw loop.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Vertex and index buffers are never used by hardware, only by Draw.
SWTCL chipsets usually have very little memory, so this might help
with stability and reliability.
Instead of having to hack the code to enable these debugging options,
set them through the MESA_DEBUG env var.
Reviewed-by: Eric Anholt <eric@anholt.net>
This flag has been around for a while but it wasn't actually used anywhere.
Now, setting this flag causes a glFlush() to be issued after each
drawing call (including glBegin/End, glDrawElements, glDrawArrays,
glDrawPixels, glCopyPixels and glBitmap).
This was being done in the _mesa_Flush/Finish() calls but if there
was an internal call to _mesa_flush/finish() the FLUSH_VERTICES()
wouldn't happen. Looks like only the intel and radeon drivers made
such calls in MakeCurrent().
When glColorMaterial() is used to latch glColor commands to a material
attribute, glMaterial calls to change that material should become no-ops.
This failed to work properly when the glMaterial call was inside a
display list.
This removes the Material function from the vbo_attrib_tmp.h template
file. We have separate/different implementations for the "save" and
"exec" cases now.
NOTE: This is a candidate for the 8.0 branch.
_mesa_material_bitmask() will record a GL error and return 0 if
face or mode are illegal. Return early in that case.
NOTE: This is a candidate for the 8.0 branch.
Some code relies on the existing of an invalid texture target. It seems
safer to bring it back than to deal with unintended consequences.
This partially reverts commit a4ebb04214.
Reviewed-by: Brian Paul <brianp@vmware.com>
Contains the following patches squashed in:
commit 9fff1dc0875f7c9591550fa3ebbe1ba7a18483fa
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:20:03 2012 +0100
configure.ac: Build gallium loader when OpenCL is enabled
commit 542111cb02957418c6a285cb6ef2924e49adc66e
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:30:29 2012 +0100
configure.ac: Add sw/null to GALLIUM_WINSYS_DIRS for gallium loader
commit 876f8de46062dde76b6075be3b6628f969b16648
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Thu Feb 9 11:26:05 2012 -0500
configure.ac: Require gcc > 4.6.0 for clover
commit 99049d50fa3d9a23297ae658189c19c89dca1766
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:32:06 2012 +0100
configure.ac: Require Gallium drm loader when gallium loader is enabled
No longer silently exclude this when building OpenCL drivers
for nouveau and r600.
Add a test program that tries to exercise some of the language
features commonly used by compute programs at the Gallium API level:
- Correctness of the values returned by the grid parameters.
- Proper functioning of resource LOADs and STOREs.
- Subroutine calls.
- Argument passing to the compute parameter through the INPUT
memory space.
- Mapping of buffer objects to the GLOBAL memory space.
- Proper functioning of the PRIVATE and LOCAL memory spaces.
- Texture sampling and constant buffers.
- Support for multiple kernels in the same program.
- Indirect resource indexing.
- Formatted resource loads and stores (i.e. with channel conversion
and scaling) using several different formats.
- Proper functioning of work-group barriers.
- Atomicity and semantics of the atomic opcodes.
As of now all of them seem to pass on my nvA8.
It simplifies things slightly, and besides, it makes possible to
execute the trivial tests on a hardware device instead of being
limited to software rendering.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
This target generates pipe driver modules intended to be consumed by
auxiliary/pipe-loader. Most of it was taken from the "gbm" target --
the duplicated code will be replaced with references to this target in
a future commit.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The goal is to have a uniform interface to create winsys and
pipe_screen instances for any driver, exposing the device enumeration
capabilities that might be supported by the operating system (for now
there's a "drm" back-end using udev and a "sw" back-end that always
returns the same built-in devices).
The typical use case of this library will be:
>
> struct pipe_loader_device devs[n];
> struct pipe_screen *screen;
>
> pipe_loader_probe(&devs, n);
>[pick some device from the array...]
>
> screen = pipe_loader_create_screen(dev, library_search_path);
>[do something with screen...]
>
> screen->destroy(screen);
> pipe_loader_release(&devs, N);
>
A part of the code was taken from targets/gbm/pipe_loader.c, which
will be removed and replaced with calls into this library by a future
commit.
Add a shader cap for specifying the preferred shader representation.
Right now the only supported value is TGSI, other enum values will be
added as they are needed.
This is mainly to accommodate AMD's LLVM compiler back-end by letting
it bypass the TGSI representation for compute programs. Other drivers
will keep using the common TGSI instruction set.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This change will be useful to implement function parameter passing on
top of TGSI. As we don't have a proper stack, a register-based
calling convention will be used instead, which isn't necessarily a bad
thing given that GPUs often have plenty of registers to spare.
Using the same register space for local temporaries and
inter-procedural communication caused some inefficiencies, because in
some cases the register allocator would lose the freedom to merge
temporary values together into the same physical register, leading to
suboptimal register (and sometimes, as a side effect, instruction)
usage.
The LOCAL declaration modifier specifies that the value isn't intended
for parameter passing and as a result the compiler doesn't have to
give any guarantees of it being preserved across function boundaries.
Ignoring the LOCAL flag doesn't change the semantics of a valid
program in any way, because local variables are just supposed to get a
more relaxed treatment. IOW, this should be a backwards-compatible
change.
Normal resource access (e.g. the LOAD TGSI opcode) is supposed to
perform a series of conversions to turn the texture data as it's found
in memory into the target data type.
In compute programs it's often the case that we only want to access
the raw bits as they're stored in some buffer object, and any kind of
channel conversion and scaling is harmful or inefficient, especially
in implementations that lack proper hardware support to take care of
it -- in those cases the conversion has to be implemented in software
and it's likely to result in a performance hit even if the pipe_buffer
and declaration data types are set up in a way that would just pass
the data through.
Add a declaration flag that marks a resource as typeless. No channel
conversion will be performed in that case, and the X coordinate of the
address vector will be interpreted in byte units instead of elements
for obvious reasons.
This is similar to D3D11's ByteAddressBuffer, and will be used to
implement OpenCL's constant arguments. The remaining four compute
memory spaces can also be understood as raw resources.
This texture type was already referred to by the documentation but it
was never defined. Define it as 0 to match the pipe_texture_target
enumeration values.
Move Interpolate, Centroid and CylindricalWrap from tgsi_declaration
to a separate token -- they only make sense for FS inputs and we need
room for other flags in the top-level declaration token.
This commit splits the current concept of resource into "sampler
views" and "shader resources":
"Sampler views" are textures or buffers that are bound to a given
shader stage and can be read from in conjunction with a sampler
object. They are analogous to OpenGL texture objects or Direct3D
SRVs.
"Shader resources" are textures or buffers that can be read and
written from a shader. There's no support for floating point
coordinates, address wrap modes or filtering, and, unlike sampler
views, shader resources are global for the whole graphics pipeline.
They are analogous to OpenGL image objects (as in
ARB_shader_image_load_store) or Direct3D UAVs.
Most hardware is likely to implement shader resources and sampler
views as separate objects, so, having the distinction at the API level
simplifies things slightly for the driver.
This patch introduces the SVIEW register file with a declaration token
and syntax analogous to the already existing RES register file. After
this change, the SAMPLE_* opcodes no longer accept a resource as
input, but rather a SVIEW object. To preserve the functionality of
reading from a sampler view with integer coordinates, the
SAMPLE_I(_MS) opcodes are introduced which are similar to LOAD(_MS)
but take a SVIEW register instead of a RES register as argument.
Define an interface that exposes the minimal functionality required to
implement some of the popular compute APIs. This commit adds entry
points to set the grid layout and other state required to keep track
of the usual address spaces employed in compute APIs, to bind a
compute program, and execute it on the device.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This patch renames the gen6_hiz.h and gen7_hiz.h files to correspond
to the renames of the corresponding .cpp files (see previous commit).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch converts the files gen6_hiz.c and gen7_hiz.c to C++, in
preparation for expanding the HiZ code to support arbitrary blits.
The new files are called gen6_blorp.cpp and gen7_blorp.cpp to reflect
the expanded role that this code will serve--"blorp" stands for "BLit
Or Resolve Pass".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previous to this patch, gen6_hiz.c contained two implicit type casts
from void * to a a non-void pointer type. This is allowed in C but
not in C++. This patch makes the type casts explicit, so that
gen6_hiz.c can be converted into a C++ file.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
In C++, if a struct is defined inside another struct, or its name is
first seen inside a struct or function, the struct is nested inside
the namespace of the struct or function it appears in. In C, all
structs are visible from toplevel.
This patch explicitly moves the decalartions of intel_batchbuffer to
toplevel, so that it does not get nested inside a namespace when
header files are included from C++.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These declarations are necessary to allow C++ code to call C code
without causing unresolved symbols (which would make the driver fail
to load).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Fix wrong cube/3D texture layout for the tailing levels whose width or
height is smaller than the align unit.
From 965 B-spec http://intellinuxgraphics.org/VOL_1_graphics_core.pdf at
page 135:
All of the LOD=0 q-planes are stacked vertically, then below that,
the LOD=1 qplanes are stacked two-wide, then the LOD=2 qplanes are
stacked four-wide below that, and so on.
Thus we should always inrease pack_x_nr, which results to the pitch of LODn
may greater than the pitch of LOD0. So we should refactor mt->total_width
when needed.
This would fix the following webgl test case on all gen4 platforms:
conformance/textures/texture-size-cube-maps.html
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This points to the object with the function body, allowing us to map
from a built-in prototype to the actual body with IR code to execute.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- copy_masked_offset copies part of a constant into another,
assign-like.
- copy_offset copies a constant into (a subset of) another,
funcall-return like.
These methods are to be used to trace through assignments and function
calls when computing a constant expression.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
The method is used to get a reference to an ir_constant * within the
context of evaluating an assignment when calculating a
constant_expression_value.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
We were looping over all the vector components, but only dealing with
the first one. This was masked by the fact that constant expression
handling on built-ins went through custom code for the lessThan()
/function/ rather than the ir_binop_less expression operator.
NOTE: This is a candidate for all release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The vbo module recomputes its states if _NEW_ARRAY is set, so it shouldn't use
the same flag to notify the driver. Since we've run out of bits in NewState
and NewState is for core Mesa anyway, we need to find another way.
This patch is the first to start decoupling the state flags meant only
for core Mesa and those only for drivers.
The idea is to have two flag sets:
- gl_context::NewState - used by core Mesa only
- gl_context::NewDriverState - used by drivers only (the flags are defined
by the driver and opaque to core Mesa)
It makes perfect sense to use NewState|=_NEW_ARRAY to notify the vbo module
that the user changed vertex arrays, and the vbo module in turn sets
a driver-specific flag to notify the driver that it should update its vertex
array bindings.
The driver decides which bits of NewDriverState should be set and stores them
in gl_context::DriverFlags. Then, Core Mesa can do this:
ctx->NewDriverState |= ctx->DriverFlags.NewArray;
This patch implements this behavior and adapts st/mesa.
DriverFlags.NewArray is set to ST_NEW_VERTEX_ARRAYS.
Core Mesa only sets NewDriverState. It's the driver's responsibility to read
it whenever it wants and reset it to 0.
Reviewed-by: Brian Paul <brianp@vmware.com>
In the future we'd like to treat vertex arrays as a state and
not as a parameter to the draw function. This is the first step
towards that goal. Part of the goal is to avoid array re-validation
for every draw call.
This commit adds:
const struct gl_client_array **gl_context::Array::_DrawArrays.
The pointer is changed in:
* vbo_draw_method
* vbo_rebase_prims - unused by gallium
* vbo_split_prims - unused by gallium
* st_RasterPos
Reviewed-by: Brian Paul <brianp@vmware.com>
Replacing "float equal to 1.0f" with "int not equal to 0".
This should help for further optimization of boolean computations.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
We're using float as default type, so basically for every instruction that
wants other types for dst/src operands we need to perform the bitcast
to/from default float. Currently bitcast produces no-op MOV instruction,
will be eliminated later.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
In i965 Gen7, Mesa has for a long time used the "depth coordinate
offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
render to miplevels other than 0. Unfortunately, this doesn't work,
because these offsets must be aligned to multiples of 8, and miplevels
in the depth buffer are only guaranteed to be aligned to multiples of
4. When the offsets aren't aligned to a multiple of 8, the GPU
sometimes hangs.
As a temporary measure, to avoid GPU hangs, this patch smashes the 3
LSB's of "depth coordinate offset X/Y" to 0. This results in
incorrect rendering to mipmapped depth textures, but that seems like a
reasonable stopgap while we figure out a better solution.
Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
texture sizes that are not powers of 2.
Reviewed-by: Chad Verace <chad.versace@linux.intel.com>
In i965 Gen6, Mesa has for a long time used the "depth coordinate
offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
render to miplevels other than 0. Unfortunately, this doesn't work,
because these offsets must be aligned to multiples of 8, and miplevels
in the depth buffer are only guaranteed to be aligned to multiples of
4. When the offsets aren't aligned to a multiple of 8, the GPU
sometimes hangs.
As a temporary measure, to avoid GPU hangs, this patch smashes the 3
LSB's of "depth coordinate offset X/Y" to 0. This results in
incorrect rendering to mipmapped depth textures, but that seems like a
reasonable stopgap while we figure out a better solution.
(Note that we have only ever observed this GPU hang on Gen6 when HiZ
is enabled, so another possible stopgap would be to disable HiZ).
Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
texture sizes that are not powers of 2.
Reviewed-by: Chad Verace <chad.versace@linux.intel.com>
When the user attaches a texture to one of the depth/stencil
attachment points (GL_STENCIL_ATTACHMENT or GL_DEPTH_ATTACHMENT), we
check to see if the same texture is also attached to the other
attachment point, and if so, we re-use the existing texture
attachment. This is necessary to ensure that if the user later
queries what is attached to GL_DEPTH_STENCIL_ATTACHMENT, they will not
receive an error.
If, however, the user attaches buffers to the two different attachment
points using different parameters (e.g. a different miplevel), then we
can't re-use the existing texture attachment, because it is pointing
to the wrong part of the texture. This might occur as a transitory
condition if, for example, if the user attached miplevel zero of a
texture to GL_STENCIL_ATTACHMENT and GL_DEPTH_ATTACHMENT, rendered to
it, and then later attempted to attach miplevel one of the same
texture to GL_STENCIL_ATTACHMENT and GL_DEPTH_ATTACHMENT.
This patch causes Mesa to check that GL_STENCIL_ATTACHMENT and
GL_DEPTH_ATTACHMENT use the same attachment parameters before
attempting to share the texture attachment.
On i965 Gen6, fixes piglit tests
"texturing/depthstencil-render-miplevels 1024 depth_stencil_shared"
and "texturing/depthstencil-render-miplevels 1024
stencil_depth_shared".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When rendering to a miplevel other than 0 within a color, depth,
stencil, or HiZ buffer, we need to tell the GPU to render to an offset
within the buffer, so that the data is written into the correct
miplevel. We do this using a coarse offset (in pages), and a fine
adjustment (the so-called "tile_x" and "tile_y" values, which are
measured in pixels).
We have always computed the coarse offset and fine adjustment using
intel_renderbuffer_tile_offsets() function. This worked fine for
color and combined depth/stencil buffers, but failed to work properly
when HiZ and separate stencil were in use. It failed to work because
there is only one set of fine adjustment controls shared by the HiZ,
depth, and stencil buffers, so we need to choose tile_x and tile_y
values that are compatible with the tiling of all three buffers, and
then compute separate coarse offsets for each buffer.
This patch fixes the HiZ and separate stencil case by replacing the
call to intel_renderbuffer_tile_offsets() with calls to two functions:
intel_region_get_tile_masks(), which determines how much of the
adjustment can be performed using offsets and how much can be
performed using tile_x and tile_y, and
intel_region_get_aligned_offset(), which computes the coarse offset.
intel_region_get_tile_offsets() is still used for color renderbuffers,
so to avoid code duplication, I've re-worked it to use
intel_region_get_tile_masks() and intel_region_get_aligned_offset().
On i965 Gen6, fixes piglit tests
"texturing/depthstencil-render-miplevels 1024 X" where X is one of
(depth, depth_and_stencil, depth_stencil_single_binding, depth_x,
depth_x_and_stencil, stencil, stencil_and_depth, stencil_and_depth_x).
On i965 Gen7, the variants of
"texturing/depthstencil-render-miplevels" that contain a stencil
buffer still fail, due to another problem: Gen7 seems to ignore the 3
LSB's of the tile_y adjustment (and possibly also tile_x).
v2: Removed spurious comments. Added assertions to check
preconditions of intel_region_get_aligned_offset().
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes ARB_framebuffer_object from the GLES1 and GLES2
extension lists in intel_extensions_es.c.
Fixes a crash in the Android browser on Ice Cream Sandwich.
The Android browser crashed because it did the following, which is legal
in GLES2 but not in ARB_framebuffer_object.
glGenFramebuffers(1, &fb);
glBindFramebuffer(GL_FRAMEBUFFER, fb);
// render render render...
glDeleteFramebuffers(1, &fb);
// go do other stuff...
glBindFramebuffer(GL_FRAMEBUFFER, fb);
// This bind unexpectedly failed, and the app panics.
The semantics of glBindFramebuffer specified by ARB_framebuffer_object (a
desktop GL extension) and GLES2 specs are incompatible. The ideal solution
to fix this is to create separate API entry points for glBindFramebuffer,
one for GL and the other for GLES2. But, until that work is complete,
disabling ARB_framebuffer_object in GLES2 contexts safely fixes the problem.
Likewise, the semantics of glBindFramebuffer in ARB_framebuffer_object and
of glBindFramebufferOES in OES_framebuffer_object (a GLES1 extension) are
incompatible. Even though the functions have different names, the semantic
difference still results in a bug because both API calls are implemented
by a single function, _mesa_BindFramebufferEXT, which handles the semantic
difference incorrectly. Again, disabling ARB_framebuffer_object in GLES1
contexts safely fixes this problem.
According to the ARB_framebuffer_object spec, the extension is an
amalgamation of
EXT_framebuffer_object
EXT_framebuffer_blit
EXT_packed_depth_stencil
EXT_framebuffer_multisample
By disabling this extension, however, no functionality is removed from
GLES1 and GLES2 contexts because 1) the first three extensions are
explicitly enabled in Intel's ES extension lists and 2) no functionality
of the last extension is exposed in an ES context.
Note: This is a candidate for the 8.0 branch.
See-also: http://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg21006.html
CC: Charles Johnson <charles.f.johnson@intel.com>
CC: Sean Kelley <sean.v.kelley@intel.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When primitive restart is enabled, and glArrayElement is called
with the restart index value, then call glPrimitiveRestartNV.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul<brianp@vmware.com>
The signal.h include was missed in the commit
bc16c73407 which leads to broken
compilations under Linux.
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
When doing the var->assigned change in
f2475ca424, I overzealously indented the
second block of code into the "if (var)" test. Revert these blocks to
the way they were before, just taking advantage of "var" to avoid
re-calling variable_referenced().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49066
I only considered var->assigned for FragColor and FragData, but
ignored when it was false for out vars. Fixes piglit
write-gl_FragColor-and-not-user-output.frag
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49068
It seems silly that GL lets you allocate these given that they're
framebuffer attachment incomplete, but the webgl conformance tests
actually go looking to see if the getters on 0-width/height
depth/stencil renderbuffers return good values. By failing out here,
they all got smashed to 0, which turned out to be correct for all the
getters they tested except for GL_RENDERBUFFER_INTERNAL_FORMAT. Now,
by succeeding but not making a miptree, that one also returns the
expected value.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The index is also used for GL_ARB_blend_func_extended. Cloning in
i965 was dropping a non-ARB_explicit_attrib_location index.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When glTexImage or glCopyTexImage is called with internalFormat being a
generic compressed format (like GL_COMPRESSED_RGB) we need to do the same
error checks as for specific compressed formats. In particular, check if
the texture target is compatible with the format. None of the texture
compression formats we support so far work with GL_TEXTURE_1D, for example.
See also https://bugs.freedesktop.org/show_bug.cgi?id=49124
NOTE: This is a candidate for the 8.0 branch.
Otherwise it fails like so:
CC egl_dri2.lo
In file included from egl_dri2.h:40:0,
from egl_dri2.c:42:
../../../../../../src/egl/wayland/wayland-drm/wayland-drm.h:8:41:
fatal error: wayland-drm-server-protocol.h: No such file or directory
compilation terminated.
This new gbm entry point allows writing data into a gbm bo. The bo has
to be created with the GBM_BO_USE_WRITE flag, and it's only required to
work for GBM_BO_USE_CURSOR_64X64 bos.
The gbm API is designed to be the glue layer between EGL and KMS, but there
was never a mechanism initialize a buffer suitable for use with KMS
hw cursors. The hw cursor bo is typically not compatible with anything EGL
can render to, and thus there's no way to get data into such a bo.
gbm_bo_write() fills that gap while staying out of the efficient
cpu->gpu pixel transfer business.
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
At this point, in order for OpenCL to work correctly with r600g, OpenCL
specific intrinsics need to be defined in the LLVM tree. So, we need
to check for these intrinsics in the LLVM include directory to make sure
not to re-define them.
This is a pseudo instruction that enables the LLVM backend to encode
instructions and pass it through r600_bytecode_build()
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
This moves the alpha test control to derived state and disables alpha
testing for integer fbs.
fbo-blending test in piglit gets further when we do this (not a pass
but less fail).
v2: drop the fb_sx_alpha_test_control
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Move to lp_bld_const where it belongs
- Rename to lp_build_const_string
- take the length from the argument (and don't count the zero terminator twice)
- bitcast the constant to generic i8 *
Allows the creation of const aos masks which have the mask swizzled
to match the correct format.
Updated existing mask creation code to use the swizzled version where
necessary (tgsi register masks and llvmpipe aos blending).
Signed-off-by: José Fonseca <jfonseca@vmware.com>
With this feature enabled, the LLVM backend will dump the MachineIntrs
prior to emitting code. The mesa env variable R600_DUMP_SHADERS will enable
this feature in the backend.
Fix uninitialized scalar field defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't delete MASK_WRITE instructions from the program, because this
will cause instructions being masked by MASK_WRITE to be marked dead and
then deleted in the dce pass.
Enabled MESA_FORMAT_RGBX8888_REV for RGBX. Android software
requires RGBX8888 format to be supported for software rendering.
That requires EGL to be capable of generating images from this
format.
Signed-off-by: Sean V Kelley <sean.v.kelley@linux.intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add new format __DRI_IMAGE_FORMAT_XBGR8888 to __DRI_IMAGE.
HAL_PIXEL_FORMAT_RGBX_8888 now maps to __DRI_IMAGE_FORMAT_XBGR8888.
Signed-off-by: Sean V Kelley <sean.v.kelley@linux.intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Only images created with intel_create_image() had the field properly
set. Set it also on intel_dup_image(), intel_create_image_from_name()
and intel_create_image_from_renderbuffer().
GL_ARB_texture_storage says:
The commands eglBindTexImage, wglBindTexImageARB, glXBindTexImageEXT or
EGLImageTargetTexture2DOES are not permitted on an immutable-format
texture.
They will generate the following errors:
- EGLImageTargetTexture2DOES: INVALID_OPERATION
- eglBindTexImage: EGL_BAD_MATCH
- wglBindTexImage: ERROR_INVALID_OPERATION
- glXBindTexImageEXT: BadMatch
Fixing the EGL and GLX cases requires extending the DRI interface,
since setTexBuffer2 doesn't currently return any error information.
Reviewed-by: Brian Paul <brianp@vmware.com>
Adapted drivers: i915, llvmpipe, r300, r600, radeonsi, softpipe.
User index buffers have been disabled in nv30, nv50, nvc0 and svga to keep
things working.
This reduces CPU overhead in st_draw_vbo and removes a lot of unnecessary code
in that function which was required only to comply with the gallium interface,
but wasn't any useful really.
Adapted drivers: i915, llvmpipe, r300, softpipe.
No changes required in: r600, radeonsi.
User vertex buffers have been disabled in nv30, nv50, nvc0 and svga to keep
things working.
This is required for any serious constant buffer support.
Constant buffer offsets on ATI and NVIDIA DX10 and DX11 GPUs must be
a multiple of 256.
In OpenGL, this can be queried via GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.
As noted in commit be4e46b21a,
this was missing before.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A little analysis shows that the worst-case value for "nr" is 17:
- base_mrf = 2 ... 2
- header present (say gen == 5) ... 4
- aa_dest_stencil_reg (stencil test) ... 5
- SIMD16 mode: += 4 * reg_width ... 13
- source_depth_to_render_target ... 15
- dest_depth_reg ... 17
This resulted in us setting base_mrf to 2 and mlen to 15. In other
words, we'd try to use m2..m16. But m16 doesn't exist pre-Gen6. Also,
the instruction scheduler data structures use arrays of size 16, so this
would cause us to access them out of bounds.
While the debugger system routine may need m0 and m1, we don't use it
today, so the simplest solution is just to move base_mrf back to 1.
That way, our worst case message fits in m1..m15, which is legal.
An alternative would be to fail on SIMD16 in this case, but that seems
a bit unfortunate if there's no real need to reserve m0 and m1.
Fixes new piglit test shaders/depth-test-and-write on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48218
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
To ensure that the alloca is at the top of the function body, otherwise
LLVM will not eliminate them, causing stack misalignment on 32bits.
Reviewed-by: James Benton <jbenton@vmware.com>
This is taken from the ogl-math project, with Inverse renamed to adj
(since it's not actually the inverse), transposed, and our types
plugged in. There are potential CSE opportunities in this code
(particularly for hardware with RCP but not DIV), but we should be
doing CSE anyway, so don't hand-optimize.
Fixes piglit inverse tests.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This takes advantage of the builtin compiler to generate IR into a
string, the same way we read GLSL for function prototypes for our
profiles.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I keep getting lost in the Makefile trying to figure out what to edit
to work on builtin_compiler or glsl_compiler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
This couldn't be split because it would break bisecting.
Summary:
* r300g,r600g: stop using u_vbuf
* r300g,r600g: also report that the FIXED vertex type is unsupported
* u_vbuf: refactor for use in the state tracker
* cso: wire up u_vbuf with cso_context
* st/mesa: conditionally install u_vbuf
This adds the ability to initialize u_vbuf_caps before creating u_vbuf itself.
It will be useful for determining if u_vbuf should be used or not.
Also adapt r300g and r600g.
This fixes an assertion failure since:
commit 81afdd20f3
vbo: don't check twice whether it's valid to render
FLUSH_CURRENT may set _NEW_CURRENT_ATTRIB.
Reviewed-by: Brian Paul <brianp@vmware.com>
If the source region for a glCopyPixels is completely outside the
source buffer bounds, no-op the copy. Fixes a failed assertion.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The LLVM backend can now be enabled for r600g by using the
--enable-r600-llvm-compiler configure flag. If you configure with this
flag, you can still use the default compiler by setting the envrionment
variable R600_USE_LLVM=0
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise HAVE_LLVM won't be included in the $(DEFINES) variable for
Automake generated Makefiles.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This shaves 2k off the final dri.so, and removes lots of pointless
NULL, 0 passing.
most like pointless - but it looked nicer to me.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The __glapi_gentable_set_remaining_noop() routine treats the _glapi_struct
as an array of _glapi_get_dispatch_table_size() pointers, so we have to
allocate _glapi_get_dispatch_table_size()*sizeof(void*) bytes rather
than sizeof(struct _glapi_struct) bytes.
Reviewed-by: Jeremy Huddleston <jeremyhu@apple.com>
Alexandre Demers sent me some cayman results with no major problems.
I'll rip out the env var in a week or so.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Full piglit run on my rv610 with no regressions.
This only leaves cayman, however my cayman is resisting my attempt
to get through a full piglit run.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've done a piglit run on rv740 and confirmed no regressions.
We don't get GL3 on r700 due to transform feedback being busted still.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This cap is used by u_blitter to decide if it can use integers
in vertex data.
fixes some crashes with glsl130 in piglit
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've done a piglit run on my SUMO machine and I see no regressions.
Lots of things to fix (skip->fail), but hey maybe we can fix them
if we can see them.
I'll try and work my way across r600,700,cayman sometime if nobody
else gets to them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The field wasn't actually used before and it's not used now either.
But this is a more logical place for it and will hopefully allow
doing smarter draw/array validation (per array object) in the future.
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Our previous live interval analysis just said that anything in a loop
was live for the whole loop. If you had to spill a reg in a loop,
then we would consider the unspilled value live across the loop too,
so you never made progress by spilling. Eventually it would consider
everything in the loop unspillable and fail out.
With the new analysis, things completely deffed and used inside the
loop won't be marked live across the loop, so even if you
spill/unspill something that used to be live across the loop, you
reduce register pressure. But you usually don't even have to spill
any more, since our intervals are smaller than before.
This fixes assertion failure trying to compile the shader for the
"glyphy" text rasterier and piglit glsl-fs-unroll-explosion.
Improves Unigine Tropics performance 1.3% +/- 0.2% (n=5), by allowing
more shaders to be compiled in 16-wide mode.
This takes the fs_inst list generated by the visitor, and generates a
list of basic blocks with edges between them. This is a building
block for data-flow analysis.
We were checking for these at link time previously, which is not as
early as mandated, and would actually fail to detect conflicting
writes if dead code removal removed some writes.
Fixes failures in piglit
glsl-*/compiler/fragment-outputs/write-gl_Frag*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used for some compile-and-link-time error checking, where
currently we've been doing error checking only at link time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This runs optimization-test and produces the usual automake test
output, which may be interesting to automated build systems.
This doesn't convert the tests to be individually exposed to the
automake runner, because automake doesn't like wildcards (due to being
nonportable in make, not that we care).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the reason the declaration member existed in the reference
visitor, but I didn't copy the code from structure splitting that
avoided setting it.
This wasn't currently a problem, because we don't allow splitting of
in/out variables. But that would be nice to change some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was carried over from structure splitting, without thinking about
whether the name still made sense in this context.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes encoding of VOP3 shader instructions.
The shift was wrong for source registers 2 and 3, and the resulting value was
only 32 bits, so the shift in SICodeEmitter::VOPPostEncode() didn't work as
intended.
If a non-default array object was bound at context destruction time
we'd try to unreference the array object after it was already deleted
in _mesa_free_varray_data(). Now do the unref first.
Fixes a regression from commit 86f53e6d6b.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
It's not nice when you have several variables pointing to the same array
and you wanna ask your editor "where is this used" and you only get an answer
for one of the four currval, legacy_currval, generic_currval, mat_currval,
which is quite useless, because you never see the whole picture.
Let's get rid of the additional pointers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
It's already done in _mesa_validate_Draw* and it's not needed to do it again
unless I am missing something.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This is a frequently-updated state and _NEW_ARRAY already causes revalidation
of the vbo module. It's kinda counter-productive to recompute arrays
in the vbo module if _NEW_ARRAY is set and then set _NEW_ARRAY again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This moves the RebindArrays flag into the vbo module, consolidates the code,
and adds missing vbo_draw_method calls.
Also with this change, the vertex arrays are not needlessly recalculated twice.
The issue with the old code was:
- If recalculate_input_bindings updates vp_varying_inputs, _NEW_ARRAY is set.
- _mesa_update_state is called and the vp_varying_inputs change causes
regeneration of the fixed-function shaders, which also sets _NEW_PROGRAM.
- The occurence of either _NEW_ARRAY or _NEW_PROGRAM sets
the recalculate_inputs flag to TRUE again.
- The new code sets the flag to FALSE after the second _mesa_update_state,
because there can't possibly be any change which would require recalculating
the arrays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Vinson reported that we failed to initialize this, which would lead to
all kinds of crashes if we actually used it. Since we don't use it,
we may as well just delete the broken code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we use separate binding tables for WM, VS, and GS, and have
BRW_MAX_VS_SURFACES and BRW_MAX_GS_SURFACES macros, we really shouldn't
have an unqualified BRW_MAX_SURFACES macro. It's confusing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
They had a number of issues:
- A paragraph states that we use a single binding table, but we don't.
- We labelled the WM binding table diagram as SOL/WM.
- The WM diagram had an "Only relevant to the WM" comment. Duh.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change uses the array object factory for gl_array_objects. This
prevents crashes when deriving from gl_array_object.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
We don't normally clear immediately after drawing something. But as it
was, the drawing would incorrectly appear after the clear.
Fixes piglit clear-varray-2.0 failure.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Deletes a lot of pointless duplication, as well as some run-time effort.
Conveniently, GLSL 1.40 no longer needs a .vert variant, since it
doesn't define any built-ins specific to the vertex shader stage.
ARB_texture_rectangle and OES_EGL_image_external also only need a single
profile, since the .vert and .frag variants were identical.
I didn't bother with EXT_texture_array and OES_texture_3D because
they're so tiny that the savings would be miniscule.
Cuts the generated builtin_function.cpp from 1.7MB to 1.0MB (41%).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
The built-in subsystem uses "profiles," or GLSL shaders containing
prototypes for all built-ins supported within a particular language
version (or extension) and shader stage.
Since profiles were stage-specific, we had to cut and paste almost all
the prototypes between (e.g.) 110.vert and 110.frag. Naturally, this
led to sundry cut and paste bugs, where someone fixed an issue in .frag
but neglected to update .vert, or vice-versa. Geometry shaders would
have only made this worse.
This patch introduces support for a new '.glsl' profile suffix which
contains prototypes common to all shader stages. The existing '.frag'
and '.vert' profiles need only contain the few stage-specific built-ins.
Not only does this remove duplication, it makes built-in setup slightly
faster: we don't need to re-read the common prototypes and function
bodies for both the vertex and fragment shader stage.
Internally, this was trivial. We already create a list of gl_shader
objects to search through for built-ins: one for the core language
version/stage, and additional shaders for any extensions in use. This
patch simply adds another shader to the list: core/common, core/stage,
and extensions.
The next patch will update the profiles to remove the duplication.
It's separated out purely to make review easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Accelerates a few glReadPixels cases for WebGL.
See https://bugs.freedesktop.org/show_bug.cgi?id=48545
v2: Per Jose, use bit twiddling for the swizzle case instead of ubyte
arrays (it's about 44% faster).
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The GLSL 1.30 -> 4.10 specs all erroneously say "vec2" for a few
overloads of textureProjGradOffset, while most overloads and all other
texturing functions use ivec types.
The GLSL 4.20 specification corrects these to "ivec2", but doesn't
mention this as being a conscious change in behavior. Nor does the
ARB_shading_language_420pack extension. So presumably it was a typo.
At any rate, our builtin functions all use ivec already, so the fact
that these prototypes use plain vecs will only lead to applications
dying in a fire when trying to use them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 4ec449a6ed.
I meant to not push this one. Review found that a link error is not
mandated: it should link, but you get undefined rendering if you rely
on a missing stage.
page 42/55 section 2.11 "Vertex Shaders":
"If the program object has no vertex shader, or no program object
is currently in use, the results of vertex shader execution are
undefined."
(and similar for page 160/173 section 3.9 "Fragment Shaders" for FS,
and page 45/58 section 2.11.2 "Program Objects" for program being 0)
It turns out the commit was broken anyway, because it was missing a
"goto done", so linkstatus got smashed back to true later and the
error just showed up as a warning in the infolog.
All I know of that needs finishing in Mesa is to enable the extension
in a GL3.1 core context on i965 -- we're not going to expose it in
non-3.1 core contexts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the new piglit texelFetch() tests on these. Note that the rest
of the new functions are not tested (same as the non-2DRect versions
of most of them).
The non-integer versions were already reserved in 1.30, but apparently
these were forgotten.
Fixes piglit glsl-1.40/compiler/reserved/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents this error with Automake 1.9:
src/gallium/drivers/Makefile.am: C objects in subdir but
`AM_PROG_CC_C_O' not in `configure.ac'
autoreconf: automake failed with exit status: 1
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
radeonsi and r600 have duplicate symbols, so it's not possible to
statically link both. Remove the newcomer, radeonsi, until duplicate
symbols are fixed.
Most things that work on Fermi should work on Kepler too.
There are a few performance optimizations left to do, like better
placement of texture barriers and adding scheduling data to the
shader instructions (without them, a thread group will be masked
for 32 cycles after each single instruction issue).
Edit: Don't do it for the main function of (graphics) shaders,
its inputs and outputs always go through TGSI_FILE_INPUT/OUTPUT.
This prevents all TEMPs from counting as live out and reduces
register pressure.
The point is to keep an independent dictionary for each function.
The array that was being used as dictionary has been converted into a
"bimap" for two different reasons: first, because having an almost
empty instance of an array with as many entries as registers there are
in the program, once for every function, would be wasteful, and
second, because we want to be able to map Value pointers back to
locations at some point.
The reason is that several passes (regalloc, function argument
binding, inlining) are going to require the callees of a function to
be processed before the caller.
Instruction attributes like WriteALUResult and ALUResultCompare
were being discarded during the some of the local transformations.
This fixes the following piglit tests:
glsl1-inequality (vec2, pass)
loopfunc
fs-any-bvec2-using-if
fs-op-ne-bvec2-bvec2-using-if
fs-op-ne-ivec2-ivec2-using-if
fs-op-ne-mat2-mat2-using-if
fs-op-ne-vec2-vec2-using-if
fs-op-ne-mat2x3-mat2x3-using-if
fs-op-ne-mat2x4-mat2x4-using-if
https://bugs.freedesktop.org/show_bug.cgi?id=45921
NOTE: This is a candidate for the stable branches.
The loop registers weren't being cleared, so any shader that was
executed after a shader containing loops was at risk of having a loop
randomly inserted into it.
This fixes over one hundred piglit tests, although these test
only failed during full piglit runs and would pass if
run individually. The exact number of piglit tests that this patch
fixes will vary depending on the version of piglit and the order the
tests are run.
NOTE: This is a candidate for the stable branches.
This lets us significantly shorten p->instructions->push_tail(ir), and
will be used in a few more places.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now we can fold a bunch of our expression setup in ff_fragment_shader
into single-line, parseable commits.
v2: Make it actually work. I wasn't setting num_components in the
mask structure, and not setting up a mask structure is way easier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Having to explicitly dereference is irritating and bloats the code,
when the compiler can detect and do the right thing.
v2: Use a little shim class to produce the automatic dereference
generation at compile time as opposed to runtime, while also
allowing compile-time type checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The C++ constructors with placement new, while functional, are
extremely verbose, leading to generation of simple GLSL IR expressions
like (a * b + c * d) expanding to many lines of code and using lots of
temporary variables. By creating a new ir_builder.h that puts simple
generators in our namespace and taking advantage of ralloc_parent(),
we can generate much more compact code, at a minor runtime cost.
v2: Replace ir_instruction usage with just ir_rvalue.
v3: Drop remaining missed as_rvalue() in v2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The primary motivation for this rewrite was to have a maintainable driver
going forward, as nvfx was quite horrible in a lot of ways.
The driver is heavily based on the design of the nv50/nvc0 3d drivers we
already have, and uses the same common buffer/fence code. It also passes
a HEAP more piglit tests than nvfx did, supports a couple more features,
and a few more to come still probably.
The CPU footprint of this driver is far far less than nvfx, and translates
into far greater framerates in a lot of applications (unless you're using
a CPU that's way way newer than the GPUs of these generations....)
Basically, we once again have a maintained driver for these chipsets \o/
Feel free to report bugs now!
This driver hasn't been maintained properly for a very long time, and for
many very good reasons. It's horrible.
A new driver supporting these chipsets will appear with the commits that
port vieux/nv50/nvc0 to libdrm_nouveau-2.0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
TEXTURED_TRIANGLE and MULTITEX_TRIANGLE are both a bit special in that if
you use any other graph object in the meantime they'll forget their state
and spew a lovely METHOD_CNT error at you when you try to draw.
The pre-newlib driver has a flush_notify() hook which does this state
re-emit, and a number of random workarounds like extra flushes and state
dirtying after various operations to solve this issue.
I'm taking a slightly different approach to things instead, which has the
nice side-effect of removing the divergent code-paths for ttri/mtri, the
flush/dirty workarounds and the need for flush_notify. Also gives a few
FPS boost in OA, yay.
This is just a function to tell if a certain blend mode requires dual sources.
v2: move to inlines as per Brian's suggestion
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the blend mode mapping, it also uses the var->index in the
glsl to tgsi convertor - this is the other half of my using 4 in the GLSL
compiler.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds index support to the GLSL compiler.
I'm not 100% sure of my approach here, esp without how output ordering
happens wrt location, index pairs, in the "mark" function.
Since current hw doesn't ever have a location > 0 with an index > 0,
we don't have to work out if the output ordering the hw requires is
location, index, location, index or location, location, index, index.
But we have no hw to know, so punt on it for now.
v2: index requires layout - catch and error
setup explicit index properly.
v3: drop idx_offset stuff, assume index follow location
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add implementations of the two API functions,
Add a new strings to uint mapping for index bindings
Add the blending mode validation for SRC1 + SRC_ALPHA_SATURATE
Add get for MAX_DUAL_SOURCE_DRAW_BUFFERS
v2:
Add check in valid_to_render to address case in spec ERRORS.
v3:
Add index to ir.h so this patch compiles on its own
fixup comment
v4: fixup Brian's comments
The GLSL patch will setup the indices.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit adds initial support for acceleration
on SI chips. egltri is starting to work.
The SI/R600 llvm backend is currently included in mesa
but that may change in the future.
The plan is to write a single gallium driver and
use gallium to support X acceleration.
This commit contains patches from:
Tom Stellard <thomas.stellard@amd.com>
Michel Dänzer <michel.daenzer@amd.com>
Alex Deucher <alexander.deucher@amd.com>
Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The following commits were squashed in:
======================================================================
radeonsi: Remove unused winsys pointer
This was removed from r600g in commit:
commit 96d882939d
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Feb 17 01:49:49 2012 +0100
gallium: remove unused winsys pointers in pipe_screen and pipe_context
A winsys is already a private object of a driver.
======================================================================
radeonsi: Copy color clamping CAPs from r600
Not sure if the values of these CAPS are correct for radeonsi, but the
same changed were made to r600g in commit:
commit bc1c836938
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Jan 23 03:11:17 2012 +0100
st/mesa: do vertex and fragment color clamping in shaders
For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
the perfect place for a fallback.
The exceptions are:
- r500 (vertex clamp only)
- nv50 (both)
- nvc0 (both)
- softpipe (both)
We also have to take into account that r300 can do CLAMPED vertex colors only,
while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
with the two new CAPs.
======================================================================
radeonsi: Remove PIPE_CAP_OUTPUT_READ
This CAP was dropped in commit:
commit 04e3240087
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Feb 23 23:44:36 2012 +0100
gallium: remove PIPE_SHADER_CAP_OUTPUT_READ
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to workaround in the first place.
======================================================================
radeonsi: Add missing parameters to rws->buffer_get_tiling() call
This was changed in commit:
commit c0c979eebc
Author: Jerome Glisse <jglisse@redhat.com>
Date: Mon Jan 30 17:22:13 2012 -0500
r600g: add support for common surface allocator for tiling v13
Tiled surface have all kind of alignment constraint that needs to
be met. Instead of having all this code duplicated btw ddx and
mesa use common code in libdrm_radeon this also ensure that both
ddx and mesa compute those alignment in the same way.
v2 fix evergreen
v3 fix compressed texture and workaround cube texture issue by
disabling 2D array mode for cubemap (need to check if r7xx and
newer are also affected by the issue)
v4 fix texture array
v5 fix evergreen and newer, split surface values computation from
mipmap tree generation so that we can get them directly from the
ddx
v6 final fix to evergreen tile split value
v7 fix mipmap offset to avoid to use random value, use color view
depth view to address different layer as hardware is doing some
magic rotation depending on the layer
v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
evergreen, align bytes per pixel to a multiple of a dword
v9 fix handling of stencil on evergreen, half fix for compressed
texture
v10 fix evergreen compressed texture proper support for stencil
tile split. Fix stencil issue when array mode was clear by
the kernel, always program stencil bo. On evergreen depth
buffer bo need to be big enough to hold depth buffer + stencil
buffer as even with stencil disabled things get written there.
v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
old ddx overestimate those. Fix linear case when pitch*height < 64.
Fix r300g.
v12 Fix linear case when pitch*height < 64 for old path, adapt to
libdrm API change
v13 add libdrm check
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
======================================================================
radeonsi: Remove PIPE_TRANSFER_MAP_PERMANENTLY
This was removed in commit:
commit 62f44f670b
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Mar 5 13:45:00 2012 +0100
Revert "gallium: add flag PIPE_TRANSFER_MAP_PERMANENTLY"
This reverts commit 0950086376.
It was decided to refactor the transfer API instead of adding workarounds
to address the performance issues.
======================================================================
radeonsi: Handle PIPE_VIDEO_CAP_PREFERED_FORMAT.
Reintroduced in commit 9d9afcb5ba.
======================================================================
radeonsi: nuke the fallback for vertex and fragment color clamping
Ported from r600g commit c2b800cf38.
======================================================================
radeonsi: don't expose transform_feedback2 without kernel support
Ported from r600g commit 15146fd1bc.
======================================================================
radeonsi: Handle PIPE_CAP_GLSL_FEATURE_LEVEL.
Ported from r600g part of commit 171be75522.
======================================================================
radeonsi: set minimum point size to 1.0 for non-sprite non-aa points.
Ported from r600g commit f183cc9ce3.
======================================================================
radeonsi: rework and consolidate stencilref state setting.
Ported from r600g commit a2361946e7.
======================================================================
radeonsi: cleanup setting DB_SHADER_CONTROL.
Ported from r600g commit 3d061caaed.
======================================================================
radeonsi: Get rid of register masks.
Ported from r600g commits
3d061caaed13b646ff40754f8ebe73f3d4983c5b..9344ab382a1765c1a7c2560e771485edf4954fe2.
======================================================================
radeonsi: get rid of r600_context_reg.
Ported from r600g commits
9344ab382a1765c1a7c2560e771485edf4954fe2..bed20f02a771f43e1c5092254705701c228cfa7f.
======================================================================
radeonsi: Fix regression from 'Get rid of register masks'.
======================================================================
radeonsi: optimize r600_resource_va.
Ported from r600g commit 669d8766ff.
======================================================================
radeonsi: remove u8,u16,u32,u64 types.
Ported from r600g commit 78293b99b2.
======================================================================
radeonsi: merge r600_context with r600_pipe_context.
Ported from r600g commit e4340c1908.
======================================================================
radeonsi: Miscellaneous context cleanups.
Ported from r600g commits
e4340c1908a6a3b09e1a15d5195f6da7d00494d0..621e0db71c5ddcb379171064a4f720c9cf01e888.
======================================================================
radeonsi: add a new simple API for state emission.
Ported from r600g commits
621e0db71c5ddcb379171064a4f720c9cf01e888..f661405637bba32c2cfbeecf6e2e56e414e9521e.
======================================================================
radeonsi: Also remove sbu_flags member of struct r600_reg.
Requires using sid.h instead of r600d.h for the new CP_COHER_CNTL definitions,
so some code needs to be disabled for now.
======================================================================
radeonsi: Miscellaneous simplifications.
Ported from r600g commits 38bf276348 and
b0337b679a.
======================================================================
radeonsi: Handle PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION.
Ported from commit 8b4f7b0672.
======================================================================
radeonsi: Use a fake reloc to sleep for fences.
Ported from r600g commit 8cd03b933c.
======================================================================
radeonsi: adapt to get_query_result interface change.
Ported from r600g commit 4445e170be.
clang warns on these:
stroker.c:626:19: warning: implicit conversion from enumeration
type 'VGPathCommand' to different enumeration type 'VGPathSegment'
[-Wconversion]
No change in the underlying value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
brw_wm_surface_state.c:330:30: warning: initializer overrides prior
initialization of this subobject [-Winitializer-overrides]
[MESA_FORMAT_Z24_S8] = 0,
^
brw_wm_surface_state.c:326:30: note: previous initialization is here
[MESA_FORMAT_Z24_S8] = 0,
^
No functionality change, since the array is declared static so
it was zero-initialized by default.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Silences a clang warning:
format_pack.c:2546:30: warning: implicit conversion from 'int' to
'GLubyte' (aka 'unsigned char') changes value from 65535 to 255
[-Wconstant-conversion]
d[i] = d[i] ? 0xffff : 0x0;
~ ^~~~~~
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
egl_st.c:57:50: warning: field precision should have type 'int',
but argument has type 'size_t' (aka 'unsigned long') [-Wformat]
ret = util_snprintf(path, sizeof(path), "%.*s/%s" UTIL_DL_EXT,
~~^~
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
C still treats array arguments exactly like pointer arguments.
By sheer coincidence, this still worked fine on 64-bit
machines where 2 * sizeof(float) == sizeof(void*), but not
on 32-bit.
Noticed by clang:
text.c:76:51: warning: sizeof on array function parameter will
return size of 'const VGfloat *' (aka 'const float *') instead of
'const VGfloat [2]' [-Wsizeof-array-argument]
memcpy(glyph->glyph_origin, glyphOrigin, sizeof(glyphOrigin));
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
eglimage.c:48:28: warning: argument to 'sizeof' in 'memset' call is
the same expression as the destination; did you mean to dereference
it? [-Wsizeof-pointer-memaccess]
memset(attrs, 0, sizeof(attrs));
~~~~~ ^~~~~
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Most of the 256 values in the 'generic_to_slot' table were supposed to
be initialized with the default value 0xff, but were left at zero
(from CALLOC_STRUCT()) instead.
Noticed by clang:
u_linkage.h:60:31: warning: argument to 'sizeof' in 'memset' call is the same expression as the destination;
did you mean to provide an explicit length? [-Wsizeof-pointer-memaccess]
memset(table, 0xff, sizeof(table));
~~~~~ ^~~~~
Also fix a signed/unsigned comparison and a comment typo here.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
container_of() can legally return anything, even invalid addresses
that cause segfaults, when 'sample' is an uninitialized pointer.
Bug exposed by clang.
NOTE: This is a candidate for the 8.0 branch.
Fix uninitialized scalar field defect reported by Coverity.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 272bc48976 removed the damage implementation for the
wl_buffer_interface because that has been removed from git master of
Wayland. However this breaks building with the 0.85 branch of Wayland
because it would end up initialising the struct incorrectly.
For the time being it's quite convenient for some compositors to track
the 0.85 branch of Wayland because the protocol is stable but they
will also want to track the master branch of Mesa so that they can use
the gbm surface changes.
This patch adds a compile-time check for the version of Wayland so
that it can work with either Wayland master or the 0.85 branch.
krh: Edited to also account for API changes in 6802eaa68, which
removes the timestamp argument from wl_resource_destroy().
The upstream of gtest has decided that the intended usage model is for
projects to import the source and use it, which is reflected in their
recent removal of the gtest-config tool.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
By making a bool fs_reg only have a defined low bit (matching CMP
output), instead of being a full 0 or 1 value, we reduce the ANDs
generated in logic chains like:
if (v_texcoord.x < 0.0 || v_texcoord.x > texwidth ||
v_texcoord.y < 0.0 || v_texcoord.y > 1.0)
discard;
My concern originally when writing this code was that we would end up
generating unnecessary ANDs on bool uniforms, so I put the ANDs right
at the point of doing the CMPs that otherwise set only the low bit.
However, in order to use a bool, we're generating some instruction
anyway (e.g. moving it so as to produce a condition code update), and
those instructions can often be turned into an AND at that point. It
turns out in the shaders I have on hand, none of them regress in
instruction count:
Total instructions: 262649 -> 262545
39/2148 programs affected (1.8%)
14253 -> 14149 instructions in affected programs (0.7% reduction)
This change (before the previous two) produced a .23% +/- .11%
performance improvement in Unigine Tropics at 1024x768 on IVB.
Total instructions: 269270 -> 262649
614/2148 programs affected (28.6%)
179386 -> 172765 instructions in affected programs (3.7% reduction)
v2: Move some of the logic of finding the instruction that produced
the result of an expression tree to a helper.
This should fit in well with our lower_mat_op_to_vec code: now, in
addition to having expressions on each column of a matrix, we also
split the columns to separate variables so they can be tracked
individually by the copy propagation, dead code, and other passes.
This optimizes out some more code generation in unigine and gstreamer
shaders.
Total instructions: 269342 -> 269270
14/2148 programs affected (0.7%)
2226 -> 2154 instructions in affected programs (3.2% reduction)
I've had this code laying around almost done for a long time. The
idea is like opt_structure_splitting, that we've got a bunch of
transforms at the GLSL IR level that only understand scalars and
vectors, which just skip complicated dereferences. While driver
backends may manage some optimization after they split matrices up
themselves, it would be better to bring all of our optimization to
bear on the problem.
While I wasn't expecting changes quite yet, a few programs end up
winning: a gstreamer convolution shader, and the Humus dynamic
branching demo:
Total instructions: 269430 -> 269342
3/2148 programs affected (0.1%)
1498 -> 1410 instructions in affected programs (5.9% reduction)
The Android build was broken by
commit ca760181b4
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Fri Mar 16 12:55:40 2012 -0400
shared-glapi: Convert to automake
The offending change was that it redefined the filepaths in sources.mak
like this:
- FOO_FILES := bar.c
+ FOO_FILES := $(TOP)/src/mapi/mapi/bar.c
This broke the build because source filepaths in Android makefiles must be
relative to the makefile.
Ideally, this could be fixed by reverting the change in sources.mak and
making shared-glapi's Makefile.am use $(addprefix $(TOP)/src/mapi/mapi,
$(FOO_FILES)). However, automake doesn't understand builtin GNU make
functions, such as addprefix. So, it seems that automake and Android can
no longer share sources.mak.
Fix the build by duplicating the source lists from sources.mak into
Android.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Keep a reference to any newly allocated aux buffers to avoid
re-allocating for every st_framebuffer_validate() (i.e. leaking).
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
When using a separate stencil buffer, i965 requires that the pitch of
the buffer (in the 3DSTATE_STENCIL_BUFFER command) be specified as 2x
the actual pitch.
Previously this was accomplished by doubling the "cpp" and "pitch"
values stored in the intel_region data structure, and halving the
height. However, this was confusing, and it led to a subtle (but
benign) bug: since a stencil buffer is W-tiled, its true height must
be aligned to a multiple of 64; we were accidentally aligning its faux
height to a multiple of 64, causing memory to be wasted.
Note that for window system stencil buffers, the DDX also doubles the
cpp and pitch values. To facilitate fixing this DDX server bug in the
future, we fix the cpp and pitch values we receive from the X server
only if cpp has the "incorrect" value of 2.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Clarify comments about the DDX.
This is a related fix for the Wayland change:
commit 83685c506e76212ae4e5cb722205d98d3b0603b9
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Mon Mar 26 16:33:24 2012 -0400
Remove wl_buffer.damage and simplify shm implementation
Apparently, this should also fix a memory leak. When wl_buffer.damage
was removed from Wayland and Mesa was not fixed, wl_buffer.destroy ended
up in the (empty) damage function instead of calling
wl_resource_destroy().
Spotted during build as:
CC wayland-drm-protocol.lo
wayland-drm.c:80:2: warning: initialization from incompatible pointer type
wayland-drm.c:82:1: warning: excess elements in struct initializer
wayland-drm.c:82:1: warning: (near initialization for 'drm_buffer_interface')
Signed-off-by: Pekka Paalanen <ppaalanen@gmail.com>
Fixes uninitialized member defects reported by Coverity.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was hacked in in one place for EGL image stuff, but the right
thing to do was just to provide the mapping from the mesa format to
the native hardware format, which includes render target support.
This turns out to be required for GL_ARB_texture_buffer_object, which
sees data in this layout.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out this field *is* used, and it's the stride between samples
from the buffer. Discovered during TBO debugging.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was a function full of unused mappings from the GLenum to
datatype/comps, but that wasn't all the information a driver would
want, which includes the other fields that a gl_format has. Given
that all the texture buffer formats were represented in gl_format,
just use that as our description.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have to skip some work that wants to look at texture images, since
buffer textures don't have any of that complexity.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that should be needed is that it exists. Fixes segfaults on first
_mesa_update_context() with a samplerBuffer-using shader active but
without a particular buffer texture enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix texelFetch(sampler2DRect) and textureSize(samplerBuffer)
generation to not reference a LOD at the same time because it's easier
than not fixing it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplerBuffer type will be undefined in !glsl 1.40, and the
keyword is marked as reserved. The [iu]samplerBuffer types are not
marked as reserved pre-1.40, so they don't have separate tokens and
fall through to normal type handling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're supposed to just immediately call it. Fixes piglit
GL_ARB_texture_buffer_object/dlist
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is set correctly in gl.spec, but was missed in Mesa. As a
result, only one of the two was hooked up in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have lexer recognition of a bunch of our types based on the
handling. This code was mapping those recognized tokens to an enum
and then to a string of their name. Just drop the enums and provide
the string directly in the parser.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Nothing actually relied on them being mutable, and there was at least
one cast which discarded const qualifiers. The next patch would have
introduced many more.
Casting away const qualifiers should be avoided if at all possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In "release" builds, Mesa would print this message if the MESA_DEBUG
variable was set. Make it so for debug builds as well.
I build debug builds all the time, but I'm not debugging this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While ir_to_mesa contains code that attempts to support functions, I
honestly doubt it's been tested and have little confidence that it
works.
The comment in visit(ir_function *ir) doesn't inspire confidence:
/* Ignore function bodies other than main() -- we shouldn't see calls to
* them since they should all be inlined before we get to ir_to_mesa.
*/
Furthermore, hardware drivers such as i915, i965, and (AFAICT) r200
don't support the BGNSUB/ENDSUB/CAL opcodes anyway. Only swrast does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This never worked. brwProgramStringNotify also explicitly rejects
programs that use CAL and RET. So there's no need for this to exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When SPRITE_POINT_ENABLE bit is set, the texture coord would be
replaced, and this is only needed when we called something like
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE).
And more, we currently handle varying inputs as texture coord,
we would be careful when setting this bit and set it just when
needed, or you will find the value of varying input is not right
and changed.
Thus we do set SPRITE_POINT_ENABLE bit only when all enabled tex
coord units need do CoordReplace. Or fallback is needed to make
sure the rendering is right.
With handling the bit setup at i915_update_sprite_point_enable(),
we don't need the relative code at i915Enable then.
This patch would _really_ fix the webglc point-size.html test case and
of course, not regress piglit point-sprite and glean-pointSprite
testcase.
NOTE: This is a candidate for stable release branches.
v2: fallback just when all enabled tex coord units need do
CoordReplace (Eric)
v3: move the sprite point validate code at I915InvalidateState (Eric)
v4: sprite point enable bit update based on _NEW_PROGRAM, too
add relative _NEW-state comments to show what state is being used(Eric)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fix 'set but not used' warnings; gl_version, gl_versions_profiles and
glx_extensions variables are used just only HAVE_XCB_GLX_CREATE_CONTEXT
is defined. Thus those warnings are shown when that macro isn't defined.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fixes clang error:
tgsi/tgsi_dump.c:72:12: error: no member named '__printf_chk' in 'struct dump_ctx'
ctx->printf( ctx, "%u", e );
~~~ ^
/usr/include/bits/stdio2.h:109:3: note: expanded from macro 'printf'
__printf_chk (__USE_FORTIFY_LEVEL - 1, __VA_ARGS__)
^
Idea stolen from:
http://www.mail-archive.com/pld-cvs-commit@lists.pld-linux.org/msg210998.html
Reviewed-by: Brian Paul <brianp@vmware.com>
Add the maximum base vertex offset to max_index for computing the
buffer size. Fixes a failed assertion in the u_upload_mgr.c code with
the VMware svga driver.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=48141
v2: incorporate Marek's suggestions.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Return 0 for features we don't support. Added debug_printf()
warnings when we fail to handle a new PIPE_CAP_x case. That will
alert us to interfaces changes in the future. We don't want to
just ignore new PIPE_CAPs and possibly miss something important.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, we weren't clamping the vertex colors produced by ARB vertex
programs. This could result in some rendering being too bright (in
ETQW, for example).
Also add cases for PIPE_CAP_VERTEX_COLOR_CLAMPED and
PIPE_CAP_FRAGMENT_COLOR_CLAMPED with comments to be complete.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We already program all the sampler state correctly, we just didn't give
the GPU a pointer to it for the VS stage. Thus, any texturing other
than texelFetch() wouldn't work.
Fixes piglit test vs-textureLod-miplevels and 99 of oglconform's
glsl-bif-tex subtests.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested with lp_test_arit with 100% passes and piglit tests with 100%
pass for log but some tests still fail for pow.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This reduces a little of CPU overhead.
The idea is to translate pipe vertex buffers directly into the CS
and not using any intermediate representations.
Framerate in Torcs:
before: 32.2
after: 34.6
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
st/mesa doesn't allow src_offset to be greater than stride and the maximum
stride r600 supports is 2047.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Limits maximum loop iterations in a TGSI shader to prevent infinite
loops from occurring, any iteration in any loop counts towards this
limit
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixes Coverity resource leak defects.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Variables have types, expression trees have types, but statements don't.
Rather than have a nonsensical field that stays NULL in the base class,
just move it to where it makes sense.
Fix up a few places that lazily used ir_instruction even though they
actually knew the particular subclass.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, set_callee() performed some assertions about the type of the
ir_call; protecting the bare pointer ensured these checks would be run.
However, ir_call no longer has a type, so the getter and setter methods
don't actually do anything useful. Remove them in favor of accessing
callee directly, as is done with most other fields in our IR.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Aside from ir_call, our IR is cleanly split into two classes:
- Statements (typeless; used for side effects, control flow)
- Values (deeply nestable, pure, typed expression trees)
Unfortunately, ir_call confused all this:
- For void functions, we placed ir_call directly in the instruction
stream, treating it as an untyped statement. Yet, it was a subclass
of ir_rvalue, and no other ir_rvalue could be used in this way.
- For functions with a return value, ir_call could be placed in
arbitrary expression trees. While this fit naturally with the source
language, it meant that expressions might not be pure, making it
difficult to transform and optimize them. To combat this, we always
emitted ir_call directly in the RHS of an ir_assignment, only using
a temporary variable in expression trees. Many passes relied on this
assumption; the acos and atan built-ins violated it.
This patch makes ir_call a statement (ir_instruction) rather than a
value (ir_rvalue). Non-void calls now take a ir_dereference of a
variable, and store the return value there---effectively a call and
assignment rolled into one. They cannot be embedded in expressions.
All expression trees are now pure, without exception.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of the time, we just want to read an ir_dereference, so there's no
need to have these in separate functions. However, the next patch will
want to read an ir_dereference_variable directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When translating a call from AST to HIR, we need to decide whether it
can be evaluated to a constant before emitting any code (namely, the
temporary declaration, assignment, and call.)
Soon, ir_call will become a statement taking a dereference of where to
store the return value, rather than an rvalue to be used on the RHS of
an assignment. It will be more convenient to try evaluation before
creating a call. ir_function_signature seems like a reasonable place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, ir_call can be used as either a statement (for void
functions) or a value (for non-void functions). This is rather awkward,
as it's the only class that can be used in both forms.
A number of places use ir_call::get_error_instruction() to construct a
generic value of error_type. If ir_call is to become a statement, it
can no longer serve this purpose.
Unfortunately, none of our classes are particularly well suited for
this, and creating a new one would be rather aggrandizing. So, this
patch introduces ir_rvalue::error_value(), a static method that creates
an instance of the base class, ir_rvalue. This has the nice property
that you can't accidentally try and access uninitialized fields (as it
doesn't have any). The downside is that the base class is no longer
abstract.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
generate_call() and ast_function_expression::hir() both tried to verify
that 'out' and 'inout' parameters used l-values. Irritatingly, it
turned out that this was not redundant; both checks caught -some- cases.
This patch combines the two into a single "complete" function that does
all the parameter mode checking. It also adds a comment clarifying why
AST-level checking is necessary in the first place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We used to have one big function, match_signature_by_name, which found
a matching signature, performed out-parameter conversions, and generated
the ir_call. As the code for matching against built-in functions became
more complicated, I split it internally, creating generate_call().
However, I left the same awkward interface. This patch splits it into
three functions:
1. match_signature_by_name()
This now takes a name, a list of parameters, the symbol table, and
returns an ir_function_signature. Simple and one purpose: matching.
2. no_matching_function_error()
Generate the "no matching function" error and list of prototypes.
This was complex enough that I felt it deserved its own function.
3. generate_call()
Do the out-parameter conversion and generate the ir_call. This
could probably use more splitting.
The caller now has a more natural workflow: find a matching signature,
then either generate an error or a call.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Function calls may have side effects that alter variables used inside
the loop. In the fragment shader, they may even terminate the shader.
This means our analysis about loop-constant or induction variables may
be completely wrong.
In general it's impossible to determine whether they actually do or not
(due to the halting problem), so we'd need to perform conservative
static analysis. For now, it's not worth the complexity: most functions
will be inlined, at which point we can unroll them successfully.
Fixes Piglit tests:
- shaders/glsl-fs-unroll-out-param
- shaders/glsl-fs-unroll-side-effect
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Certain applications don't call SwapBuffers before exiting. Yet, we'd
really like to see a bitmap containing the final rendered image even if
they choose never to present it.
In particular, Piglit tests (at least with -auto -fbo) fall into this
category. Many of them failed to dump any images at all.
Dumping one final image at context destruction time seems to work.
We may wish to pursue a more elegant solution later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a Coverity resource leak defect.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These can be used to implement EXT_texture_swizzle without baking
state-dependent swizzle instructions into the shader and forcing
recompiles.
For now, just set them to pass-through mode, so everything continues to
work as it did on Ivybridge. We can optimize this later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We only need one sample, since we don't support multisampling yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Apparently this needs to be the same as in 3DSTATE_WM.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Getting HiZ working means updating all the state packets for resolves
and clears. It's not worth doing until we get the basics working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For now, these all return 0, as I don't yet want to enable Haswell
support. Eventually they will be filled in with proper PCI IDs.
Also add an is_haswell field similar to is_g4x to make it easy to
distinguish Gen7 and Gen7.5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the BSpec ISA volume's "Accumulator Register" section:
"[DevIVB] SIMD16 execution on dwords is not allowed when accumulator is
explicit source or destination operand."
Fixes piglit tests:
- fs-multiply-const-ivec4
- fs-multiply-const-uvec4
- fs-multiply-ivec4-const
- fs-multiply-uvec4-const
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces the cryptic void* parameter with a union.
(based on union r600_query_result)
Users of this can still pass uint64* in it, but that cannot work for every
query type, obviously. Most importantly, the code now documents what should
be expected from get_query_result.
This also adds pipe_query_data_pipeline_statistics as per the D3D11 docs.
v2: fix indentation, add comments and use the doxygen style
Reviewed-by: Brian Paul <brianp@vmware.com>
This option allows targets to link against the LLVM shared library
instead of the static libs. With LLVM 2.9, his saves ~11 MB for each of
the r300 target libraries.
Pass a dri2_loader extension to the dri driver when gbm creates the dri
screen. The implementation jumps through pointers in the gbm device
so that an EGL on GBM implementation can provide the real implementations.
The idea here is to be able to create an egl window surface from a
gbm_surface. This avoids the need for the surfaceless extension and
lets the EGL platform handle buffer allocation, while keeping the user
in charge of somehow presenting the buffers (using kms page flipping,
for example).
gbm_surface_lock_front_buffer() locks a surface's front buffer and
returns a gbm bo representing it. This bo should later be returned
to the gbm surface using gbm_surface_release_buffer().
The function that counts the number of TGSI immediates also needs to
emit the immediates. This fixes assorted failures when using polygon
stipple with fragment shaders that have their own immediates.
NOTE: This is a candidate for the 8.0 branch.
They aren't winsys of their own,
just help dealing with them.
v2: add some more comments in vl_winsys.h
Signed-off-by: Christian König <deathsimple@vodafone.de>
"Use -no-undefined to assure libtool that the library has no unresolved
symbols at link time, so that libtool will build a shared library on
platforms that require that all symbols are resolved when the library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This is a regression introduced by commit cdcfd5, which forget to
increase the map_refcount for successfully-mapped region. Thus caused a
wrong non-blanced map_refcount.
This would fix the regression found in the two following webglc testcase
on Pineview platform:
texture-npot.html
gl-max-texture-dimensions.html
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
createDrawable may return NULL value, we should check it, or it will
make a segment failed.
[minor-indent-issue-fixed-by: Yuanhan Liu]
Signed-off-by: Wang YanQing <udknight@gmail.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This extension just permits GL_UNPACK_ROW_LENGTH, GL_UNPACK_SKIP_ROWS
and GL_UNPACK_SKIP_PIXELS to be passed to glPixelStore on GLES2 so it
is trivial to implement.
Also fixes the usage of GL_IMPLEMENTATION_COLOR_READ_FORMAT_OES,
which may be set to a BGRA format e.g. for a MESA_FORMAT_ARGB8888 fb.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The extension is already exposed for GLES1, but the APIspec
doesnt allow the usage of GL_BGRA_EXT in glTex(Sub)Image2D.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Noticed this was missing when writing the "glapi: sort ARB extensions
by number" commit, which at least shows it was effective.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed it was missing based on the lack of a descriptive enum
name from this bug's error message:
https://bugs.freedesktop.org/show_bug.cgi?id=44039
This moves two enums out of GL3x.xml. Though since this and
GL_ARB_texture_compression_rgtc are both strict subsets of GL3,
both extensions should have had all their enums in that file
to begin with, not just two of them.
Reviewed-by: Brian Paul <brianp@vmware.com>
And add comments to fill in for extensions that aren't there.
Noticed the comment about "ARB extensions sorted by extension number"
didn't extend to the <xi:include> directives when it became clear
GL_ARB_texture_rg was missing, going by the error message seen here:
https://bugs.freedesktop.org/show_bug.cgi?id=44039
This makes it easier to notice in the future if an extension is missing
when it shouldn't be.
Reviewed-by: Brian Paul <brianp@vmware.com>
A later error prints this properly, fix this case to do the same.
v2: remove attribute as per Ian's suggestion
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the xml file covering ARB_blend_func_extended.
v2: fix SRC1_ALPHA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This also seems like a bad idea. There were too many instances for me
to thoroughly scan the code as I did with the last two patches, but a
quick scan indicated that most callers newly allocate a variable,
dereference it, or NULL-check. In some cases, it wasn't clear that the
value would be non-NULL, but they didn't check for error_type either.
At any rate, not checking for this is a bug, and assertions will trigger
it earlier and more reliably than returning error_type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The constructor currently returns a ir_dereference_variable of error
type when provided NULL, but that's about to change in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Providing a NULL pointer to the ir_dereference_record() constructor
seems like a bad idea. Currently, if provided NULL, it returns a
partially constructed value of error type. However, none of the callers
are prepared to handle that scenario.
Code inspection shows that all callers do one of the following:
- Already NULL-check the argument prior to creating the dereference
- Already deference the argument (and thus would crash if it were NULL)
- Newly allocate the argument.
Thus, it should be safe to simply assert the value passed is not NULL.
This should also catch issues right away, rather than dying later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Providing a NULL pointer to the ir_dereference_array() constructor seems
like a bad idea. Currently, if provided NULL, it returns a partially
constructed value of error type. However, none of the callers are
prepared to handle that scenario.
Code inspection shows that all callers do one of the following:
- Already NULL-check the argument prior to creating the dereference
- Already deference the argument (and thus would crash if it were NULL)
- Newly allocate the argument.
Thus, it should be safe to simply assert the value passed is not NULL.
This should also catch issues right away, rather than dying later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
So if anything goes wrong we won't display a random image.
v2: flush before using the surface with the decoder.
Signed-off-by: Christian König <deathsimple@vodafone.de>
If you ran g-s in 16-bpp we'd do a bunch of memory corruption.
now it just misrenders for some other reasons.
applies to stable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ir_validate.cpp: In member function ‘virtual ir_visitor_status ir_validate::visit_leave(ir_swizzle*)’:
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::x’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::y’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::z’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::w’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
valgrind complained about an uninitialised value being used in
glsl_parser_extras.cpp, and this was the one it was giving out about.
Just initialise the value in the fakectx.
Signed-off-by: Dave Airlie <airlied@redhat.com>
for some reason when I configure --with-dri-drivers="" the src/mesa/drivers/dri
Makefile tries to call the am--refresh target in the toplevel Makefile,
we don't have one, and I'm not sure what it should look like.
This makes things continue on.
Signed-off-by: Dave Airlie <airlied@redhat.com>
piglit glx-tfp segfaults on llvmpipe when run vs a 16-bit radeon screen,
it now fails instead of segfaulting, much prettier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When a GL LD_PRELOAD library like apitrace was used,
glXGetProcAddress() would return the preload's symbols instead of
libGL's symbol, leading to infinite recursion when the returned
function was called. This didn't hit apitrace on most apps because
who calls glXGetProcAddress() on the global functions.
The -Bsymbolic, which was present in mklib before automake conversion,
causes the glxcmds.c:GLX_functions table to be resolved at link time,
so that LD_PRELOADs don't affect it any more.
Fixes crashes when running wine under apitrace.
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marek Olšák <maraeo@gmail.com>
The default was 32 for the EmitNoLoops=0 case. This allows the oZone3D
soft shadows test to work properly with the vmware driver. Jose reported
that SM3 supports up to 255 loop iterations.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of the hard-coded value of 32. Note that MaxUnrollIterations
defaults to 32 so there's no net change. But the gallium state tracker
can override this.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves Unigine Tropics performance at 1024x768 by 2.06236% +/-
0.50272% (n=11).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unigine Tropics uses INVALIDATE_BUFFER and not UNSYNCHRONIZED to reset
the buffer object when its streaming wraps. Don't penalize it by
flushing the batch at the wrap point, just allocate a new BO and get
to using it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use -no-undefined to assure libtool that the library has no unresolved
symbols at link time, so that libtool will build a shared library on
platforms that require that all symbols are resolved when the library
is linked.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
The force-enable option is dropped, now that the hardware we were
concerned about has HiZ on by default. Now, instead of doing
INTEL_HIZ=0 to test disabling hiz, you can set hiz=false.
v2: Disable separate stencil on gen6 when HIZ is turned off.
(previously, this had to be done manually in addition).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This was a debug option during gen6 transform feedback bringup (and a
similar one existed during gen4 bringup). However, it looks like
we're done with that, and we don't anticipate it being used again,
either for geometry shaders or transform feedback.
Suggested by: Kenneth Graunke <kenneth@whitecape.org>
This was added in the i915/i965 merge from the i915 driver, but I
don't recall it ever being used since then.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If you want to test the graphics driver, you want to test it under the
conditions that users will see, not some set of additional fallbacks.
If you want to test swrast, run the swrast driver (or no_rast=true)
instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To avoid redundancies, this patch also removes .deps, .libs, and *.la
from .gitignore files in subdirectories.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As Eric pointed out, we know the cube faces are square at this point
so we only need to test the texture widths for consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The max size was 16Kx16K so a 4 byte/pixel, six-sided cube would require
6 GBytes of memory. If mipmapped, 8 GB. Reduce the max size to 4K to
make the total size more reasonable.
Fixes a crash with the new piglit max-texture-size test.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Per the spec, only nearest filtering is supported for integer textures.
Otherwise, the texture is incomplete.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of gl_texture_object::_Complete there are now two fields:
_BaseComplete and _MipmapComplete. The former indicates whether the base
texture level is valid. The later indicates whether the whole mipmap is
valid.
With sampler objects, a single texture can appear to be both complete and
incomplete at the same time. See the GL_ARB_sampler_objects spec for more
details. To implement this we now check if the texture is complete with
respect to a sampler state.
Another benefit of this is we no longer need to invalidate a texture's
completeness state when we change the minification/magnification filters
with glTexParameter().
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Merge the mipmap level checking code that was separate cases for 1D,
2D, 3D and CUBE before.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move the simple MaxLevel < BaseLevel test earlier to be closer to where
we error-check BaseLevel. Also, use the local baseLevel var in more places.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To make the no-change case faster, as we do for the other object-reference
functions.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We want to start emitting an INVALID_OPERATION from here for transform
feedback. Note that this forced dlist.c to almost not use this
function, since it wants different behavior during dlist compile.
Just pull the non-TF, non-GS test out for compile, because:
1) TF doesn't matter in that case because there's no drawing.
2) I don't think we're going to see GSes and display lists in the same
context, if we don't do GL_ARB_compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a build problem where EGL links to libgbm.la, which encodes
a relative path to it's libglapi.so dependency. The relative path
breaks when the linker tries to resolve it from src/egl/main instead
of src/gbm. Typically we silently fall back to the system
libglapi.so, which is wrong and breaks when there isn't one.
Morale of the story: don't mix mklib and libtool.
Although some hardware support NPOT cubemap, but it seems we don't know
the right layout for NPOT cubemap. Thus seems we need do fallback for
other platforms as well.
See comments inline the code for more detailed info.
v2: give a more detailed info about why we need fallback for other
platfroms as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46666
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If we failed to allocate a memory resource for the texture we'd crash
when we tried to map it. Now we propogate the NULL back up to the
texstore code and generate GL_OUT_OF_MEMORY.
Fixes a crash with the upcoming piglit max-texture-size test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
From the GLSL 1.30 spec:
The discard keyword is only allowed within fragment shaders. It
can be used within a fragment shader to abandon the operation on
the current fragment. This keyword causes the fragment to be
discarded and no updates to any buffers will occur. Control flow
exits the shader, and subsequent implicit or explicit derivatives
are undefined when this control flow is non-uniform (meaning
different fragments within the primitive take different control
paths).
v2: Don't emit the final HALT if no other HALTs were emitted.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
By setting lod to 0 in the builtin function implementation, we avoid
needing to update all the visitors to ignore LOD in this case, when
the hardware drivers actually want to ask for LOD 0 for rectangular
textures.
Fixes piglit spec/GLSL-1.40/textureSize-*Rect.
v2: Change style of looking for substrings.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the one builtin function claimed to be dropped due to the
ARB_compatibility split.
Fixes piglit spec/GLSL-1.40/compiler/ftransform.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes the process slightly more debuggable, though it would be
nice if the build just failed immediately instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly this is a matter of removing variables that have been moved to
the compatibility profile. There's one addition: gl_InstanceID is
present in the core now.
This fixes the new piglit tests for GLSL 1.40 builtin variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
llvm-3.1svn r152620 refactored the OProfile profiling code.
createOProfileJITEventListener was moved from the llvm namespace to the
llvm::JITEventListener namespace.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids extra if statements in the common case of just comparing
two expressions that don't involve assignments or function calls,
along with simplifying the handling of constant expressions. Reduces
i965 instructions generated in unigine tropics and sanctuary,
yofrankie, warsow, gstreamer shaders, and the weston compositor.
shader-db results:
Total instructions: 213052 -> 212752
38/1246 programs affected (3.0%)
14309 -> 14009 instructions in affected programs (2.1% reduction)
The error was removed in:
commit 719909698c
Author: Ian Romanick <ian.d.romanick@intel.com>
Date: Tue Oct 18 16:01:49 2011 -0700
mesa: Rewrite the way uniforms are tracked and handled
The GL_ARB_robustness spec doesn't say the implementation
should truncate the output, so just return after setting
the required error like it did before the above commit.
Also fixup an old comment and add an assert.
NOTE: This is a candidate for the 8.0 branch.
Handle the special case of glFramebufferTextureLayer() for which we pass
teximage = 0 internally in framebuffer_texture(). This patch makes failing
piglit test fbo-array, fbo-depth-array to pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47126
V4: Removed the duplicated code.
Note: This is a candidate for the stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the soon-to-be-removed _DD_NEW_SEPARATE_SPECULAR flag.
Note: there's a similar composite _MESA_NEW_NEED_EYE_COORDS flag set already.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just use the corresponding _NEW_x flags intead. The _DD_NEW_x flags
will be removed in a following patch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The computed stencil.clear and depth.clear values aren't used anywhere.
Those fields have been removed too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Set the close on exec flag when opening dri character devices, so they
will be closed and free any resouces allocated in exec.
Signed-off-by: David Fries <David@Fries.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This issue might recur on other OSes. If so then it might be better
to remove the C-preprocessor magic, and use fully qualified defines
instead.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This state is needed for deciding whether or not to log
application messages with IDs that haven't been specifically
passed to glDebugMessageControlARB yet.
State for each individual ID number ever passed to
glDebugMessageControlARB (per-context) still needs to be added.
Unfortunately, Unigine Heaven 3.0 still needs this.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
min_index/max_index are merely conservative guesses, so we can't
make buffer overflow detection based on their values.
Tested-by: Jakob Bornecrantz <jakob@vmware.com>
There are several cases in which we need to explicity "rebase" colors
(ex: set G=B=0) when getting GL_LUMINANCE textures:
1. If the luminance texture is actually stored as rgba
2. If getting a luminance texture, but returning rgba
3. If getting an rgba texture, but returning luminance
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=46679
Also fixes the new piglit getteximage-luminance test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Based on a patch submitted by Vic Lee. The other part of his patch
which checked the fs pointer wasn't needed.
This fixes a crash when clear() is called before any VS or FS is set.
But this can only happen when the driver is used without the Mesa
state tracker.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This gets xine working with VDPAU.
v2: some minor bugfixes.
v3: create the resource with the subsampled
format to avoid tilling problems
Signed-off-by: Christian König <deathsimple@vodafone.de>
These will be used by glReadPixels() and glGetTexImage() to fix issues
with reading GL_LUMINANCE and other formats.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, we were only counting top-level instructions. But if we have
an assignment of a giant expression tree (such as the ones eventually
generated by glsl-fs-unroll), we were counting the same as an
assignment of a variable deref.
glsl-fs-unroll-explosion now fails in a reasonable amount of time on
i965 because the unrolling didn't go ridiculously far.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I will use SX_MISC instead.
This reverts commit 734792e83f.
Conflicts:
src/gallium/drivers/r600/evergreen_hw_context.c
src/gallium/drivers/r600/evergreen_state.c
src/gallium/drivers/r600/r600_hw_context.c
src/gallium/drivers/r600/r600_pipe.h
draw module calls back into the driver and sets certain parts
of the state to whatever it needs, unfortunately unless you
get the ordering of calls to draw just right you'll end up
reseting your own driver state. That's what was happening to us
draw module would under certain conditions reset our own driver
state.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the libGLU.so.* build when a system libGL.so is not present
since it is relying on the lib/ to build against until it gets
converted to automake.
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
That is by making the dri extension variables static in gbm_dri.c.
The image_lookup_extension is provided by egl_dri2 when using x11 or wayland
platforms, when using the drm platform, gbm_dri has a wrapper for it.
Both use the same variables name image_lookup_extension.
Since -fvisibility=hidden was (probably by mistake) removed when converting to
automake, the "image_lookup_extension" symbol from egl_dri2.c became exported
in libEGL.so, so "image_lookup_extension" from gbm_dri.c was ignored.
This resulted in calling incorrect callbacks.
We cant make the image_lookup_extension static in egl_dri2.c right now,
since its used across multiple files.
Bugzilla: https://bugs.freedesktop.org/attachment.cgi?id=58099
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If the texture is a 1D array, don't remove the border pixel from the
height. Similarly for 2D array textures and the depth direction.
Simplify the function by assuming the border is always one pixel.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch add the support of gl_PointCoord gl builtin variable for
platform gen4 and gen5(ILK).
Unlike gen6+, we don't have a hardware support of gl_PointCoord, means
hardware will not calculate the interpolation coefficient for you.
Instead, you should handle it yourself in sf shader stage.
But badly, gl_PointCoord is a FS instead of VS builtin variable, thus
it's not included in c.vue_map generated in VS stage. Thus the current
code doesn't aware of this attribute. And to handle it correctly, we
need add it to c.vue_map manually to let SF shader generate the needed
interpolation coefficient for FS shader. SF stage has it's own copy of
vue_map, thus I think it's safe to do it manually.
Since handling gl_PointCoord for gen4 and gen5 platforms is somehow a
little special, I added a lot of comments and hope I didn't overdo it ;)
v2: add a /* _NEW_BUFFERS */ comment to note the state flag dependency
and also add the _NEW_BUFFERS dirty mask (Eric).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45975
Piglit: glsl-fs-pointcoord and fbo-gl_pointcoord
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
MCRegisterInfo argument.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This add clipdistance support like the non-llvm draw paths,
if we have a clip distance we compare with it instead of doing
the dot4.
We also have to put the have_clipvertex bit into the emitted
vertex header.
Fixes vs-clip-distance-all-planes-enabled, vs-clip-distance-const-reject,
vs-clip-distance-enables, vs-clip-distance-implicitly-sized,
vs-clip-distance-in-param, vs-clip-distance-uint-index.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the rest of the piglit clipvertex tests.
v2: fixup comments.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We incorrectly setup clipmask for gl_ClipVertex, this fixes the clipmask
setup.
v2: fix comment
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
fix comment
This is just a simple text file containing a list of goals for gallivm/llvmpipe
and some info on what is required to get there along with some info on who
is looking at things.
v2: add EXT_texture_array.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_max_texture_levels() is also used to test valid texture target
in _mesa_GetTexLevelParameteriv(). GL_TEXTURE_CUBE_MAP is not allowed
as texture target in glGetTexLevelParameter(). So, this should throw
GL_INVALID_ENUM error.
Few other functions which use _mesa_max_texture_levels() like
getcompressedteximage_error_check() and getteximage_error_check()
also don't accept GL_TEXTURE_CUBE_MAP.
Above fix makes piglit fbo-cubemap test to fail. This is because of
incorrect texture target passed to _mesa_max_texture_levels() in
framebuffer_texture(). Fixing that as well
Note: This is a candidate for the stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
If I had a dollar for every time I wrote this patch, I'd have about
$10 :-)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's even a comment in the code containing the right swizzling
computations!
Previously this has not been noticed because we need to manually
enabled swizzling on snb/ivb (kernel 3.4 will do that) and we
don't use the separate stencil on ilk (where the bios enables
swizzling). This fixes
piglit ./bin/fbo-stencil readpixels GL_DEPTH32F_STENCIL8 -auto
on recent drm-intel-next kernels.
Also remove the comment about ivb, it's stale now.
Swizzling detection is done by allocating a temporary x-tiled
buffer object. Unfortunately kernels before v3.2 lie on snb/ivb
because they claim that swizzling is enable, but it isn't. The
kernel commit that fixes this for backport to pre-v3.2 is
commit acc83eb5a1e0ae7dbbf89ca2a1a943ade224bb84
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Sep 12 20:49:16 2011 +0200
drm/i915: fix swizzling on gen6+
But if the kernel doesn't lie, this now works on swizzling and
not swizzling machines.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces the previously used wl_display_destroy.
wl_display_destroy was povided by wayland-client.so and
wayland-server.so, to resolve that conflict its renamed client-side.
Otherwise streamout with rasterizer discard will make the kernel upset
if the state tracker doesn't set a depth-stencil-alpha state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Unused by the current stack and APIs, therefore untestable.
It was used to facilitate the transition to integers.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
For polygons, we have been using face culling with success, but that doesn't
work for points and lines.
Setting the point size and line width to 0 fixes it.
Also improve it even more by setting SCREEN_SCISSOR to a zero area.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Implement it right using STRMOUT_CONFIG.RAST_STREAM. This fixes rasterizer
discard with points and lines.
This also adds another derived state. It's a combination of rasterizer discard
and streamout enable.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We must use VPORT_SCISSOR, because that's the only one we can use for multiple
scissor rectangles in ARB_viewport_array.
R700 can use the VPORT_SCISSOR_ENABLE bit, but R600 doesn't have that and must
emit a 8192x8192 rectangle if scissor is disabled.
This commit also cleanups magic numbers in create_rs_state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
VPORT_SCISSOR is the OpenGL scissor. How do I know? Because there are
16 of them just like GL4.1 has multiple scissor rectangles.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Also use XXX in the other ones, because it's the most used word for that
purpose in Mesa.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Timer queries should be able to measure the time spent in u_blitter as well.
Queries are split into two groups: the timer ones and the others (streamout,
occlusion), because we should only suspend non-timer queries for u_blitter,
and later if the non-timer queries are suspended, the context flush should
only suspend and resume the timer queries.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
And rename or inline functions where appropriate.
There is no reason to keep this stuff in r600_hw_context.c.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The current code would ignore the point size specified by gl_PointSize
builtin variable in vertex shader on Pineview. This patch servers as
fixing that.
This patch fixes the following issues on Pineview:
webglc: https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/rendering/point-size.html
piglit: glsl-vs-point-size
NOTE: This is a candidate for stable release branches.
v2: pick Eric's nice tip for fixing this issue in hardware rendering.
v3: the last arg of EMIT_ATTR specify the size in _byte_. (Eric)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the egl_gallium.so driver build when no system libEGL.so is
present, since it's relying on the lib/ to build against until it gets
converted to automake.
We were looking at the size of batch.map for how big the batchbuffer
was, but on 865 we just use a single-page batchbuffer due to hardware
limits.
v2: Removed check for sizeof map < bo->size, since that's always false.
[change by anholt]
NOTE: This is a candidate for release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
In order to prevent an overflow of the batch buffer when emitting
triangles, we need to limit the initial primitive to fit within the
current batch. To do we need to measure the remaining space and thence
compute the maximum number of vertices that fit into that space.
Reported-by: Kurt Roeckx <kurt@roeckx.be>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for release branches.
The hardware, like i915, uses an inclusive bounds on min and max for
the drawing rectangle, but we were providing a number for exclusive.
The number of bits used by the hardware only covers this value going
up to the maximum size, so when we programmed 2048 as the maximum
inclusive X, it saw a maximum X of 0 and clipped all rendering. This
caused rendering failures in gnome-shell.
Fixes piglit fbo-maxsize.
v2: dropped changes to the blitter, which does use an exclusive x2, y2.
[change by anholt]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45558
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for release branches.
Michel pointed out that my assumption of a global
index namespace is incorrect and breaks r300g.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This reverts commit d5a6c17254.
llvm-3.1svn r151687 makes MemoryObject accessor members const again.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This should speed things up a bit, but also shows
some bugs with the kernel implementation.
v2: require xcb-dri2 version 1.8
Signed-off-by: Christian König <deathsimple@vodafone.de>
While the ARB_map_buffer_range extension spec says nothing about these
queries -- they were added in GL 3.0 --, it seems like this could be an
error in the extension spec. This is one of the extensions, like
ARB_framebuffer_object, that "back ports" OpenGL 3.0 functionality to
previous versions. These extensions are supposed to provide identical
functionality to OpenGL 3.0. The other cases of mismatches have been
determined to be bugs in the extension specs.
And tools like apitrace rely on such queries to function properly.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Brian Paul <brianp@vmware.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
We currently don't support gl_PrimitiveID, and I believe asking the
hardware to generate it results in vertex cache invalidations.
This could result in slowdowns for applications that use gl_InstanceID,
which would be counter-productive. Just turn it off for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
visit(ir_variable *) sets dst_reg::writemask to the appropriate channel
for system values. Unfortunately, visit(ir_dereference_variable *) then
calls swizzle_for_size, which for a float, sets the swizzle to .x.
This works for gl_VertexID, since we store it in the .x component (see
brw_draw_upload.c:732 - VID), but fails for gl_InstanceID (IID) since we
store it in the .y channel.
To fix this, avoid calling swizzle_for_size on ir_var_system_values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Originally ARB_draw_instanced only specified that ARB decorated name.
Since no vendor actually implemented that behavior and some apps use
the undecorated name, the extension now specifies that both names are
available.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
When you called them in a display list compile before, you would just
end up calling through NULL.
Fixes piglit GL_ARB_draw_instanced/dlist.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kernels prior to 271d81b84171d84723357ae6d172ec16b0d8139c (March 2011)
don't support relocations outside of the target buffer object. Rather
than guarding this with a I915_PARAM_HAS_RELAXED_DELTA check, just
smash the bound to 0xfffff001 like we do on Ironlake.
This effectively gives us no upper bound check, just like we did prior
to commit 271d81b84171d84723357ae6d172ec16b0d8139c.
Daniel Vetter would also like to mention that this relies on the guard
page at the end of the GTT.
NOTE: This is a candidate for release branches.
Fixes a regression since 271d81b84171d84723357ae6d172ec16b0d8139c.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46766
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The drivers/ walk-through-subdirs makefile is converted as well so I
didn't need to keep EGL_DRIVERS_DIRS along with the per-driver
HAVE_EGL_DRIVER_WHATEVER.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The default case code was set up in a separate way, while this makes
it more normal. I wanted to add code to the explicit x11 platform and
default x11 platform cases in the next commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All users of the shine table outside of the tnl module
are gone. Move the implementation into the tnl module and
prefix the public functions with _tnl.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Since the shine tables are now only used in the tnl lighting stage, where
they are validated through the tnl driver function NotifyMaterialChange
called in tnl/t_vb_light.c, we can not omit calling
_mesa_validate_all_lighting_tables (which only validates the shine tables)
in main/light.c.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Use direct computation of pow for computing the shininess
in _tnl_RasterPos. Since the _tnl_RasterPos function is still
used by plenty drivers that do only need the shine table for
_tnl_RasterPos but do not make use of swtnl computations, this
enables pushing down the shine table computation and validation
into the tnl module, which will happen in a followup change.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Since the shine tables are implicitly invalidated by having
a different shininess value than the current one, we can
omit the explicit invalidation of the shine table.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This lets us use the resource_copy_region() path when blitting from
R8G8B8A8 to R8G8B8x8, for example.
v2: be smarter when src_format==dst_format
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Assertions of the form assert(a && b) should be written as separate assertions
so that you can actually tell which part is false when there's a failure.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Move structs, enums, etc so they're in more logical order. In particular,
the shader and transform feedback-related structs/enums were pretty
scattered around.
After biasing we need to clamp to be sure we don't exceed the number of
levels in the mipmap. This fixes an assertion at svga_sampler_view.c:70
v2: simplify the biasing, clamping code per Jose's suggestion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We need to allocate new space every time to avoid blocking on the last
HiZ op completing. There are two easy ways to do this:
brw_state_batch() and intel_upload_data(). brw_state_batch() is
simpler and avoids another buffer allocation.
Improves Unigine Tropics performance 0.376416% +/- 0.148722% (n=7).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ralloc string appending functions were originally intended for
simple, non-hot-path uses like printing to an info log.
Cuts Unigine Tropics load time by around 20% (6 seconds).
v2: Avoid strlen() on every newline, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
Both callers of rewrite_tail immediately compute the new total string
length by adding the (known) length of the existing string plus the
length of the newly appended text. Unfortunately, callers generally
won't know the length of the new text, as it's printf-formatted.
Since ralloc already computes this length, it makes sense to add it in
and save the caller the effort. This simplifies both existing callers,
but more importantly, will allow for cheap-appending in the next commit.
v2: The link_uniforms code needs both the old and new length.
Apply the obvious fix (which sadly makes it less of a cleanup).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
This adds support for all the opcodes needed for native integer
support with GLSL 1.20 enabled, and some of the ones for GLSL1.30
support.
I've split them between non-cpu and cpu along the same lines
Tom's code did for the other ones I think, but I'm open to review
on which ones should go where.
With instance ids fixed I get no regressions on my box here
with LLVM 2.8, will test with later LLVMs as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Backends usually advertise a SVGA3D_DEVCAP_MAX_POINT_SIZE between 63 and
256, so an hardcoded max point size of 80 is often incorrect.
This limitation does not apply for anti-aliased points (as they are done
via draw module) but we still advertise the same limit for both, because
all others pipe drivers do.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa has a fast path for the generic fallback when using glReadPixels
for RGBA data which uses memcpy. However it was really difficult to
hit this case because it would not be used if any transferOps are
enabled. Any type apart from floating point or non-normalized integer
types (so any of the common types) would force enabling clamping so
the fast path could not be used. This patch makes it ignore clamping
when determining whether to use the fast path if the data type of the
buffer is an unsigned normalized type because in that case clamping
will not have any effect anyway.
https://bugs.freedesktop.org/show_bug.cgi?id=46631
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
postpone unreferences until end of function, as the ones in use will
get naturally dereferenced.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Can't see any reason this wouldn't be better off as an inline.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some backends may advertise more temps than SVGA3D_TEMPREG_MAX, but the
driver is hardwired to only support up to the value defined by
SVGA3D_TEMPREG_MAX, so clamp to it.
Reviewed-by: Brian Paul <brianp@vmware.com>
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to workaround in the first place.
KILP instruction inside IF blocks were being lowered to an unconditional
KIL. Since r300 doesn't support branching, when the IF's were lowered
to conditional moves, the KIL would always be executed. This is not a
problem with the mesa state tracker, because the GLSL compiler handles
lowering IF's, but this bug was appearing in the VDPAU state tracker,
which does not use the GLSL compiler.
Note: This is a candidate for the stable branches.
The xvmc state tracker is completely seperate and
doesn't shares code or anything else with the
xorg state tracker.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This patch allows the Mac OS X SCons build to complete. The assembly
sources contain psuedo-ops that not are supported on Mac OS X.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were inverting the meaning of the stencil op flags: in svga/d3d
the normal incr/decr wraps and the SAT ops clamp.
This fixes piglit failures (at least stencil-twoside and stencil-wrap).
We should backport this everywhere we can.
Reviewed-by: Brian Paul <brianp@vmware.com>
Two of the switch cases used PIPE_FORMAT_ tokens instead of SVGA3D_ tokens.
As it happens, the token values are equal for these formats so there's no
net change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
We always mapped the query buffer in begin_query, causing stalls
if the buffer was busy.
This commit reworks it such that the query buffer is only mapped
in get_query_result as it's supposed to be.
The query buffer is no longer treated as a ring buffer. Instead, the results
are just appended and when the buffer is full, we create a new one. One query
can have more than one query buffer, though that's a very rare case.
Begin_query releases all query buffers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
From http://www.opengl.org/registry/specs/ARB/seamless_cube_map.txt:
Accepted by the <cap> parameter of Enable, Disable and IsEnabled,
and by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv
and GetDoublev:
TEXTURE_CUBE_MAP_SEAMLESS 0x884F
This caused a change in enums.c, which is manually built from the .xml
files.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The linked list of memory allocations was not protected by a mutex.
This lead to sporadic failures with multi-threaded apps.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This fixes another case of faulting when freeing a pipe_sampler_view
that belongs to a previously destroyed context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Basically, instead of immediately freeing deleted surfaces, hang onto
them in a cache to do quick re-allocation. This helps when surfaces
are frequently destroyed and then reallocated a bit later.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
There was a SVGA_HOST_SURFACE_CACHE_BYTES symbol, but it was never
used.
Now when we go to add a newly deleted surface to the cache we check
if the cache size would be exceeded. If so, try to free the least
recently "unused" surfaces until the cache is smaller. If we can't
do that, simply don't cache the newly deleted surface. The alternative
involves flushing and waiting and we don't want to do that.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, if shader translation failed for any reason we'd keep trying
to translate the shader over and over again during state validation.
The dummy fragment shader emits solid red so that might be visual
clue that translation is failing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The assertion recently added in dst_register() was invalid because that
function is also (suprisingly) used to declare constant registers.
Move the assertion to the callers where we're really creating temp
registers and add some code to prevent emitting invalid temp register
indexes for release builds.
Also, update the comment for get_temp(). It didn't return -1 if it
ran out of registers and none of the callers checked for that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And assert on the register index in dst_register(). The dest can
only be an output or temp reg and there's more of the later.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit 980f6f1 (mesa: move gl_texture_image::Width/Height/DepthScale
fields to swrast) moved the initialization of the Width, Height, and
DepthScale fields to _swrast_alloc_texture_image_buffer(). However,
i915 doesn't call this function because it performs its own buffer
allocation. As a result, the Width, Height, and DepthScale fields
weren't getting initialized properly, and some operations requiring
swrast would fail.
This patch ensures that Width, Height, and DepthScale are properly
initialized by separating the code that sets them into a new function,
_swrast_init_texture_image(), which is called by
intel_alloc_texture_image_buffer() as well as
_swrast_alloc_texture_image_buffer(). It also moves the
initialization of _IsPowerOfTwo into this function.
Fixes piglit test fbo/fbo-cubemap on i915.
Partially fixes https://bugs.freedesktop.org/show_bug.cgi?id=41216
This is a candidate for the 8.0 branch.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GBM needs the buffer format in order to communicate with DRM and clients
for things like scanout.
So track the DRI format requested in the various back ends and use it to
return the DRI format back to GBM when requested. GBM will then map
this into the GBM surface type (which is in turn based on the DRM fb
format list).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
These were rotting in an internal branch, but contain nothing confidential,
and would be much more useful if kept up-to-date with latest gallium
interface changes.
Several authors including Keith Whitwell, Zack Rusin, and Brian Paul.
In the gen6 GS case, we were under-counting and so other state would
get smashed. In the VS case, we were over-counting, so everything was
fine.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This was copy and paste from the VS where I had similar code. We're
only looking at things derived from BRW_NEW_VERTEX_PROGRAM in this
block.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This is a state which is derived from other states and is actually the first
state which doesn't correspond to any gallium state.
There are two state flags:
bool occlusion_query_enabled
bool flush_depthstencil_enabled
Additional flags can be added later if needed, e.g. bool hiz_enabled.
The emit function will have to figure out the register values by itself.
It basically just emits the registers when the state changes.
This commit also adds a few helper functions for writing registers directly
into a command stream.
This is the first pure command buffer. It contains CS initialization
packets and emits invariant state (i.e. the registers which never or rarely
change).
The affected registers are removed from *_hw_context.c, so that both ways
of emitting commands can co-exist.
v2: emit context_control in cayman's start_cs too
Suggested by José.
We don't provide shader caching in CSO. Most of the time the api provides
object semantics for shaders anyway, and the cases where it doesn't
(eg mesa's internall-generated texenv programs), it will be up to
the state tracker to implement their own specialized caching.
Improves VS state change microbenchmark performance by 7.08729% +/-
1.22289% (n=10) on gen7, because we don't upload the 64 dwords of
unused binding table any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step toward making the samplers/binding tables reflect
sampler uniform mappings instead of embedding those in the programs.
No significant performance difference on the microbenchmark (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We always say no. Improves VS state change microbenchmark performance
7.68747% +/- 1.40826% (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this and the previous patch, 640x480 nexuiz is running 0.169118%
+/- 0.0863696% faster (n=121). On a VS state change microbenchmark,
performance is increased 8.28645% +/- 0.460478% (n=52).
v2: Fix CACHE_NEW_VS comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reduces recomputation of state based on non-clipping-related
transform changes, and is a step toward removing VUE map
recomputation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For a 1D texture array, the border only applies to the width. For a 2D
texture array the border applies to the width and height but not the depth.
Sucha cases were not handled correctly in _mesa_init_teximage_fields().
Note: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tracing function entry/exits is a bit pointless
when VDPAU_TRACE=1 does the same thing.
v2: use WARN instead of ERR for application problems
Signed-off-by: Christian König <deathsimple@vodafone.de>
Like TGSI_OPCODE_ARL, destination should be an integer.
This fixes invalid LLVM IR on an internal state tracker (currently Mesa
never emits this opcode).
In the future consider making ADDR register also a integer-as-float array,
like all other register kinds, or simply replace ADDR & ARR/ARL with
integer temp and instructions.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes this GCC warning.
native_drm.c:153:1: warning: ‘drm_display_authenticate’ defined but not
used [-Wunused-function]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Avoid setting dirty state flags when enabling or disabling a vertex
attribute arrays when there's no change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If you're resorting to the dummy shader, you've probably already turned
off SIMD16 mode. But if you didn't, it would die in a fire.
We could either fail to compile in SIMD16 mode...or just fix it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The dummy FB write failed to specify EOT and a message length, causing
the GPU to hang. Now we can enjoy "everyone's favorite color" again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes this GCC warning.
mask.c: In function ‘mask_layer_fill’:
mask.c:387:12: warning: variable ‘alpha_color’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes these GCC warnings.
glx_api.c: In function ‘choose_visual’:
glx_api.c:678:8: warning: variable ‘trans_value’ set but not used
[-Wunused-but-set-variable]
glx_api.c:677:8: warning: variable ‘trans_type’ set but not used
[-Wunused-but-set-variable]
glx_api.c:663:8: warning: variable ‘min_ci’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that we have a index_range_invalid flag, we can just use that rather
than calling vbo_validated_drawrangeelements directly and returning.
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This failed to take basevertex into account:
If basevertex < 0:
(end + basevertex) might actually be in-bounds while 'end' is not.
We would have clamped in this case when we probably shouldn't.
This could break application drawing.
If basevertex > 0:
'end' might be in-bounds while (end + basevertex) might not.
We would have failed to clamp in this place. There's a comment
indicating the TNL module depends on max_index being in-bounds;
if so, it would likely break horribly.
Rather than trying to clamp correctly in the face of basevertex, simply
delete the clamping code and indicate that we don't have a valid range.
This causes _tnl_vbo_draw_prims to use vbo_get_minmax_indices() to
compute the actual bounds, which is much safer.
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The application supplied [start, end] range is merely a conservative
hint of the ranges of index values inside the index buffer. There is no
requirement that all vertices in the range [start, end] be referenced.
Passing an 'end' value larger than the maximum legal index is perfectly
acceptible; applications can legally pass 0xffffffff when they don't
have a tighter bound readily available.
Thus, the warning doesn't indicate a correctness issue; it could only
indicate a performance issue. However, it does not even do that.
glDrawRangeElements is designed to optimize non-VBO vertex data uploads
by providing an upper bound on the size of buffers a driver would need
to allocate. With VBOs, the data is already in an uploaded buffer, so
the range doesn't help.
The clincher is: we only know _MaxElement for VBOs. For user-space
arrays, we just set it to 2,000,000,000 (see mesa/main/varray.h:63.)
So we can only check this in the case where it is not useful.
Many applications, including the Unigine demos, currently trigger this
warning, which suggests the applications are buggy when they're actually
fine. Eliminating the warning should confuse users less while not
actually losing any benefit to application developers.
NOTE: This is a candidate for release branches.
Suggested-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
There's a serious trap for drivers: RenderTexture() does not indicate
that the texture is currently bound to the draw buffer, despite
FinishRenderTexture() signaling that the texture is just now being
unbound from the draw buffer.
We were acting as if RenderTexture() *was* the start of rendering and
that we could make texturing incoherent with the current contents of
the renderbuffer. This caused intel oglconform sRGB
Mipmap.1D_textures to fail, because we got a call to TexImage() and
thus RenderTexture() on a texture bound to a framebuffer that wasn't
the draw buffer, so we skipped validating the new image into the
texture object used for rendering.
We can't (easily) make RenderTexture() indicate the start of drawing,
because both our driver and gallium are using it as the moment to set
up the renderbuffer wrapper used for things like MapRenderbuffer().
Instead, postpone the setup of the workaround render target miptree
until update_renderbuffer time, so that we no longer need to skip
validation of miptrees used as render targets. As a bonus, this
should make GL_NV_texture_barrier possible.
(This also fixes a regression in the gen4 small-mipmap rendering since
3b38b33c16, which switched
set_draw_offset from image->mt to irb->mt but didn't move the irb->mt
replacement up before set_draw_offset).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44961
NOTE: This is a candidate for the 8.0 branch.
Infer from the operand the type of value to store.
MOV is untyped but we use the float store path.
v2: make MOV use float store path.
I've had to squash merge the ARL fix to be stored
as an integer in here to avoid regressions in a number
of piglit tests.
From now on ARL stores to an integer just like HW does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The infers the type of data required using the opcode,
and casts the input to the appropriate type.
So far this only handles non-indirect constant and temporaries.
v2: as per Jose suggestion, fetch immediates via floats
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are used inside the action handlers for the integer opcodes.
v2: use uint_bld/int_bld, drop higher level uint_bld.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For now just pass the current context, but when we want to
store int or unsigned we need to pass those later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These two functions produce the src/dst types for an opcode.
MOV is special since it can be used to mov float->float and int->int,
so just return VOID.
v2: use a new enum for the opcode type as per Jose's suggestion.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the texture format is integer, the incoming user data must also be
integer (and similarly for non-integer textures).
NOTE: This is a candidate for the stable branches.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I recently discovered this text in the BSpec. It seems wise to comply,
though I haven't observed it to fix anything yet.
Fixes a regression in glean/fbo since 28cfa1fa21.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45221
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes (with the previous commit) piglit GL_ARB_multisample/pushpop.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the table of of push/pop attributes, this one doesn't fall under
the enable group.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes build errors like
In file included from glapi_dispatch.c:91:
../../../src/mapi/glapi/glapitemp.h:4641: error: no previous prototype for
'glDrawBuffersNV'
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Lucas Stach <dev@lynxeye.de>
Similar to the previous commit. Also fix incorrect setting of the
sampler view's state after it's created. We need to specify the
first/last_level fields in the template instead.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Rather than the one in st_texture_object. This sampler view really has
no connection to the one used for rendering.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And remove needless & 0xff in _mesa_pack_uint_24_8_depth_stencil_row().
As suggested by José.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, this function only handled 2D textures.
The fallback texture is used when we try to sample from an incomplete
texture object. GLSL says sampling an incomplete texture should return
(0,0,0,1).
v2: use a 1-texel texture image, per José.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Added in _mesa_pack_uint_24_8_depth_stencil_row(). This could be hit
by something like glDrawPixels(GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8)
into a MESA_FORMAT_Z32_FLOAT_X24S8 buffer.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The st_renderbuffer_alloc_storage() function is used to allocate both
window-system buffers and user-created renderbuffers. The later kind
are never directly displayed so don't set PIPE_BIND_DISPLAY_TARGET for
those surfaces.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit dc7f449d1a introduced a new method
for avoiding MOVs: try to rewrite the destination of the instruction
that produced the RHS so it writes into the LHS.
Unfortunately, this is not safe for swizzled texturing operations, as
they return a set of four contiguous registers. Consider the following:
(assign (x)
(var_ref vec_ctor_x)
(swiz x (tex vec4 (var_ref m_sampY) (var_ref m_cordY) 0 1 ())))
In this case, the source and destination registers are equal, since
reg_offset is 0 for both. Yet, this is only a partial move: the texture
operation generates four registers, and the LHS only covers one.
Fixes color distortion in XBMC when using GLSL shaders.
NOTE: This is a candidate for the 8.0 branch (with the previous commit).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44333
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Certain instructions write more than one register. Texturing, for
example, returns 4 registers. (We set rlen to 4 even for TXS and float
shadow sampling.) Some math functions return 2. Most return 1.
The next commit introduces a use of this function.
NOTE: This is a candidate for the 8.0 branch (dependency of a fix).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reallocate/resize decompress FBO only if texture image width/height is
greater than existing decompress FBO width/height.
This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A filter strength of zero or one doesn't make any
sense. Thanks to Andy Furniss for pointing this out.
Signed-off-by: Christian König <deathsimple@vodafone.de>
The virtual address but follow the alignment requirement of the
tiled surface. The bo from handle case is not properly fix. Need
bigger change for a proper fix. Work around that by enforcing 1M
alignment for those bo.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Commit 2e5a1a2 (intel: Convert from GLboolean to 'bool' from
stdbool.h.) converted the "specoffset" local variable (in
intel_tris.c) from a GLboolean to a bool. However, GLboolean was the
wrong type for specoffset--it should have been a GLuint (to match the
declaration of specoffset in struct intel_context).
This patch changes specoffset to the proper type.
Fixes piglit test general/two-sided-lighting-separate-specular.
This is a candidate for stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45917
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out the same messages work on gen7, we were just being paranoid.
Fixes the penumbra shadows mode of Lightsmark since the register
allocation fix.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We just abort later, but at least this should result in more
informative bug reports.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.
Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.
Note: this is a candidate for the stable branches.
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds the pixel store operations in decompress_texture_image().
decompress_texture_image() is used in glGetTexImage() for compressed
textures with unsigned, normalized values.
It also fixes the failures in intel oglconform pxstore-gettex due to
following sub test cases:
- Test all mipmaps with byte swapping enabled
- Test all small mipmaps with all allowable alignment values
- Test subimage packing for all mipmap levels
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40864
Note: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
X86Target is a variable, and therefore isn't defined at compile time. So
LLVM_NATIVE_ARCH == X86Target
is translated into
0 == 0
and since X86 is first, we always pick it.
Therefore we replace the logic with PIPE_ARCH_*.
https://bugs.freedesktop.org/show_bug.cgi?id=45420
Fixes a regression from commit 660ed923de.
The basic idea is to look at the format of the dest renderbuffer and
choose either GLubyte or GLfloat for colors. The previous code used
_mesa_format_to_type_and_comps() which could return a bunch types other
than ubyte/float.
Determine the datatype at renderbuffer mapping time to avoid frequent
calls to the format query functions.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45578
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45577
Fix build with llvm-3.1svn.
llvm-3.1svn r149918 changed BufferMemoryObject::getExtent and
BufferMemoryObject::readByte from const member functions to non-const
member functions in include/llvm/Support/MemoryObject.h.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Ironlake appears to check our pointer against the General State Base
Address upper bound, rather than ignoring the zero bound as it ought.
Unfortunately, since we leave GSBA set to zero, there is no logical
upper bound. Set it to the maximum possible value, which should work
since our virtual addresses only go up to 2GB.
+94 piglits.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28924
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Improves nexuiz performance 0.65% +/- .10% (n=5) on my gen6, and .39%
+/- .11% (n=10) on gen7. No statistically significant performance
difference on warsow (n=5, but only one shader has MADs).
v2: Add support for MADs in 16-wide by using compression control.
v3: Don't generate MADs when it will force an immediate to be moved to a temp.
(it's not clear whether this is a win or not, but it should result in less
questionable change to codegen compared to v2).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
Our only instruction with a 3rd source so far was linterp, and that
value was never register-allocated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
WGL_ARB_pixel_format establishes the existence of pixel formats which
are invisible to GDI.
However we still need to pass a valid pixelformat to GDI, so that
context creation/binding works.
The actual WGL_TYPE_RGBA_FLOAT_ARB implementation is from Brian Paul.
The mapping from TEXTURE_x_INDEX to GL_TEXTURE_x was broken in
alloc_proxy_textures() because the elements in the targets[] array
were in the wrong order.
This didn't actually cause any failures since we never really use the
proxy texture's Target field. But let's get it right.
NOTE: This is a candidate for the 8.0 branch.
Use the float tables instead. Pixel maps are seldom used so this
shouldn't be a big deal. Next, we can get rid of the gl_pixelmap::Map8
array.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There's a mismatch in row strides for compressed textures between
what Driver.MapTextureImage() returns and what the software fetch-texel
functions use. Move it down a layer. The next step would be to fix
this in the fetch-texel functions.
Just use pow() instead. Spot lights aren't too common and fixed-function
lighting isn't as important as it used to me.
This saves 32KB per context. Each table was 4KB and there's 8 lights.
This is a shader based median filter, generally
used for noise reduction, it could still need some
improvements, but should usually work out of the box.
Signed-off-by: Christian König <deathsimple@vodafone.de>
The wm max threads is in the same dword as the dispatch enable. The
hardware gets super angry if you set max threads to 0, even if you
aren't dispatching threads.
Avoid unrollong loops that are either nested loops or
where the loop body times the unroll count is huge.
The change is far from being perfect but it extends the
loop unrolling decision heuristic by some additional
safeguard. In particular this cuts down compilation of
a shader precomputing atmospheric scattering integral
tables containing two nesting levels in a loop from
something way beyond some minutes (I never waited for
it to finish) to some fractions of a second.
This fixes piglit tests glsl-fs-unroll-explosion and
glsl-vs-unroll-explosion on r600g.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
width, height parameter in glTexImage2D() includes: texture image
width + 2 * border (if any). So when doing the texture size check
in _mesa_test_proxy_teximage() width and height should not exceed
maximum supported size for target texture type + 2 * border.
i.e. 1 << (ctx->Const.MaxTextureLevels - 1) + 2 * border
Texture border is anyway stripped out before it is given to intel
or gallium drivers.
This patch fixes Intel oglconform test case:
max_values negative.textureSize.textureCube
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44970
Note: This is a candidate for mesa 8.0 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
If we have no more enabled samplers and we've reset all the previously
used ones, no need to keep going around this loop.
(just moved some stuff around to clean it up a bit).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From what I can see we were taking the debug path all the time,
when we probably only want it for enable debug path.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't want our VBOs mapped when we're drawing. This change checks
if the vertex store VBO is mapped before we execute a list, unmaps it,
then remaps it after drawing. This situation pops up when building a
nested display list in GL_COMPILE_AND_EXECUTE mode.
Reviewed-by: Eric Anholt <eric@anholt.net>
Something has gone wrong if swrast is requested but cannot be
loaded. The user really should be made aware of this, (and instructed
to set LIBGL_DEBUG for more details).
The wording of this error message is updated from "reverting to
indirect rendering" to the more objectively descriptive "failed to
load driver: swrast". The former wording makes assumptions about what
the calling code will decide to do next, rather than simply describing
what went wrong within the current function. The new wording is
consistent with the critical errors recently added for hardware
drivers that fail to load.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Something has gone wrong if we were asked to load a driver of a
specific name, but it failed to load for some reason. The user really
should be made aware of this, (and instructed to set LIBGL_DEBUG for
more details).
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Sometimes an error is so sever that we want to print it even when the
user hasn't specifically requested debugging by setting LIBGL_DEBUG.
Add a CriticalErrorMessageF macro to be used for this case. (The error
message can still be slienced with the existing LIBGL_DEBUG=quiet).
For critical error messages we also direct the user to set the
LIBGL_DEBUG environment variable for more details.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
The description of ErrorMessageF was misleading in the case of
LIBGL_DEBUG being unset, (the previous comment could be understood to
mean the error should be printed, but the code does not print in this
case).
InfoMessageF previously had no comment at all.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
The build was broken by the line below, added in commit 4f82fed4.
s_expression.cpp:26: #include <limits>
Mesa's half of the fix is to add 'external/astl/include' to the include
path. The other half of the fix requires implementing
numeric_limits<float>::infinity() in astl, for which I have patches
submitted upstream for review.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Outputs should be treated in the same way as
inputs and temporaries here.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
If we had no vertex textures or samplers previously and we have none now,
don't bother doing the enables dance.
I was profiling nexuiz on noop and noticed these two functions in the
profile, this drops their usage from 0.86% to 0.03% and 0.23% to 0.03%
for texture and samplers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were doing saturate-based clamping on the [0,width] or [0,height]
coordinate, which meant only the first pixel was addressable.
Fixes piglit ARB_texture_rectangle/texwrap-RECT-bordercolor
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We should be able to merge self-move instruction into the MRF move
anyway, and this simplifies things for the next commit.
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The HiZ op was implemented as a meta-op. This patch reimplements it by
emitting a special HiZ batch. This fixes several known bugs, and likely
a lot of undiscovered ones too.
==== Why the HiZ meta-op needed to die ====
The HiZ op was implemented as a meta-op, which caused lots of trouble. All
other meta-ops occur as a result of some GL call (for example, glClear and
glGenerateMipmap), but the HiZ meta-op was special. It was called in
places that Mesa (in particular, the vbo and swrast modules) did not
expect---and were not prepared for---state changes to occur (for example:
glDraw; glCallList; within glBegin/End blocks; and within
swrast_prepare_render as a result of intel_miptree_map).
In an attempt to work around these unexpected state changes, I added two
hooks in i965:
- A hook for glDraw, located in brw_predraw_resolve_buffers (which is
called in the glDraw path). This hook detected if a predraw resolve
meta-op had occurred, and would hackishly repropagate some GL state
if necessary. This ensured that the meta-op state changes would not
intefere with the vbo module's subsequent execution of glDraw.
- A hook for glBegin, implemented by brwPrepareExecBegin. This hook
resolved all buffers before entering
a glBegin/End block, thus preventing an infinitely recurring call to
vbo_exec_FlushVertices. The vbo module calls vbo_exec_FlushVertices to
flush its vertex queue in response to GL state changes.
Unfortunately, these hooks were not sufficient. The meta-op state changes
still interacted badly with glPopAttrib (as discovered in bug 44927) and
with swrast rendering (as discovered by debugging gen6's swrast fallback
for glBitmap). I expect there are more undiscovered bugs. Rather than play
whack-a-mole in a minefield, the sane approach is to replace the HiZ
meta-op with something safer.
==== How it was killed ====
This patch consists of several logical components:
1. Rewrite the HiZ op by replacing function gen6_resolve_slice with
gen6_hiz_exec and gen7_hiz_exec. The new functions do not call
a meta-op, but instead manually construct and emit a batch to "draw"
the HiZ op's rectangle primitive. The new functions alter no GL
state.
2. Add fields to brw_context::hiz for the new HiZ op.
3. Emit a workaround flush when toggling 3DSTATE_VS.VsFunctionEnable.
4. Kill all dead HiZ code:
- the function gen6_resolve_slice
- the dirty flag BRW_NEW_HIZ
- the dead fields in brw_context::hiz
- the state packet manipulation triggered by the now removed
brw_context::hiz::op
- the meta-op workaround in brw_predraw_resolve_buffers (discussed
above)
- the meta-op workaround brwPrepareExecBegin (discussed above)
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reported-by: xunx.fang@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44927
Reported-by: chao.a.chen@intel.com
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If size is small (such as 1),
pitch = ROUND_DOWN_TO(MIN2(size, (1 << 15) - 1), 4);
makes pitch = 0. Then
height = size / pitch;
causes a division-by-zero exception. If pitch is zero, set height to
1 and avoid the division.
This fixes piglit's bin/getteximage-formats test and glean's
bufferObject test.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44971
There are cases where a buffer can be mapped while another buffer is
flushed. This can happen in the CopyPixels meta-op path for piglit's
fbo-mipmap-copypix. After some discussion with Eric, it seems this
assertion is no longer necessary, and it has always been too strict.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43328
Cc: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This was only used by glReadPixels and glDrawPixels. Now those
functions do the corresponding error checks.
Signed-off-by: Brian Paul <brianp@vmware.com>
Basically the same story as the previous commit. But we were
already calling _mesa_source_buffer_exists() in ReadPixels().
Yeah, we were calling it twice.
Signed-off-by: Brian Paul <brianp@vmware.com>
The _mesa_error_check_format_type() function does two things: check
that format/type is legal and check that the destination (or source
buffer for glReadPixels) actually exists. Just move the relevant
parts of that into _mesa_DrawPixels().
We'll do a similar change in glReadPixels then get rid of the function
altogether.
Signed-off-by: Brian Paul <brianp@vmware.com>
This replaces the _mesa_is_legal_format_and_type() function.
According to the spec, some invalid format/type combinations to
glDrawPixels, ReadPixels and glTexImage should generate
GL_INVALID_ENUM but others should generate GL_INVALID_OPERATION.
With the old function we didn't make that distinction and generated
GL_INVALID_ENUM errors instead of GL_INVALID_OPERATION. The new
function returns one of those errors or GL_NO_ERROR.
This will also let us remove some redundant format/type checks in
follow-on commit.
v2: add more checks for ARB_texture_rgb10_a2ui at the top of
_mesa_error_check_format_and_type() per Ian.
Signed-off-by: Brian Paul <brianp@vmware.com>
Tiled surface have all kind of alignment constraint that needs to
be met. Instead of having all this code duplicated btw ddx and
mesa use common code in libdrm_radeon this also ensure that both
ddx and mesa compute those alignment in the same way.
v2 fix evergreen
v3 fix compressed texture and workaround cube texture issue by
disabling 2D array mode for cubemap (need to check if r7xx and
newer are also affected by the issue)
v4 fix texture array
v5 fix evergreen and newer, split surface values computation from
mipmap tree generation so that we can get them directly from the
ddx
v6 final fix to evergreen tile split value
v7 fix mipmap offset to avoid to use random value, use color view
depth view to address different layer as hardware is doing some
magic rotation depending on the layer
v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
evergreen, align bytes per pixel to a multiple of a dword
v9 fix handling of stencil on evergreen, half fix for compressed
texture
v10 fix evergreen compressed texture proper support for stencil
tile split. Fix stencil issue when array mode was clear by
the kernel, always program stencil bo. On evergreen depth
buffer bo need to be big enough to hold depth buffer + stencil
buffer as even with stencil disabled things get written there.
v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
old ddx overestimate those. Fix linear case when pitch*height < 64.
Fix r300g.
v12 Fix linear case when pitch*height < 64 for old path, adapt to
libdrm API change
v13 add libdrm check
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Refine 80aa78142d "dri: make sure to build libdricommon.la"
so we don't build libdricommon if we aren't building a dri driver which needs it (i.e.
if we are just building swrast)
In particular, this restores the ability to build the swrast dri driver without having to
have a xf86drm.h
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
in check_index_bounds the comparison needs to be "greater equal" since
contrary to the name _MaxElement is the count of the array (this matches
similar code in vbo_exec_DrawRangeElementsBaseVertex).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the build of builtin_compiler on my 32-bit build where xcb-dri2
is in a custom prefix but the custom prefix flags weren't available.
It shouldn't have been in LIBS anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This checks for advertised LLC support by the GPU instead of relying on
the GPU generation for detection.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Rely on libdrm HAS_LLC parameter to verify if hardware supports it. In
case the libdrm version does not supports this check, fallback to older
way of detecting it which assumed that GPUs newer than GEN6 have it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Was previously being done in a state-tracker, but in a way which was
difficult for some drivers to optimize. Push down to this level and
make it the individual drivers responsibility.
FBOs differ from textures in a significant way. With textures, we can
strip the border and get correct rendering except when the application
fetches texels outside [0,1].
With an FBO, the pixel at (0,0) is in the border. The
ARB_framebuffer_object spec says:
"If the attached image is a texture image, then the window
coordinates (x[w], y[w]) correspond to the texel (i, j, k), from
figure 3.10 as follows:
i = (x[w] - b)
j = (y[w] - b)
k = (layer - b)
where <b> is the texture image's border width..."
Since the border doesn't exist, we can never render any pixels in the
correct location. Just mark these FBOs FRAMEBUFFER_UNSUPPORTED.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42336
Ever since xserver commit 531869448d07e00ae241120b59f3aaaa5709d59c,
the server no longer sends invalidate events to clients, unless they
have performed a GetBuffers request since the drawable was last
invalidated.
If the drawable gets invalidated immediately after the GetBuffers
request was processed by the X server, it's possible that Xlib
will process the invalidate event while waiting for the GetBuffers
reply. So the server, thinking the client knows that the buffers
are invalid, is waiting for another GetBuffers request before
sending any more invalidate events. The client, on the other hand,
believes the buffers to be valid, and thus is expecting to receive
another invalidate event before it has to send another GetBuffers
request. The end result is that the client never again sends
a GetBuffers request.
To avoid this problem, take a snapshot of the lastStamp before
doing GetBuffers, and retry if the snapshot and the current
lastStamp no longer match after the GetBuffers reply has been
processed.
Signed-off-by: Ville Syrjälä <syrjala@sci.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The error message I chose matches gcc's error. Fixes piglit
switch-case-duplicated.vert.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Otherwise, the upcoming error messages said the location was 0:0(0).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's not quite spelled out in the spec text, but the grammar indicates
that only constant values are allowed as switch() case labels (and
only constant values make sense, anyway).
Fixes piglit glsl-1.30/compiler/switch-statement/switch-case-uniform-int.vert.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This stuffs them all in a struct for sanity. Fixes piglit
glsl-1.30/execution/switch/fs-uniform-nested.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For all the extension entrypoints using the get_buffer() helper, they
wanted the same error handling. In some cases, the error was doing
the same error return whether target was a bad enum, or a user buffer
wasn't bound.
(Actually, GL_ARB_map_buffer_range doesn't specify the error for a zero
buffer being bound for MapBufferRange, though it does for
FlushMappedBufferRange. This appears to be an oversight).
Fixes piglit GL_ARB_copy_buffer/negative-bound-zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
Even though it should be safe to use them for one frame, better be sure.
Suggested by Michael Dänzer.
NOTE: This is a candidate for the 8.0 stable branch.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
This prevents a possible lapse of the depth buffer - the situation where
the app and pp have different depth buffers.
NOTE: This is a candidate for the 8.0 stable branch.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
In commit 6ecee54a9a a call to
talloc_reference was replaced with a call to talloc_steal. This was in
preparation for moving to ralloc which doesn't support reference
counting.
The justification for talloc_steal within token_list_append in that
commit is that the tokens are being copied already. But the copies are
shallow, so this does not work.
Fortunately, the lifetime of these tokens is easy to understand. A
token list for "replacements" is created and stored in a hash table
when a function-like macro is defined. This list will live until the
macro is #undefed (if ever).
Meanwhile, a shallow copy of the list is created when the macro is
used and the list expanded. This copy is short-lived, so is unsuitable
as a new parent.
So we can just let the original, longer-lived owner continue to own
the underlying objects and things will work.
This fixes bug #45082:
"ralloc.c:78: get_header: Assertion `info->canary == 0x5A1106'
failed." when using a macro in GLSL
https://bugs.freedesktop.org/show_bug.cgi?id=45082
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches.
This test cases exposes a bug as described in this bug report:
"ralloc.c:78: get_header: Assertion `info->canary == 0x5A1106'
failed." when using a macro in GLSL
https://bugs.freedesktop.org/show_bug.cgi?id=45082
Clearly, some memory is getting (incorrectly) freed on the first macro
invocation, leading to problems with the second macro invocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The trick here is that flex always chooses the rule that matches the most
text. So with a input text of "two:" which we want to be lexed as an
IDENTIFIER token "two" followed by an OTHER token ":" the previous OTHER
rule would match longer as a single token of "two:" which we don't want.
We prevent this by forcing the OTHER pattern to never match any
characters that appear in other constructs, (no letters, numbers, #,
_, whitespace, nor any punctuation that appear in CPP operators).
Fixes bug #44764:
GLSL preprocessor doesn't replace defines ending with ":"
https://bugs.freedesktop.org/show_bug.cgi?id=44764
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches.
GL_RG_INTEGER only has two components, not three. I'll be surprised
if anyone ever tries to glReadPixels(..., GL_SHORT, GL_RG_INTEGER,
...). This was found by inspection.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With 0963990 the flag was only set when Bind created the object. In
all cases where ::ARBsemantics could be true, this path never
happened. Instead, add a _Used flag to track whether a VAO has ever
been bound. On the first Bind, set the _Used flag, and set the
ARBsemantics flag to the correct value.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45423
This is a hack, and it will result in incorrect rendering. However,
it does eliminate spurious warnings in several piglit CopyPixels tests
that involve floating-point depth buffers.
The real solution is to add a zf field to SWspan to store float Z
values. When a float depth buffer is involved, swrast should also
populate the zf field. I'll consider this post-8.0 work.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that the draw module avoids flushing, it may flush precisely when
binding a NULL shader, so care must be taken when restoring the original
fragment shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
When GLAPIENTRY is __stdcall (ie Windows), the stack is popped by the
callee making the number/type of arguments significant, therefore
using a generic no-op causes stack corruption for many entry-points.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
width, height parameter in glTexImage2D() includes: texture image
width + 2 * border (if any). So when doing the texture size check
in _mesa_test_proxy_teximage() width and height should not exceed
maximum supported size for target texture type.
i.e. 1 << (ctx->Const.MaxTextureLevels - 1)
Texture border is anyway stripped out before it is given to intel
or gallium drivers.
This patch fixes Intel oglconform test case: max_values
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44970
Note: This is a candidate for mesa 8.0 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In certain situations API's will call pipe->clear which doesn't
require fragment shader, but then we'd try to verify the pipeline
and assume fragment shader was always set. This was leading to
crash when API would just call simple clear's before anything else.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The node_attrsz[] array is initially copied from the node->attrsz[]
array but some values get rewritten. Thereafter, we need to use the
node_attrsz[] values.
Fixes a bug when replaying a display list that uses generic vertex
array[16] (at least).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The warnings were:
nv50_pc_regalloc.c: In function ‘pass_generate_phi_movs’:
nv50_pc_regalloc.c:423:41: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp: In member function ‘bool nv50_ir::MemoryOpt::replaceStFromSt(nv50_ir::Instruction*, nv50_ir::MemoryOpt::Record*)’:
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
And add some assertions to catch this sooner in debug builds.
This fixes a dangling texture object pointer bug hit via wglShareLists().
When we push the GL_TEXTURE_BIT state we may push references to the default
texture objects which are owned by the gl_shared_state object. We don't
want to accidentally delete that shared state while the attribute stack
references shared objects. So keep a reference to it.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This cleans up the reference counting of shared context state.
The next patch will use this to fix an actual bug.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This also significantly improves the RV670 flush by using the CB1 flush
*always* and also DEST_BASE_0_ENA, which appears to magically fix some tests.
I am not entirely sure, but it's possible that RV670 flushing is fixed
completely.
v2: fix cayman by flushing texture cache instead of vertex cache
Thanks to Dave Airlie for testing Cayman.
Commit 99476561 (automake: src/glsl and src/glsl/glcpp) changed the
build system so that src/glsl/glsl_test is not built by default. This
inadvertently broke "make check", since the tests in
src/glsl/tests/lower_jumps (which are run by "make check") rely on
glsl_test.
This patch ensures that "make check" builds glsl_test before running
any tests.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes these GCC warnings.
osmesa.c: In function ‘osmesa_renderbuffer_storage’:
osmesa.c:417: warning: comparison is always false due to limited range of data type
osmesa.c:423: warning: comparison is always false due to limited range of data type
osmesa.c:431: warning: comparison is always false due to limited range of data type
osmesa.c:437: warning: comparison is always false due to limited range of data type
osmesa.c:447: warning: comparison is always false due to limited range of data type
osmesa.c:453: warning: comparison is always false due to limited range of data type
osmesa.c:463: warning: comparison is always false due to limited range of data type
osmesa.c:466: warning: comparison is always false due to limited range of data type
osmesa.c:476: warning: comparison is always false due to limited range of data type
osmesa.c:479: warning: comparison is always false due to limited range of data type
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Success was (tests-passed AND valgrind-tests-passed) but this meant that
if the valgrind tests weren't run it would be considered a failure.
The logic is now (tests-passed AND (!valgrind OR valgrind-tests-passed))
which lets us return success if the valgrind tests aren't run.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Needed for automake. Using AC_PROG_PATH(bison/flex) causes automake to
fail to build .y and .l files.
It is up to the builder to use bison/flex instead of yacc/lex.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Exporting a publicly visible class with a generic name like
"variable_entry" via ir_variable_refcount.h is kind of mean.
Many IR transformers would like to define their own "variable_entry"
class. If they accidentally include this header, the compiler/linker
may get confused and try to instantiate the wrong variable_entry class,
leading to bizarre runtime crashes.
The hope is that renaming this one will allow .cpp files to safely
declare and use their own file-scope "variable_entry" classes.
This avoids crashes caused by converting src/glsl to automake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This uses point size clamping to force point size to a particular value,
making the vertex shader output irrelevant.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We don't set the other bits anywhere else except the other DSA states,
which are mutually-exclusive with this one.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes the gl_PointSize transform feedback test.
Point size clamping should happen at the rasterizer stage,
i.e. after the vertex and geometry shaders and transform feedback.
Drivers are expected to do this by themselves.
Simplifies the general case code in the ubyte-valued texture format
functions. More consolidation to come in subsequent commits.
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifially, this being present works around a bug in Unigine
Sanctuary on i965 which previously resulted in bad rendering.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This can be used to work around broken application behavior, like in
Unigine where it attempts to use texture arrays without declaring
either "#extension GL_EXT_texture_array : enable" or "#version 130".
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While typing out the new decode, I added a fallback mode for dumping
when we fail to re-map the BO after execution. This should get us a
minimal dump when trying to dump a batch that results in a GPU hang.
We were allocating registers into the MRF hack region, resulting in
sparkly renering in a few of the scenes. We could do better
allocation by making an MRF class, having MRFs conflict with the
corresponding GRFs, and tracking the live intervals of the "MRF"s and
setting up the conflicts. But this is way easier for the moment.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
After the removal of the dri driver link test, this should help avoid
the original problem that it was designed to catch: The warning about
a missing prototype due to typoing a function name scrolling by in the
Mesa build spew, and you not noticing until you try to run an
application and it falls back to swrast.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The envvar works for R100 and R200 too, and the classic R300 driver
doesn't even exist anymore.
"RADEON_NO_TCL" is already mentioned in the code and is the same envvar
used for the R300g driver.
lp_bld_tgsi_soa.c has been adapted to use this new interface, but
lp_bld_tgsi_aos.c has only been partially adapted, since nothing in
gallium currently uses it.
v2:
- Rename lp_bld_tgsi_action.[ch] => lp_bld_tgsi_action.[ch]
- Initialize tgsi_info in lp_bld_tgsi_aos.c
- Fix copyright dates
Prior commit 576161289d,
the parameter format was bpp, thus both 24bit and 32bit formats were
requested with format set to 32. Handle 24bit seperately now.
Fixes RGBX formats in wayland platform for egl_dri2 (EGL_ALPHA_SIZE=0).
Note: This is a candidate for the 8.0 branch.
This just copies what the LUMINANCE_ALPHA bits do.
Fixes piglit tests on softpipe complaining about missing unpack.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cayman needs some of the MUL instructions spread across a full slot
of vectors.
It also no longer has RECIP_UINT, the recommendation is to replace it
with a U2F + RECIP_IEEE + MUL + F2U.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The warning is absolutely useless. It doesn't actually say that there are
uninitialized variables. It points out the fact that there are missing
initializers and that variables are initialized to zero implicitly, which is
exactly what we want and what we commonly make use of.
C90 and C99 require all unspecified variables in the initializer list to be set
to zero.
The check for ctx->API was unnecessary, because OES extensions are not exposed
in desktop GL.
Also require renderbuffer support for ARB_texture_rgb10_a2ui,
as per the spec.
Tested by comparing old and new glxinfo with softpipe and r600g.
v2: fix bugs
v3: rename need_only_one -> need_at_least_one
rename num_elements -> num_mappings
add comments
use const when appropriate
Reviewed-by: Brian Paul <brianp@vmware.com>
This change is not exactly equivalent (sometimes we checked for non-zero,
sometimes if >0 or >1), but the behavior shouldn't change, because all drivers
report 0 for unsupported CAPs.
Exposing CAP_STREAM_OUTPUT_PAUSE_RESUME without CAP_MAX_STREAM_OUTPUT_BUFFERS
is a driver bug and st/mesa does no checking if the latter is supported as
well. Drivers must report CAPs consistently.
v2: make the array const
v2: handle the cap in r300 and r600 as well
Additional info for r600g:
The env var R600_GLSL130=1 enables GLSL 1.3.
Along with R600_STREAMOUT=1, it enables full GL 3.
Fix an access to uninitialized memory pointed out by valgrind in
glsl_to_tgsi_visitor::simplify_cmp(void).
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Fix this GCC warning.
draw_pipe_clip.c: In function ‘interp’:
draw_pipe_clip.c:122:13: warning: variable ‘clip_dist’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When rendering to FBO, rendering is inverted. At the same time, we would
also make sure the point sprite origin is inverted. Or, we will get an
inverted result correspoinding to rendering to the default winsys FBO.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44613
NOTE: This is a candidate for stable release branches.
v2: add the simliar logic to ivb, too (comments from Ian)
simplify the logic operation (comments from Brian)
v3: pick a better comment from Eric
use != for the logic instead of ^ (comments from Ian)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies the code quite a bit, consolidates some cases and
possibly catches more cases for the memcpy path.
More such changes will follow. Do just a few at a time to help bisect
any possible regressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will let us use memcpy in more situations. We can also remove
the checks for byte spapping that happen before the calls to
_mesa_format_matches_format_and_type().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In a recent commit,
commit 1c0f1dd42a
Author: Chad Versace <chad.versace@linux.intel.com>
swrast: Fix fixed-function fragment processing
I defined a new function,_swrast_fragment_program, but neglected
to #include s_fragprog.h for clients of that function.
Note: This is a candidate for the 8.0 branch.
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The evergreen+ CB no longer supports the following formats
compared to 6xx/7xx:
- COLOR_4_4
- COLOR_3_3_2
- COLOR_6_5_5
- COLOR_8_24_FLOAT
- COLOR_24_8_FLOAT
- COLOR_11_11_10
- COLOR_11_11_10_FLOAT
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On i965, _mesa_ir_link_shader is never called. As a consequence, the
current fragment program (ctx->FragmentProgram->_Current) exists but is
invalid because it has no instructions. Yet swrast continued to attempt to
use the empty program.
To avoid using the empty program, this patch 1) defines a new function,
_swrast_use_fragment_program, which checks if the current fragment program
exists and differs from the fixed function fragment program, and, when
appropriate, 2) replaces checks of the form
if (ctx->FragmentProgram->_Current == NULL)
with
if (_swrast_use_fragment_program(ctx))
Fixes the following oglconform regressions on i965/gen6:
api-fogcoord(basic.allCases.log)
api-mtexcoord(basic.allCases.log)
api-seccolor(basic.allCases.log)
api-texcoord(basic.allCases.log)
blend-separate(basic.allCases)
colorsum(basic.allCases.log)
The tests were ran with the GLXFBConfig:
visual x bf lv rg d st colorbuffer sr ax dp st accumbuffer ms cav
id dep cl sp sz l ci b ro r g b a F gb bf th cl r g b a ns b eat
----------------------------------------------------------------------------
0x021 24 tc 0 32 0 r y . 8 8 8 8 . . 0 24 8 0 0 0 0 0 0 None
(Note: I originally believed that the hunk in
_swrast_update_fragment_program was unnecessary. But it is required to fix
blend-separate.)
Note: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reveiwed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Color clamping should be enabled in glGetTexImage if texture dataType is
GL_UNSIGNED_NORMALIZED and format is GL_LUMINANCE or GL_LUMINANCE_ALPHA
Fixes 2 Intel oglconform test cases: pxconv-gettex and pxtrans-gettex
https://bugs.freedesktop.org/show_bug.cgi?id=40864
NOTE: This is a candidate for the 8.0 branch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was losing bits of precision. Fixes (with the previous commits):
piglit EXT_texture_integer/getteximage-clamping
piglit EXT_texture_integer/getteximage-clamping GL_ARB_texture_rg
oglc advanced.mipmap.upload
Regresses oglc negative.typeFormatMismatch.teximage from fail to
abort, because it's been hitting texstore for a format/type combo that
shouldn't happen.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
In the core, we always treat spans of int/uint data as uint, so this
extract function was truncating storage of integer pixel data to a n
int texture to (0, max_int) instead of (min_int, max_int). There is
probably missing code for handling truncation on conversion between
pixel formats, still, but this does improve things.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mostly fixes piglit EXT_texture_integer/getteximage-clamping. The
remaining failure involves precision loss on storing of int32 texture
data (something I knew was an issue, but wasn't trying to test).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This cut and paste is pretty awful. I'm tempted to do a lot of this
using preprocessor tricks for customizing the parameter type from a
template function, but that's just a different sort of hideous.
Fixes 8 Intel oglconform int-textures cases.
NOTE: This is a candidate for the 8.0 branch.
v2: Add alpha formats, too.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, when you asked for the _BaseFormat of an rb wrapping a
GL_RGB texture, you got GL_RGBA because that's what we were storing
the texture data as.
NOTE: This is a candidate for the 8.0 branch.
Most of this function was just calling
intel_renderbuffer_update_wrapper(), which was called immediately
afterwards in the only caller.
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit ARB_copy_buffer-overlap, on swrast, which previously
assertion failed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A pure swrast-allocated buffer gets an irb of NULL, so we segfaulted
in the clear-accum test. Just look at the swrast renderbuffer pointer
for handling swrast rbs.
From the extension spec:
Added to section 5.4, as part of the discussion of which commands
are not compiled into display lists:
"Certain commands, when called while compiling a display list, are
not compiled into the display list but are executed immediately.
These are: ..., RenderbufferStorageMultisampleEXT..."
Fixes piglit EXT_framebuffer_multisample/dlist.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed when handling a similar problem in EXT_framebuffer_multisample.
From the EXT_framebuffer_object spec:
Added to section 5.4, as part of the discussion of which commands
are not compiled into display lists:
"Certain commands, when called while compiling a display list, are
not compiled into the display list but are executed immediately.
These are: ..., GenFramebuffersEXT, BindFramebufferEXT,
DeleteFramebuffersEXT, CheckFramebufferStatusEXT,
GenRenderbuffersEXT, BindRenderbufferEXT, DeleteRenderbuffersEXT,
RenderbufferStorageEXT, FramebufferTexture1DEXT,
FramebufferTexture2DEXT, FramebufferTexture3DEXT,
FramebufferRenderbufferEXT, GenerateMipmapEXT..."
Reviewed-by: Brian Paul <brianp@vmware.com>
Constants array is always assumed to be RGBA, which means we need to
swizzle the constant elements into place to match the AoS ordering
(e.g., BGRA) that was passed to lp_build_tgsi_aos().
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Should avoid dangling pointer derreference with
glean --run results --overwrite --quick --tests texSwizzle
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
We just prefix the $CLANG environment variable in configure.ac with acv_mesa_
Found by: tinderbox
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was horribly broken and has cost everyone more time than we were
ever going to save using it. It might have been fixable, but the
problem it was originally trying to solve can be better solved with
-Werror=missing-prototypes and -Werror=implicit-function-declaration.
Also, it was always producing a big scary warning about how the link
test was non-portable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44928
Substantially increases performance in GLBenchmark PRO:
- 320x240 => 3.28x
- 1920x1080 => 1.47x
- 2560x1440 => 1.27x
The LD message ignores the sampler unit index and SAMPLER_STATE pointer,
instead relying on hard-wired default state. Thus, there's no need to
worry about running out of sampler units or providing SAMPLER_STATE;
this small patch should be all that's required.
NOTE: This is a candidate for release branches.
(It requires the preceding commit to compile.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_SAMPLE is full of complex workarounds for original Broadwater
hardware, and I'd rather avoid all that for my next Ivybridge patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This function releases the buffer that contains user-space vertex data.
The buffer_offset field points into that buffer. So reset the
buffer_offset to zero when we release the buffer so that subsequent
draws don't inadvertantly get a bad offset.
Fixes error messages / failed assertions (in the draw module's bounds/size
checking code) when running piglit's polygon-mode test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
-fvisibility=hidden was preventing them from being exported, which
combined with shared-glapi was causing undefined symbol errors at
runtime.
We don't want to make these functions part of the ABI, and given
how simple they are, we simply inline them.
From c998f732d42da5e962fe5da294493132c3e8dc5f Mon Sep 17 00:00:00 2001
From: Lucas Stach <dev@lynxeye.de>
Date: Tue, 24 Jan 2012 09:46:32 +0100
Subject: [PATCH] nvfx: fix nv3x fallout from state validation changes
Apparently nv3x needs some curde hacks to work properly. This
is clearly not the right fix, but it's the behaviour of the old
code and fixes regressions seen by users.
This has the drawback that when creating configure for
distribution, wayland needs to be available for the packager.
Also the the macros has the wayland prefix hardcoded, so
we cant copy it in mesa right now.
In bad applications like ipers which does a lot of draw calls with
no state changes this helps to greatly reduce time spent in prepare.
In ipers around 7% of CPU was spent in various prepare functions,
after this commit no prepare function show on the profile.
This commit also has the added benefit of now grouping all pipelined
drawing into a single draw call if the driver uses vbuf_render.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Previously, max_vs_entries was set to 128 for GT1, and 256 for GT2,
based on the PRM (see Vol2, part1, p28). However, Bspec section 1.6.5
indicates that the maximum number of VS entries is 256 for GT1.
No piglit regressions on GT1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When storing data in a buffer of type DYNAMIC_DRAW, we don't create a
drm_intel_bo for it; instead we store the data in system memory and
defer allocation of the GPU buffer until it is needed. Therefore, in
brw_update_sol_surface(), we can't just consult the "buffer" field of
the intel_buffer_object structure; we need to call
intel_bufferobj_buffer() to ensure that the deferred allocation
occurs.
This parallels a similar fix for gen7 (see commit ba6f4c9).
Fixes piglit test EXT_transform_feedback/buffer-usage on gen6.
This is a candidate for the 8.0 release branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
It always had the same value as ctx->Extensions.EXT_framebuffer_sRGB.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Strictly speaking, it's not legal to expose EXT_texture_integer without
EXT_gpu_shader4. It might be even dangerous (apps can assume EXT_gpu_shader4
is available without checking for it).
The check in compute_version is removed as well, because that's already
covered by GLSLVersion >= 130.
Reviewed-by: Brian Paul <brianp@vmware.com>
- use OR to combine bind flags
- combine both conditionals into one
- move the ARB_fbo enable where it belongs
Reviewed-by: Brian Paul <brianp@vmware.com>
For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
the perfect place for a fallback.
The exceptions are:
- r500 (vertex clamp only)
- nv50 (both)
- nvc0 (both)
- softpipe (both)
We also have to take into account that r300 can do CLAMPED vertex colors only,
while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
with the two new CAPs.
A current incomplete framebuffer was incorrectly used as a
st_framebuffer. When accessing st_framebuffer childs bad things happen:
e.g. st_framebuffer::iface was used to check whether its an incomplete
fb, instead we need to compare st_framebuffer::Base against
mesa_get_incomplete_framebuffer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44919
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes Intel oglconform negative.typeFormatMismatch.copyteximage.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is part of fixing Intel oglconform
negative.typeFormatMismatch.copyteximage.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is unprepared for handling integer (particularly, the
baseFormat of the TexFormat comes out as GL_RGBA, not GL_RGBA_INTEGER,
so the direct call of Driver.ReadPixels crashes due to the int vs
non-int error checking not having happened). I'm frankly tempted to
convert this code to MapRenderbuffer/MapTexImage rather than doing it
as meta ops, now that we have that support.
Improves the remaining crash in Intel oglconform for int-textures to
just a rendering failure.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This aborts and crashes in intel oglconform's int-textures into being
just rendering failures. Clamping isn't handled yet.
v2: Add missing "break".
v3: Drop the int/uint distinction, since they don't need different clamping.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Similarly to how we handle this in texstore, we have to remap height
to depth so that we MapTextureImage each image layer individually.
Fixes part of Intel oglconform's int-textures advanced.fbo.rtt
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a step toward fixing Intel oglconform's
int-textures advanced.fbo.rtt.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This doesn't result in correct rendering -- GL requires that logic ops
work, while the hardware specs say it doesn't do them. I'm not sure
how we would want to handle this.
NOTE: This is a candidate for the 8.0 branch.
When we're actually rendering into a texture, map the texture image
instead of the corresponding renderbuffer. Before, we just copied
a pointer from the texture image to the renderbuffer. This change
will make the code usable by hardware drivers.
ctx->Driver.MapTexture() always points to _swrast_map_texture().
We're already reaching into swrast from t_vb_program.c anyway.
This will let us remove the ctx->Driver.Map/UnmapTexture() functions.
These are temporary, actually, but they'll make follow-on work easier to
implement in a step-by-step manner. Eventually the Map and RowStrideBytes
fields will go into a new swrast_renderbuffer type, but adding that type
now would involve touching a _lot_ of code that'll eventually be removed.
The fields marked as obsolete will go away completely at some point.
That field is only used by swrast code so there's no reason to mess
with it in the gallium state tracker.
This also lets us remove the unused st_format_data() type function and
related code.
When ARB VAOs are used, glPopClientAttrib does not resurrect a deleted
VAO or VBO. This difference between the two spec is, unfortunately,
not very well spelled out in the specs.
Fixes oglc vao(advanced.pushPop.deleteVAO) and
vao(advanced.pushPop.deleteVBO) tests.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There are more differences between Apple and ARB than just requiring
that all arrays be stored in VBOs. Additional uses will be added in
following commits.
Also, set the flag at Bind time instead of Gen time. The ARB_vao spec
specifies that behavior.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a hack to work around drivers such as i965 that:
- Set _MaintainTexEnvProgram to generate GLSL IR for
fixed-function fragment processing.
- Don't call _mesa_ir_link_shader to generate Mesa IR from the
GLSL IR.
- May use swrast to handle glDrawPixels.
Since _mesa_ir_link_shader is never called, there is no Mesa IR to
execute. Instead do regular fixed-function processing.
Even on platforms that don't need this, the software fixed-function
code is much faster than the software shader code.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44749
At least one place, the _mesa_need_secondary_color function in
state.h, uses this to make decisions. The next patch in this series
will add another dependency. Ideally, this field would go away and be
replace by a flag or something.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When rowstride was negatie, unsigned promotion caused a segfault here:
299│ if (rb->Format == MESA_FORMAT_S8) {
300│ const GLuint rowStride = rb->RowStride;
301│ for (i = 0; i < count; i++) {
302│ if (x[i] >= 0 && y[i] >= 0 && x[i] < w && y[i] < h) {
303├> stencil[i] = *(map + y[i] * rowStride + x[i]);
304│ }
305│ }
306│ }
Fixes segfault in oglconform
separatestencil-neu(NonPolygon.BothFacesBitmapCoreAPI),
though test still fails.
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
i965 processes assignments of whole structures using
vec4_visitor::emit_block_move, a recursive function which visits each
element of a structure or array (to arbitrary nesting depth) and
copies it from the source to the destination. Then it increments the
source and destination register numbers so that further recursive
invocations will copy the rest of the structure. In addition, it sets
the swizzle field for the source register to an appropriate value of
swizzle_for_size(...) for the size of each element being copied, so
that later optimization passes won't be fooled into thinking that
unused vector elements are live.
This all works fine. However, emit_block_move also contains an
assertion to verify, before setting the swizzle field for the source
register, that the source register doesn't already contain a
nontrivial swizzle. The intention is to make sure that the caller of
emit_block_move hasn't already done some swizzling of the data before
the call, which emit_block_move would then counteract when it
overwrites the swizzle field. But the assertion is at the lowest
level of nesting of emit_block_move, which means that after the first
element is copied, instead of checking the swizzle field set by the
caller, it checks the swizzle field used when moving the previous
element. That means that if the structure contains elements of
different vector sizes (which therefore require different swizzles),
the assertion will erroneously fire.
This patch moves the assertion from emit_block_move to the calling
function, vec4_visitor::visit(ir_assignment *). Since the caller is
non-recursive, the assertion will only happen once, and won't be
fooled by emit_block_move's modification of the swizzle field.
This patch also reverts commit fe006a7 (i965/vs: Fix swizzle related
assertion), which attempted to fix the bug by making the assertion
more lenient, but only worked properly for structures, arrays, and
matrices in which each constituent vector is the same size.
This fixes the problem described in comment 9 of
https://bugs.freedesktop.org/show_bug.cgi?id=40865. Unfortunately, it
doesn't fix the whole bug, since the test in question is also failing
due to lack of register spilling support in the VS.
Fixes piglit test vs-assign-varied-struct. No piglit regressions on
Sandy Bridge.
This is a candidate for the 8.0 release branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40865#c9
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is similar to a commit that did the same for the FS.
Shaves several more instructions off of the VS in Lightsmark, but no
statistically significant performance difference (n=5).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaves a few instructions off of the VS in Lightsmark, but no
statistically significant performance difference on gen7 (n=5).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
AC_CHECK_LIB has this nasty behavior, like the cflags tests, of
automatically putting the tested value into the global LIBS on
success. This caused -lexpat to end up in LIBS, but without the
--with-expat dir, so my 32-bit build on a 64 system using expat from a
custom prefix could only find the system expat and fail to link on the
one current consumer of the LIBS variable: the dri driver test link.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While reading through the simulator, I found some interesting code that
looks like it checks the sampler default color pointer against the bound
set in STATE_BASE_ADDRESS. On failure, it appears to program it to the
base address itself.
So I decided to try programming a legitimate bound, and lo and behold,
border color worked.
+92 piglits on Sandybridge. Also fixes Lightsmark on Ivybridge.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28924
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we now always build shared glapi, this exposes the fact that libOSMesa was
underlinked when glapi was built shared.
Fix this by doing the same thing as drivers/X11/Makefile already does, ensuring
that the library is linked with the shared glapi library.
(I'm not clear why we link with both glapi.a and glapi.so, so this may be all wrong)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Refine "always build shared dricore" so we don't build it if we don't need
it because we aren't actually building any dri drivers because of --disable-driglx-direct
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Looks insane, but it does appear we need a full slot per input/output.
This fixes another 180 or so piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Adds all the easier lowhanging opcodes.
Fixes ~3000 piglit tests with GLSL1.30 enabled on cayman.
This just leaves the mul/div/mod ops to fix up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
"If set, forces degamma on XYZ if format is
FMT_8_8_8_8, FMT_BC1, FMT_BC2, or FMT_BC3"
Don't claim support for sRGB on any other formts.
This fixes glean texture_srgb.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It doesn't pass the piglit test, but it seems to be a lot closer
than it was before. I need to track down if there is another problem.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Due to the changes for multiple kcache banks support, now we are assigning
final SRCx_SEL values for kcache access at the later stage, when building the
bytecode. So we need to take into account kcache banks to distinguish
the constants with the same address but different bank index.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Same fix as previously done by Dave Airlie for r600/r700
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Clip planes are uploaded as a constant buffer and used by the vertex
shader to produce corresponding clip distances for hw clipping.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for multiple kcache banks (constant buffers).
Lock the required lines only.
Allow up to 4 kcache line sets in the alu clause by using ALU_EXTENDED on eg+.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix this GCC warning on non-debug builds.
glsl_types.cpp: In member function 'gl_texture_index
glsl_type::sampler_index() const':
glsl_types.cpp:157: warning: control reaches end of non-void function
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable it in the evergreen_context_draw if needed.
Same as already done in the r600_context_draw for r6xx/r7xx.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
BURST_COUNT is clipped with ARRAY_SIZE, so set it to the max value
to avoid clipping.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
libglapi.so, libGL.so, libGLESv2.so, libGLESv1_CM.so must all
come from the same version of Mesa or bad things may happen.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
When the framebuffer has separate depth and stencil buffers, and HiZ is
not enabled on the depth buffer, mark the framebuffer as unsupported. This
happens when trying to create a framebuffer with Z16/S8 because we haven't
enabled HiZ on Z16 yet.
Fixes gles2conform test stencil8.
Note: This is a candiate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44948
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed--by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This loosens the format validation in glBlitFramebuffer. When blitting
depth bits, don't require an exact match between the depth formats; only
require that the two formats have the same number of depth bits and the
same depth datatype (float vs uint). Ditto for stencil.
Between S8_Z24 buffers, the EXT_framebuffer_blit spec allows
glBlitFramebuffer to blit the depth and stencil bits separately. So I see
no reason to prevent blitting the depth bits between X8_Z24 and S8_Z24 or
the stencil bits between S8 and S8_Z24. However, we of course don't want
to allow blitting from Z32 to Z32_FLOAT.
Fixes Piglit fbo/fbo-blit-d24s8 on Intel drivers with separate stencil
enabled.
The problem was that, on Intel drivers with separate stencil, the default
framebuffer has separate depth and stencil buffers with formats X8_Z24 and
S8. The test attempts to blit the depth bits from a S8_Z24 buffer into the
default framebuffer.
v2: Check that depth datatypes match.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44665
Note: This is a candidate for the 8.0 branch.
Reported-by: Xunx Fang <xunx.fang@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The nvc0 gallium driver is advertising 128 MAX_INTERLEAVED_COMPS
which made it always assert in the linker when TFB was used since
the Outputs array was smaller than that maximum.
v2: added assertions
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
So it appears R600s (except rv670) do AR handling different using a different
opcode. This patch fixes up r600g to work properly on r600.
This fixes ~100 piglit tests here (in GLSL1.30 mode) on rv610.
v3: add index_mode as per the docs.
This still fails any dst relative tests for some reason I can't quite see yet,
but it passes a lot more tests than without.
v4: add a nop after dst.rel this could be improved using a second pass,
where we only insert nops if two instructions are sure to collide.
The docs say r600, rv610, rv630 needs this, and not rv670, rs780, rs880,
need AMD to confirm rv620, rv635.
v5: add is_nop_inst.
NOTE: This is a candidate for stable branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use a bitmask approach to compute gl_array_object::_MaxElement.
To make this work correctly depending on the shader type actually used,
make use of the newly introduced typed bitmask getters.
With this change I gain about 5% draw time on some osgviewer examples.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Depending on the installed shader type, different arrays are used
from gl_array_object. Provide helper functions that compute
the bitmask of these arrays that are finally enabled for a given
shader type. The will be used in a followup change.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit ede60bc467 (glsl: Add isinf() and
isnan() builtins) uses "+INF" in the .ir file to represent infinity.
This worked on C99-compliant compilers, since the s-expression reader
uses strtod() to read numbers, and C99 requires strtod() to understand
"+INF". However, it didn't work on non-C99-compliant compilers such
as MSVC.
This patch modifies the s-expression reader to explicitly check for
"+INF" rather than relying on strtod() to support it.
This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44767
Tested-by: Morgan Armand <morgan.devel@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To fix failed assertions when calling glCopyBufferSubData().
svga_texture() asserts that the resource is a texture. Simply move the
calls to svga_texture() after the code that handles non-texture copies
so that we don't call it with non-texture resources.
Fixes glean bufferObject failure.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Two assignments to num_immediates were missing in
get_pixel_transfer_visitor() and get_bitmap_visitor().
The uninitialized value led to valgrind errors and crashes in some
cases.
Added new assertions to catch future problems in this area. Also
changed num_immediates to unsigned to avoid signed/unsigned
comparison warnings.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The default access flags for OpenGL ES (via GL_OES_map_buffer) and
desktop OpenGL are different. The code previously tried to handle
this, but the decision was made at compile time. Since the same
driver binary can be used for both OpenGL ES and desktop OpenGL, the
decision must be made at run-time.
This should fix bug #44433. It appears that the test case does
various map and unmap operations and inspects the state of the buffer
object around each. When it sees that GL_BUFFER_ACCESS does not match
its expectations, it fails.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44433
If we don't find an exact PIPE_FORMAT_x for a GL_(COMPRESSED)_RED/RG format,
try uncompressed formats. We were already doing this for the RGB(A) formats.
Fixes piglit arb_texture_compression-internal-format-query test.
NOTE: This is a candidate for the stable branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
msg_type moved by a bit, so the message type was being disassembled
incorrectly. In particular, render target writes were showing up as
"OWORD block write".
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Compared to sampler_gen5, simd_mode shifted by a bit and msg_type grew
by a bit. So we were printing slightly incorrect numbers.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Both the VF and VS share space in the URB. First, the VF stores
attributes (shader inputs) there. The VS then reads the attributes,
executes, and reuses the space to store varyings (shader outputs).
Thus, we need to calculate the amount of URB space necessary for inputs,
outputs, and pick whichever is greater.
The old VS backend correctly did this (brw_vs_emit.c:408), but the new
VS backend only considered outputs.
Fixes vertex scrambling in GLBenchmark PRO on Ivybridge.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41318
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the following scenario:
- CreateContext C1
- MakeCurrent C1
- DestroyContext C1 (does not actually destroy the first context, postponed
until the next MakeCurrent)
- CreateContext C2
- MakeCurrent C2
MakeCurrent will call flush on a half destroyed context, leading to crashes.
Since the other paths (destroy and makecurrent) already flush the context,
there is no need to flush here, so we remove this useless flush front call.
This fixes GPU crashes with Chrome and gallium drivers.
v2: Don't flag the format as being HiZ ready (there's DRI2 handshake
pain to go through).
Fixes piglit gl-3.0-required-sized-texture-formats
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This is required for Z16 support for texturing, which is the first
thing to have a horizontal alignment of 8. Renderbuffers don't need
it, since they're always set up as the only mip level, but do it for
completeness anyway.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This field is actually set up above.
NOTE: This is a candidate for the 8.0 branch, to avoid conflicts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I copy-and-pasted the thing I was allocating for as the context, so
the first time it would be NULL (root of a ralloc context) and they'd
chain off each other from then on.
NOTE: This is a candidate for the 8.0 branch.
The legal range for the device is apparently [-16.0, +15.0].
Limiting the range to [-15, +15] fixes piglit's lodbias test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The interaction between the mipmap lod min/max limits and the texture
base/max level limits is kind of tricky. Changing the base level
didn't work as expected before.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This makes lod clamping more consistent with other drivers.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Update the dd.h docs to indicate that GL_MAP_INVALIDATE_RANGE_BIT
can be used with GL_MAP_WRITE_BIT when mapping renderbuffers and
texture images.
Pass the flag when mapping texture images for glTexImage, glTexSubImage,
etc. It's up to drivers whether to actually make use of the flag.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To try to use less tex memory and maybe get better performance.
Spotted by Roland Scheidegger.
NOTE: This is a candidate for the 8.0 and 7.11 branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The i965 driver advertises GL_ARB_texture_float and GL_ARB_texture_rg
support but the ctx->TextureFormatSupported[] table entries for
MESA_FORMAT_R_FLOAT32 and MESA_FORMAT_RGBA_FLOAT32 are false on gen 4
hardware. So the case for GL_R32F would fail and we'd print an
implementation error.
This patch adds more Mesa tex format options for GL_R32F and other R/G
formats so we fall back to 16-bit formats when 32-bit formats aren't
available.
Eric made the same fix in commit 6216a5b4 for the non R/G formats.
v2: try 16-bit formats before 32-bit formats and try RG formats before
RGBA where possible.
This should fix https://bugs.freedesktop.org/show_bug.cgi?id=44039
NOTE: This is a candidate for the 8.0 and 7.11 branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This enables linear gradients if we need a linear,
it also sets the flat shade flag for color/constant interpolations.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When I originally implemented the hack to use GRFs 111+ as fake MRFs, I
did so purely to avoid rewriting all the code that dealt with MRFs.
However, it turns out that a similar hack is actually required.
Newly discovered language in the BSpec indicates that SEND instructions
with EOT set "should" use g112-g127 as their source registers. Based on
assertions in the simulator, this is actually a requirement on certain
platforms.
Since we're faking MRFs already, we may as well use the officially
sanctioned range. My guess is that we avoided this issue because we
seldom use m0: URB writes in the new VS backend start at m1, and RT
writes in the new FS backend start at m2.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we no longer generate Mesa IR from GLSL IR, it's impossible to
use the old vertex shader backend for GLSL programs. There's simply no
Mesa IR to codegen from.
Any attempt to do so would result in immediate GPU hangs, presumably due
to the driver uploading an empty program with no EOT message.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
According to Table 6.8 (Page 348) in the OpenGL 3.0 specification,
glGetVertexAttribiv supports GL_VERTEX_ATTRIB_ARRAY_INTEGER.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the following OGLConform tests on gen5:
depth-stencil(misc.state_on.depth_int)
fbo_db_ARBfp(basic.OnlyDepthBuffDrawBufferRender)
The problem was that, if the depth buffer's Mesa format was X8_Z24, then
we emitted the hardware format D24_UNORM_X8. But, on gen5, D24_UNORM_S8
must be emitted.
This bug was introduced by:
commit d84a180417
Author: Eric Anholt <eric@anholt.net>
i965: Base HW depth format setup based on MESA_FORMAT, not bpp.
v2: Deref 'intel' directly. Move the branch for newer chipset to top.
Quote the PRM. As requested by Ken.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43408
Note: This is a candidate for the 8.0 branch.
Reported-by: Xunx Fang <xunx.fang@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The original R600 requires the UNCACHED_FIRST_INST bit
to be set in the PS.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Note: this is candidate for the stable branches.
With the conversion to automake in commit
e326480e4e, several additional build
artifacts are created:
src/mesa/drivers/dri/i965/.deps/
src/mesa/drivers/dri/i965/.libs/
src/mesa/drivers/dri/i965/Makefile
src/mesa/drivers/dri/i965/Makefile.in
src/mesa/drivers/dri/i965/i965_dri.la
src/mesa/drivers/dri/i965/i965_symbols_test
This patch adds all of these files to .gitignore.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TestMipMaps() function in src/OGLconform/textureNPOT.c calls glTexImage2D()
with width = 0. Texture with zero size skips miptree allocation due to a
condition in function _mesa_store_teximage3d(). While calling glGetTexImage()
it results in assertion failure in intel_map_texture_image() due to null mt
pointer.
This patch fixes the issue by detecting the zero size texture early in
glGetTexImage and glGetCompressedTexImage functions. In such a case function
simply returns doing nothing.
Verified that below mentioned bug is fixed by this patch.
https://bugs.freedesktop.org/show_bug.cgi?id=42334
NOTE: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This does introduce a warning by the automake build system, that the
missing-symbols test build is non-portable. That's true -- Mac OS X
can't take something built as a loadable module and just link it as a
library. Of course, we aren't building this on OS X at all, so it
would be nice to be able to suppress it, but I haven't found a way.
Still, the build is going to be much quieter than we have ever had
before, so I think this is a fair tradeoff until we find a way to shut
that warning up.
v2: Put a link in /lib to avoid transition pains for people.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Nothing works if HiZ is enabled and the DDX is incapable of HiZ (that is,
the DDX version is < 2.16).
The problem is that the refactoring that eliminated
intel_renderbuffer::stencil_rb broke the recovery path in
intel_verify_dri2_has_hiz(). Specifically, it broke line
intel_context.c:1445, which allocates the region for
DRI_BUFFER_DEPTH_STENCIL. That allocation was creating a separate stencil
miptree, despite the buffer being a packed depthstencil buffer. Havoc
ensued.
This patch introduces a bool flag that prevents allocation of that stencil
miptree.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44103
Tested-by: Ian Romanick <idr@freedesktop.org>
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fix this GCC warning with non-LLVM builds.
sp_screen.c: In function ‘softpipe_get_shader_param’:
sp_screen.c:141:28: warning: unused variable ‘sp_screen’ [-Wunused-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Calling glXSwapBuffers with no bound context causes segmentation
fault in function intelDRI2Flush. All the gl calls should be
ignored after setting the current context to null. So the contents
of framebuffer stay unchanged. But the driver should not seg fault.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44614
Reported-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Yi Sun <yi.sun@intel.com>
Fix this GCC warning.
lp_test_round.c: In function ‘test_round’:
lp_test_round.c:126:13: warning: variable ‘packed’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fix this GCC 4.6 warning with 64-bit builds.
u_debug_stack.c: In function ‘debug_backtrace_capture’:
u_debug_stack.c:45:17: warning: variable ‘frame_pointer’ set but not
used [-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The i915 GPU can't do A8 dst, so we abuse GREEN8 buffers for that
purpose. However, things get hairy as we start to do blending,
because then GL_DST_*_ALPHA should be replaced with GL_DST_*_COLOR.
This is what we do here.
Fixes piglt fbo-alpha.
v2: select the colors in the pixel shader
v3: fix rs state creation for pre-evergreen
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Create the video buffers in the format the driver preffers.
This temporary creates problems with decoder less VDPAU video playback.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Add a second extened constructor that takes plane
textures for the video buffer. Also provide a
function for texture templates.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This requires GLSL 1.30 enabled, which requires integer types enabled,
so don't bother doing an INT to FLT conversion on it.
We should probably remove the instance id flt->int conversion when
turning on native integers.
this passes the three piglit tests with GLSL 1.30 forced on.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit rewrites a lot of the state_fb code to support
rendering to targets not aligned to 64 byte.
This allows us to drop the render temporaries as unaligned
targets are the only use-case where they are really needed. The
temporaries code was used for a lot of things more, but apparently
those also work without temps.
There is one regression in piglit fbo-clear-formats, but this will
be fixed with the use of real hardware clears and doesn't matter in
practice as no real application tries to scissor clear a 2x2 pixel
render target.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
There are 3 changes:
1) stride is specified for each buffer, not just one, so that drivers don't
have to derive it from the outputs
2) new per-output property dst_offset, which specifies the offset
into the buffer in dwords where the output should be stored,
so that drivers don't have to compute the offsets manually;
this will also be useful for gl_SkipComponents
from ARB_transform_feedback3
3) register_mask is removed, instead, there is start_component
and num_components; register_mask with non-consecutive 1s
doesn't make much sense (some hardware cannot do packing of components)
Christoph Bumiller: fixed nvc0.
v2: resolve merge conflicts in Draw and clean it up
Virtual address space put the userspace in charge of their GPU
address space. It's up to userspace to bind bo into the virtual
address space. Command stream can them be executed using the
IB_VM chunck.
This patch add support for this configuration. It doesn't remove
the 64K ib size limit thought this limit can be extanded up to
1M for IB_VM chunk.
v2: fix rendering
v3: fix rendering when using index buffer
v4: make vm conditional on kernel support add basic va management
v5: catch the case when we already have va for a bo
v6: agd5f: update on top of ioctl changes
v7: agd5f: further ioctl updates
v8: indentation cleanup + fix non cayman
v9: rebase against lastest mesa + improvement from Marek & Michel
v10: fix cut/paste bug
v11: don't rely on updated radeon_drm.h
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make the comments precise. Explain why each branch is needed and correct.
Document the potential pitfall in the true-branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When using Mesa with a GLES API, calling _mesa_FramebufferRenderbuffer
with GL_DRAW_FRAMEBUFFER will report a 'user error' because
get_framebuffer_target validates that this enum from the framebuffer
blit extension is only used on GL. To work around it this patch makes
it use the GL_FRAMEBUFFER enum instead in that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43418
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The gl_renderbuffer::Format field wasn't always set properly. This
didn't matter much in the past but with the recent swrast/renderbuffer
mapping changes, core Mesa will be directly touching OSMesa colorbuffers
so using the right MESA_FORMAT_x value is important.
Unfortunately, there aren't MESA_FORMATs for all the possible OSmesa
format/type combinations, such as GL_FLOAT / OSMESA_ARGB. If anyone
runs into these we can add new Mesa formats.
v2: add warnings for unsupported formats, fix ARGB_REV mix-up.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We always access pull constant buffers using the message types "OWord
Block Read" or "OWord Dual Block Read". According to the Sandy Bridge
PRM, Vol 4 Part 1, pages 214 and 218, when using these messages:
"the surface pitch is ignored, the surface is treated as a
1-dimensional surface. An element size (pitch) of 16 bytes is
used to determine the size of the buffer for out-of-bounds
checking if using the surface state model."
Previously we were setting the pitch for pull constant buffers to the
size of the whole constant buffer--this made no sense and would have
led to incorrect behavior if it were not for the fact that the pitch
is ignored.
For clarity, this patch sets the pitch for pull constant buffers to 16
bytes, consistent with the hardware's behavior.
v2: Clarify the meaning of the ignored values by writing them as (16 - 1).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9bdc44a528 (i965: Replace struct
with bit shifting for WM pull constant surfaces) accidentally
introduced off-by-one errors into the calculation of the surface
width, height, and depth. This patch restores the correct
computation.
The reason this wasn't noticed by Piglit tests is that the size of our
constant surfaces is always less than 2^20, therefore the off-by-one
error was causing the "depth" field of the surface to be set to all
1's. The hardware interpreted this as an extremely large surface, so
overflow checking was effectively disabled.
No Piglit regressions on Sandy Bridge.
NOTE: This is a candidate for the 7.11 and 8.0 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flat SHADE_MODEL still overrides any non-flat interpolation
qualifier, but pulling that state out of the rasterizer cso
isn't really worth the effort, is it ?
NOTE: This is a candidate for the 8.0 branch.
This fixes accum buffer operations. The accumulation buffer is the
only malloc-based renderbuffer for the intel drivers.
v2: apply x/y offset to returned pointer
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit EXT_framebuffer_multisample/negative-copypixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/negative-copyteximage.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample-negative-readpixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/renderbuffer-samples.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Previously, we were saying that everything from the starting tile to
region width+height was part of the limits of our depthbuffer, even if
the tile was near the bottom of the depthbuffer. This mean that our
range was not clipping to buffer buonds if the start tile was anything
but the start of the buffer.
In bebc91f0f3, this was changed to
saying that we're just rendering to a region of the size of the
renderbuffer. This is great -- we get a range that should actually
match what we want. However, the hardware's range checking occurs
after the X/Y offset addition, so we were clipping out rendering to
small depth mip levels when an X/Y offset was present. Just add
tile_x/y to the width in that case -- the WM won't produce negative
x/y values pre-offset, so we just need to get the left/bottom sides of
the region to cover our buffer.
Fixes the following Piglit regressions on gen7:
spec/ARB_depth_buffer_float/fbo-clear-formats
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 8.0 branch.
The array holds GLuint values so remove the float cast.
Note, however, that to compute the average of four GLuints we really
want to do (a+b+c+d)/4 but that could overflow. This change doesn't
address that for now.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In the first case, the newImage[] array contains GLuint values.
In the second case, the parameter type is GLuint, but the maxDepth
value is never used in this case (GL_FLOAT_32_UNSIGNED_INT_24_8_REV).
Pass ~OU just to be safe.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We include both imports.h and u_math.h in the state tracker. This
leads to multiple, conflicting definitions of ffs() with MSVC.
Use FFS_DEFINED to skip the ffs() in u_math.h.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
include mesa headers before gallium headers to avoid problem with
ffs() being defined in u_math.h and then again in imports.h
The next commit will add some #ifdefs to prevent multiple definitions
of ffs().
Call ffs() and ffsll() everywhere. Define our own ffs(), ffsll()
functions when the platform doesn't have them.
v2: remove #ifdef _WIN32, __IBMC__, __IBMCPP_ tests inside ffs()
implementation. The #else clause was recursive.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Alexander von Gluck <kallisti5@unixzen.com>
Introduce vbo_get_minmax_indices() function to handle the min/max index
computation for nr_prims(>= 1). The old code just compute the first
prim's min/max index; this would results an error rendering if user
called functions like glMultiDrawElements(). This patch servers as
fixing this issue.
As when nr_prims = 1, we can pass 1 to paramter nr_prims, thus I made
vbo_get_minmax_index() static.
v2: per Roland's suggestion, put the indices address compuation into
vbo_get_minmax_index() instead.
Also do comination if possible to reduce map/unmap count
v3: per Brian's suggestion, use a pointer for start_prim to avoid
structure copy per loop.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
# Work around aliasing bugs - developers should comment this out
CFLAGS += -fno-strict-aliasing
CXXFLAGS += -fno-strict-aliasing
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.